Is it possible that speculative execution on intel CPU causing EXC_BAD_INSTRUCTION (SIGILL)

0

I have a hypothesis that speculative execution on Intel Nehalem (1 gen) causing a crash. Is it possible or I completely wrong? If this is possible what can I do to prevent this? Maybe disable speculative execution for just one function or whole translation unit?

For the compilation of the cpp file that has problematic code, clang is used with flags -mavx2 -mxsave all other files compiled without these flags. This code works fine on any available contemporary mac book and windows laptop/desktop.

Testers mac book has Intel(R) Core(TM) i5 CPU 760. This CPU doesn't support AVX2 instruction set. There is a code that checks if the AVX2 supported and if not it is not executed. I can't have direct access to this device for debugging to know which exactly code causing the crash. But now I have two hypotheses:

  • code that checks if AVX2 supported is wrong and returns true when should return false
  • even though check returns false speculative execution actually run the AVX2 code causing the crash

I have already replaced/"fixed" the checking code as the primary hypothesis but tester still reports the crash. So I don't know for sure that isAvx2Supported is false.

Code that checks if AVX2 supported

void cpuid(int info[4], int InfoType) noexcept
{
#ifdef _WIN32
  __cpuidex(info, InfoType, 0);
#else
  __cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]);
#endif
}

bool check_xcr0_ymm() noexcept
{
  uint32_t xcr0;
#if defined(_MSC_VER)
  xcr0 = (uint32_t)_xgetbv(0);
#else
  __asm__ __volatile__("xgetbv" : "=a" (xcr0) : "c" (0) : "%edx");
#endif
  // checking if xmm and ymm state are enabled in XCR0
  return (xcr0 & 6) == 6;
}

bool check_4th_gen_intel_core_features() noexcept
{
  // see original article
  // https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
  int cpuInfo[4] = {};

  cpuid(cpuInfo, 1);
  //    CPUID.(EAX=01H, ECX=0H):ECX.FMA[bit 12]==1
  // && CPUID.(EAX=01H, ECX=0H):ECX.MOVBE[bit 22]==1
  // && CPUID.(EAX=01H, ECX=0H):ECX.OSXSAVE[bit 27]==1
  constexpr uint32_t fma_movbe_osxsave_mask = ((1 << 12) | (1 << 22) | (1 << 27));
  if((cpuInfo[2] & fma_movbe_osxsave_mask) != fma_movbe_osxsave_mask)
    return false;

  if(!check_xcr0_ymm())
    return false;

  cpuid(cpuInfo, 7);
  //    CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
  // && CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]==1
  // && CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]==1
  constexpr uint32_t avx2_bmi12_mask = (1 << 5) | (1 << 3) | (1 << 8);
  if((cpuInfo[1] & avx2_bmi12_mask) != avx2_bmi12_mask)
    return false;

  cpuid(cpuInfo, 0x80000001);
  // CPUID.(EAX=80000001H):ECX.LZCNT[bit 5]==1
  if((cpuInfo[2] & (1 << 5)) == 0)
    return false;

  return true;
}

const auto isAvx2Supported = check_4th_gen_intel_core_features();

actual code that uses AVX2

int findCharFast(const char* data, size_t dataSize, char c, unsigned int& offset)
{
  if(isAvx2Supported)
  {
    const auto mask = _mm256_set1_epi8(c);
    auto it = data + offset;
    for(const auto end = data + dataSize - 31; it < end; it += 32)
    {
      if(const auto result = _mm256_movemask_epi8(_mm256_cmpeq_epi8(mask, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(it)))))
      {
        return it - data + get_first_bit_set(result);
      }
    }
    offset = it - data;
  }
  return -1;
}

crash report says

Crashed Thread: 43 Queue(0x60400005b270)[16]

Exception Type: EXC_BAD_INSTRUCTION (SIGILL) Exception Codes: 0x0000000000000001, 0x0000000000000000

Thread 43 Crashed:: Queue(0x60400005b270)[16] 0 0x000000010d54b06b findCharFast(char const*, unsigned long, char, unsigned int&) + 91

Thread 43 crashed with X86 Thread State (64-bit): rax: 0x00000000ffffffff rbx: 0x0000000000000000 rcx: 0x000070000982bbd0 rdx: 0x0000000000000000 rdi: 0x00007f9ac044fa0b rsi: 0x000000000000000d rbp: 0x000070000982bc00 rsp: 0x000070000982bbc8 r8: 0x0000000000000001 r9: 0x0000000000000001 r10: 0x000060c000334030 r11: 0xfffffffffffc0fde r12: 0x0000000000000001 r13: 0x0000600000016fb0 r14: 0x00007f9ac044fa0b r15: 0x00007f9ac044fa18 rip: 0x000000010d54b06b rfl: 0x0000000000010246 cr2: 0x000000010d529cf0

speculative-execution

1 Answer

0

There was a bug in AVX2 support detection code. Intel article which describes how to do it is basically wrong. Before calling xgetbv implementation MUST check that ECX.XSAVE[bit 26]==1. Checking only OSXSAVE flag is not sufficient.

answered on Stack Overflow Sep 6, 2019 by Anton Dyachenko

User contributions licensed under CC BY-SA 3.0