Should or shouldn't I mask the results of XGETBV before using them for XSETBV?


I am trying to execute some UEFI applications.

I found that this code crashes on VirtualBox ("test start" is printed, but "test success" is not):

#include <stdint.h>

void* ConOut;
uint64_t (*OutputString)(void* protocol, void* string);

void printChar(int c) {
    unsigned char data[4] = { (unsigned char)c };
    if (c == '\n') printChar('\r');
    OutputString(ConOut, data);
}

void printString(const char* str) {
    while (*str != '\0') printChar((unsigned char)*(str++));
}

void entry(void* unused, uint64_t* table) {
    (void)unused;

    ConOut = (void*)table[8];
    OutputString = (uint64_t (*)(void*, void*))((uint64_t*)ConOut)[1];

    printString("waiting for breakpoint set...\n");
    {
        volatile int j;
        for (j = 0; j < 1000000000; j++);
    }

    printString("test start\n");

    __asm__ __volatile__ (
        /* marker for setting breakpoint */
        "cmp $0xdeadbeef, %%eax\n\t"
        /* turn on OSXSAVE */
        "mov %%cr4, %%rax\n\t"
        "or $0x40000, %%rax\n\t"
        "mov %%rax, %%cr4\n\t"
        /* read XCR[0] */
        "xor %%eax, %%eax\n\t"
        "xor %%edx, %%edx\n\t"
        "xor %%ecx, %%ecx\n\t"
        "xgetbv\n\t"
        /* write XCR[0] */
        "xsetbv\n\t"
    : : : "%eax", "%ecx", "%edx");
    
    printString("test success\n");

    for (;;) __asm__ __volatile__ ("cli\n\thlt\n\t");
}

Compilation command:

C:\MyInstalledApps\TDM-GCC-64\bin\gcc -Wall -Wextra -nostdlib -e entry -m64 -Wl,--subsystem=10 minimum_test.c -o minimum_test.efi

From my investigation, I found that the xgetbv instruction sets EDX:EAX to 00000000:0000001f, and xsetbv raises a #GP fault (interrupt vector 13) when it is given that value.

Strangely, when I single-step the xgetbv instruction on VirtualBox, it sets EDX:EAX to 00000000:00000001, so no fault happens and "test success" is printed.

Referring to the Intel® 64 and IA-32 Architectures Software Developer's Manuals, I found that they say this about XGETBV:

If fewer than 64 bits are implemented in the XCR being read, the values returned to EDX:EAX in unimplemented bit locations are undefined.

Then, about XSETBV:

Protected Mode Exceptions
#GP(0)
If the current privilege level is not 0.
If an invalid XCR is specified in ECX.
If the value in EDX:EAX sets bits that are reserved in the XCR specified by ECX.
If an attempt is made to clear bit 0 of XCR0.
If an attempt is made to set XCR0[2:1] to 10b.

In this case, the EDX:EAX value sets reserved bits. Since the values of unimplemented bits returned by XGETBV are undefined, it seems reasonable to mask the results of XGETBV before passing them to XSETBV. The mask can be obtained from CPUID with EAX=0x0D, ECX=0, which reports the supported XCR0 bits in EDX:EAX. After adding some code to apply the mask, XSETBV worked well on VirtualBox.

On the other hand, the Intel manual also says this about XSETBV:

Undefined or reserved bits in an XCR should be set to values previously read.

This suggests that reserved bits should be written back with the values obtained via XGETBV, and that I shouldn't apply masking to force those bits to zero.

In conclusion: should I or shouldn't I mask the result of XGETBV with the valid bits obtained via CPUID before passing it to XSETBV?


A related (but not duplicate) question I found:


Host environment:

  • Windows 10 Home (x64)
  • Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.59GHz
  • RAM 16.0 GB
  • VirtualBox 6.1.18 r142142 (Qt5.6.2)

Guest (VM) environment:

  • Operating System: Other/Unknown (64-bit)
  • Base Memory: 128 MB
  • Chipset: PIIX3
  • Enable I/O APIC
  • EFI: Enabled
  • 1 CPU
  • Acceleration: VT-x/AMD-V, Nested Paging, PAE/NX
  • Paravirtualization Interface: Default

Full code for testing:

#include <stdint.h>

void* ConOut;
uint64_t (*OutputString)(void* protocol, void* string);

void printChar(int c) {
    unsigned char data[4] = { (unsigned char)c };
    if (c == '\n') printChar('\r');
    OutputString(ConOut, data);
}

void printString(const char* str) {
    while (*str != '\0') printChar((unsigned char)*(str++));
}

void printInt(uint64_t value, int radix, int minDigits) {
    char vStr[128] = "";
    char* pStr = vStr + 120;
    int digits = 0;
    do {
        *(pStr--) = "0123456789ABCDEF"[value % radix];
        value /= radix;
        digits++;
    } while (value > 0 || digits < minDigits);
    printString(pStr + 1);
}

void stop(void) {
    __asm__ __volatile__(
        "cli\n\t"
        "1:\n\t"
        "hlt\n\t"
        "jmp 1b\n\t"
    );
}

void entry(void* unused, uint64_t* table) {
    uint32_t eax, ebx, ecx, edx, cs, cr0, xcr0_low, xcr0_high;
    uint32_t cpuid_max, eax_mask, edx_mask;
    unsigned char src_test[32], dst_test[32] = {0};
    int i;
    (void)unused;

    ConOut = (void*)table[8];
    OutputString = (uint64_t (*)(void*, void*))((uint64_t*)ConOut)[1];

    __asm__ __volatile__ (
        "xor %%eax, %%eax\n\t"
        "cpuid\n\t"
    : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
    printString("CPUID.00H: EAX=0x"); printInt(eax, 16, 8);
    printString(", EBX=0x"); printInt(ebx, 16, 8);
    printString(", ECX=0x"); printInt(ecx, 16, 8);
    printString(", EDX=0x"); printInt(edx, 16, 8);
    printChar('\n');
    if (eax < 1) {
        printString("CPUID.01H not supported!\n");
        stop();
    }
    cpuid_max = eax;

    __asm__ __volatile__ (
        "mov $1, %%eax\n\t"
        "cpuid\n\t"
    : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
    printString("CPUID.01H: EAX=0x"); printInt(eax, 16, 8);
    printString(", EBX=0x"); printInt(ebx, 16, 8);
    printString(", ECX=0x"); printInt(ecx, 16, 8);
    printString(", EDX=0x"); printInt(edx, 16, 8);
    printChar('\n');
    if (!((ecx >> 26) & 1)) {
        printString("xsave (ECX[26]) not supported!\n");
        stop();
    }

    if (cpuid_max >= 0x0D) {
        __asm__ __volatile__ (
            "mov $0xd, %%eax\n\t"
            "xor %%ecx, %%ecx\n\t"
            "cpuid\n\t"
        : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
        printString("CPUID.0DH: EAX=0x"); printInt(eax, 16, 8);
        printString(", EBX=0x"); printInt(ebx, 16, 8);
        printString(", ECX=0x"); printInt(ecx, 16, 8);
        printString(", EDX=0x"); printInt(edx, 16, 8);
        printChar('\n');
        eax_mask = eax;
        edx_mask = edx;
    } else {
        printString("CPUID.0DH not supported\n");
        eax_mask = UINT32_C(0xffffffff);
        edx_mask = UINT32_C(0xffffffff);
    }

    __asm__ __volatile__ (
        "mov %%cs, %%ax\n\t"
        "movzwl %%ax, %0\n\t"
        "mov %%cr0, %%rax\n\t"
    : "=g"(cs), "=a"(cr0));
    printString("CPL check: CS=0x"); printInt(cs, 16, 4);
    printString(", CR0=0x"); printInt(cr0, 16, 8);
    printChar('\n');
    if (!(cr0 & 1)) { /* CR0.PE (bit 0) must be set */
        printString("not in protected mode!\n");
        stop();
    }
    if ((cs & 3) != 0) {
        printString("CPL is not zero!\n");
        stop();
    }

    printString("waiting for breakpoint set...\n");
    {
        volatile int j;
        for (j = 0; j < 1000000000; j++);
    }

    printString("turning on OSXSAVE\n");
    __asm__ __volatile__ (
        /* turn on OSXSAVE */
        "mov %%cr4, %%rax\n\t"
        "or $0x40000, %%rax\n\t"
        "mov %%rax, %%cr4\n\t"
    : : : "%eax");

    __asm__ __volatile__ (
        /* marker for setting breakpoint */
        "cmp $0xdeadbeef, %%eax\n\t"
        /* read XCR[0] */
        "xor %%eax, %%eax\n\t"
        "xor %%edx, %%edx\n\t"
        "xor %%ecx, %%ecx\n\t"
        "xgetbv\n\t"
    : "=a"(xcr0_low), "=d"(xcr0_high) : : "%ecx", "cc");
    printString("XCR[0] = ");
    printInt(xcr0_high, 16, 8); printChar(':');
    printInt(xcr0_low, 16, 8); printChar('\n');

    xcr0_low |= 6;

#if 0
    printString("applying mask\n");
    xcr0_low &= eax_mask;
    xcr0_high &= edx_mask;
#else
    (void)eax_mask; (void)edx_mask;
#endif

    printString("new XCR[0] will be: ");
    printInt(xcr0_high, 16, 8); printChar(':');
    printInt(xcr0_low, 16, 8); printChar('\n');

    printString("turning on AVX\n");
    __asm__ __volatile__ (
        /* marker for setting breakpoint */
        "cmp $0xdeadbeef, %%ecx\n\t"
        /* turn on AVX */
        "xor %%ecx, %%ecx\n\t"
        "xsetbv\n\t"
    : : "a"(xcr0_low), "d"(xcr0_high) : "%ecx", "cc");

    for (i = 0; i < 32; i++) src_test[i] = 123 * (i + 1);
    printString("testing AVX instruction\n");
    printString("src:\n");
    for (i = 0; i < 32; i++) {
        printInt(src_test[i], 16, 2);
        printChar((i + 1) % 16 == 0 ? '\n' : ' ');
    }
    printString("dest before:\n");
    for (i = 0; i < 32; i++) {
        printInt(dst_test[i], 16, 2);
        printChar((i + 1) % 16 == 0 ? '\n' : ' ');
    }
    __asm__ __volatile__ (
        "vmovups (%0), %%ymm0\n\t"
        "vmovups %%ymm0, (%1)\n\t"
    : : "r"(src_test), "r"(dst_test));
    printString("dest after:\n");
    for (i = 0; i < 32; i++) {
        printInt(dst_test[i], 16, 2);
        printChar((i + 1) % 16 == 0 ? '\n' : ' ');
    }

    printString("test done.\n");
    stop();
}

Output from VirtualBox:

CPUID.00H: EAX=0x00000016, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
CPUID.01H: EAX=0x000906ED, EBX=0x00010800, ECX=0x56DA220B, EDX=0x178BFBFF
CPUID.0DH: EAX=0x00000007, EBX=0x00000340, ECX=0x00000340, EDX=0x00000000
CPL check: CS=0x0038, CR0=0xC0010033
waiting for breakpoint set...
turning on OSXSAVE
XCR[0] = 00000000:0000001F
new XCR[0] will be: 00000000:0000001F
turning on AVX

Output when I execute the program directly on my PC:

CPUID.00H: EAX=0x00000016, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
CPUID.01H: EAX=0x000906ED, EBX=0x00100800, ECX=0x77FAFBBF, EDX=0xBFEBFBFF
CPUID.0DH: EAX=0x0000001F, EBX=0x00000240, ECX=0x00000440, EDX=0x00000000
CPL check: CS=0x0038, CR0=0x80000013
waiting for breakpoint set...
turning on OSXSAVE
XCR[0] = 00000000:00000001
new XCR[0] will be: 00000000:00000007
turning on AVX
testing AVX instruction
src:
7B F6 71 EC 67 E2 5D D8 53 CE 49 C4 3F BA 35 B0
2B A6 21 9C 17 92 0D 88 03 7E F9 74 EF 6A E5 60
dest before:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dest after:
7B F6 71 EC 67 E2 5D D8 53 CE 49 C4 3F BA 35 B0
2B A6 21 9C 17 92 0D 88 03 7E F9 74 EF 6A E5 60
test done.
asked on Stack Overflow Apr 9, 2021 by MikeCAT



User contributions licensed under CC BY-SA 3.0