Invalid guest state during VM entry

I'm learning about Intel's VT-x virtualization support and am trying to (eventually) write a Linux kernel module to virtualize the already-running system. As a first step, I'm enabling the rdtsc-exiting CPU-based execution control, setting the guest state rip to point to a rdtsc instruction, and setting the host state rip to point to a VM-exit handler that reads the exit reason from the VMCS, notifies the user of it, exits VMX mode, and cleans up all allocated memory. Unfortunately, after initialization of the VMCS and execution of vmlaunch, my exit handler is called correctly, but reads the exit reason field of the VMCS as 0x80000021 (and the exit qualification as 0). As I understand it, bit 31 indicates that there was an error during VM-entry, and the basic exit reason, 0x21, indicates that something was initialized incorrectly in the VMCS guest state.
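A minimal sketch of that decoding, assuming the exit reason layout described above (basic reason in bits 15:0, VM-entry failure flag in bit 31). The struct and function names are hypothetical and not taken from the module:

```c
#include <stdbool.h>
#include <stdint.h>

/* Two of the fields packed into the 32-bit exit reason. */
struct exit_reason {
    uint16_t basic;         /* bits 15:0, basic exit reason */
    bool     entry_failure; /* bit 31, set when the exit is a failed VM entry */
};

/* Hypothetical helper: split a raw exit reason into the two fields. */
struct exit_reason decode_exit_reason(uint32_t raw)
{
    struct exit_reason r;
    r.basic = (uint16_t)(raw & 0xffffu);
    r.entry_failure = ((raw >> 31) & 1u) != 0;
    return r;
}
```

For the value reported here, 0x80000021, this yields basic reason 33 (invalid guest state) with the entry-failure flag set. A successful rdtsc exit would instead report basic reason 16 with bit 31 clear.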

However, I have checked through every single line of section 26.3 in Volume 3C of the Intel Software Developer's Manual, "Checking and Loading Guest State", and cannot find a single violation in my initialization. Here is a dump of the items I initialize in the VMCS region:

[*]  initializing vmcs control fields
[**] ia32_vmx_basic:                0xda040000000004
[**] using TRUE ctl msrs
[**] pinbased controls:             0x00000016
[**] primary cpu based controls:    0x04007172
[**] secondary cpu based controls:  0x00000000
[**] vm exit controls:              0x00036ffb
[**] vm entry controls:             0x000013fb
[*]  initialization complete

[*]  initializing vmcs registers
[**] cr0:       0x80050033
[**] cr3:       0x453710003
[**] cr4:       0x3626e0
[**] dr7:       0x400
[**] guest rip: 0xffffffffc070e03a
[**] host rip:  0xffffffffc070e000
[**] guest rsp: 0xffff98665b0c6000
[**] host rsp:  0xffff98665b0c6000
[**] rflags:    0x246
[**] idtr:      0xfffffe0000000000
[**]    lim:    0xfff
[**] gdtr:      0xfffffe0000100000
[**]    lim:    0x7f
[**] tr:        0x0040
[**]    rights: 0x8b
[**]    lim:    0x206f
[**]    base:   0xfffffe0000102000
[**] ldtr:      0x0000
[**]    rights: 0x10000
[**]    lim:    0x0
[**]    base:   0x0
[**] cs:        0x10
[**]    rights: 0xa09b
[**]    lim:    0xffffffff
[**]    base:   0x0
[**] ss:        0x18
[**]    rights: 0xc093
[**]    lim:    0xffffffff
[**]    base:   0x0
[**] ds:        0x00
[**]    rights: 0x10000
[**]    lim:    0x0
[**]    base:   0x0
[**] es:        0x00
[**]    rights: 0x10000
[**]    lim:    0x0
[**]    base:   0x0
[**] fs:        0x00
[**]    rights: 0x10000
[**]    lim:    0x0
[**]    base:   0x7fba89cb3540
[**] gs:        0x00
[**]    rights: 0x10000
[**]    lim:    0x0
[**]    base:   0xffff98665fb40000
[**] vmcs link: 0xffffffffffffffff
[**] msrs:
[**]    dbgctl: 0x0
[**]    pat:    0x407050600070106
[**]    efer:   0xd01
[**] sysenter msrs:
[**]    cs:     0xd01
[**]    esp:    0xfffffe0000101200
[**]    eip:    0xffffffffa6601720
[*]  initialization complete
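
One of the guest-state checks is that several base addresses (FS, GS, TR, GDTR, IDTR) are canonical. The following is a small sketch of that test, assuming 48-bit linear addresses (CR4.LA57 is clear in the dump above); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63:47 of addr are all zeros or all ones,
 * i.e. the address is canonical for 48-bit linear addressing. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits from bit 47 upward */
    return top == 0 || top == 0x1ffffu;
}
```

Applied to the dump, the FS base 0x7fba89cb3540, the GS base 0xffff98665fb40000 and the TR base 0xfffffe0000102000 all pass this test.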

Any VMCS field not listed above is implicitly initialized to zero, as I zero out the page that I allocate for the VMCS region before carrying out any other initialization steps. This is the relevant section of code:

//assumes vmcs already current
int initialize_vmcs(unsigned long guest_rip, unsigned long host_rip, unsigned long guest_rsp, unsigned long host_rsp) {
    printk("[*]  initializing vmcs control fields\n");
    
    ///////////////////////////
    
    msr_t msr;
    lhf_t lhf;  //lower half of rflags
    unsigned long error_code;
    
    pin_based_execution_controls_t pin_x_ctls;
    primary_cpu_based_execution_controls_t pri_cpu_x_ctls;
    secondary_cpu_based_execution_controls_t sec_cpu_x_ctls;
    vm_exit_controls_t exit_ctls;
    vm_entry_controls_t entry_ctls;
    
    ///////////////////////////
    
    pin_x_ctls.val=0;
    
    pri_cpu_x_ctls.val=0;
    pri_cpu_x_ctls.rdtsc_exiting=1;
    
    sec_cpu_x_ctls.val=0;
    
    exit_ctls.val=0;
    exit_ctls.host_addr_space_size=1;
    
    entry_ctls.val=0;
    entry_ctls.ia_32e_mode_guest=1;
    
    //////////////////////////

    READ_MSR(msr, IA32_VMX_BASIC);
    printk("[**] ia32_vmx_basic:\t\t\t0x%lx\n", msr.val);
    int true_flag=msr.vmx_basic.vmx_controls_clear;
    printk("[**] %susing TRUE ctl msrs\n", true_flag ? "":"not ");
    
    READ_MSR(msr, true_flag ? IA32_VMX_TRUE_PINBASED_CTLS:IA32_VMX_PINBASED_CTLS);
    pin_x_ctls.val|=msr.vmx_ctls.allowed_zeroes;
    printk("[**] pinbased controls:\t\t\t0x%08x\n", pin_x_ctls.val);
    if( (pin_x_ctls.val & msr.vmx_ctls.allowed_ones)!=pin_x_ctls.val ) {
        printk("[*]  unsupported bit set\n\n");
        return -EINVAL; }
    EC_VMWRITE(pin_x_ctls.val, PIN_BASED_X_CTLS, lhf, error_code);
    
    READ_MSR(msr, true_flag ? IA32_VMX_TRUE_PROCBASED_CTLS:IA32_VMX_PROCBASED_CTLS);
    pri_cpu_x_ctls.val|=msr.vmx_ctls.allowed_zeroes;
    printk("[**] primary cpu based controls:\t0x%08x\n", pri_cpu_x_ctls.val);
    if( (pri_cpu_x_ctls.val & msr.vmx_ctls.allowed_ones)!=pri_cpu_x_ctls.val ) {
        printk("[*]  unsupported bit set\n\n");
        return -EINVAL; }
    EC_VMWRITE(pri_cpu_x_ctls.val, PRIMARY_CPU_BASED_X_CTLS, lhf, error_code);
    
    READ_MSR(msr, IA32_VMX_PROCBASED_CTLS2);
    sec_cpu_x_ctls.val|=msr.vmx_ctls.allowed_zeroes;    //unnecessary
    printk("[**] secondary cpu based controls:\t0x%08x\n", sec_cpu_x_ctls.val);
    if( (sec_cpu_x_ctls.val & msr.vmx_ctls.allowed_ones)!=sec_cpu_x_ctls.val ) {
        printk("[*]  unsupported bit set\n\n");
        return -EINVAL; }
    EC_VMWRITE(sec_cpu_x_ctls.val, SECONDARY_CPU_BASED_X_CTLS, lhf, error_code);
    
    READ_MSR(msr, true_flag ? IA32_VMX_TRUE_EXIT_CTLS:IA32_VMX_EXIT_CTLS);
    exit_ctls.val|=msr.vmx_ctls.allowed_zeroes;
    printk("[**] vm exit controls:\t\t\t0x%08x\n", exit_ctls.val);
    if( (exit_ctls.val & msr.vmx_ctls.allowed_ones)!=exit_ctls.val ) {
        printk("[*]  unsupported bit set\n\n");
        return -EINVAL; }
    EC_VMWRITE(exit_ctls.val, EXIT_CTLS, lhf, error_code);
    
    READ_MSR(msr, true_flag ? IA32_VMX_TRUE_ENTRY_CTLS:IA32_VMX_ENTRY_CTLS);
    entry_ctls.val|=msr.vmx_ctls.allowed_zeroes;
    printk("[**] vm entry controls:\t\t\t0x%08x\n", entry_ctls.val);
    if( (entry_ctls.val & msr.vmx_ctls.allowed_ones)!=entry_ctls.val ) {
        printk("[*]  unsupported bit set\n\n");
        return -EINVAL; }
    EC_VMWRITE(entry_ctls.val, ENTRY_CTLS, lhf, error_code);

    
    printk("[*]  initialization complete\n\n");
    
    //////////////////////////
    
    printk("[*]  initializing vmcs registers\n");
    unsigned long reg;
    
    __asm__ __volatile__("mov %%cr0, %0":"=r"(reg)::"memory");
    printk("[**] cr0:\t0x%lx\n", reg);
    READ_MSR(msr, IA32_VMX_CR0_FIXED0);
    if( (reg | msr.val)!=reg ) {
        printk("[*]  unsupported bit clear\n");
        return -EINVAL; }
    READ_MSR(msr, IA32_VMX_CR0_FIXED1);
    if( (reg & msr.val)!=reg ) {
        printk("[*]  unsupported bit set\n");
        return -EINVAL; }
    EC_VMWRITE(reg, GUEST_CR0, lhf, error_code);
    EC_VMWRITE(reg, HOST_CR0, lhf, error_code);
    
    __asm__ __volatile__("mov %%cr3, %0":"=r"(reg)::"memory");
    printk("[**] cr3:\t0x%lx\n", reg);
    EC_VMWRITE(reg, GUEST_CR3, lhf, error_code);
    EC_VMWRITE(reg, HOST_CR3, lhf, error_code);
    
    __asm__ __volatile__("mov %%cr4, %0":"=r"(reg)::"memory");
    printk("[**] cr4:\t0x%lx\n", reg);
    READ_MSR(msr, IA32_VMX_CR4_FIXED0);
    if( (reg | msr.val)!=reg ) {
        printk("[*]  unsupported bit clear\n");
        return -EINVAL; }
    READ_MSR(msr, IA32_VMX_CR4_FIXED1);
    if( (reg & msr.val)!=reg ) {
        printk("[*]  unsupported bit set\n");
        return -EINVAL; }
    EC_VMWRITE(reg, GUEST_CR4, lhf, error_code);
    EC_VMWRITE(reg, HOST_CR4, lhf, error_code);
    
    __asm__ __volatile__("mov %%dr7, %0":"=r"(reg)::"memory");
    printk("[**] dr7:\t0x%lx\n", reg);
    EC_VMWRITE(reg, GUEST_DR7, lhf, error_code);
    
    printk("[**] guest rip:\t0x%lx\n", guest_rip);
    EC_VMWRITE(guest_rip, GUEST_RIP, lhf, error_code);
    printk("[**] host rip:\t0x%lx\n", host_rip);
    EC_VMWRITE(host_rip, HOST_RIP, lhf, error_code);
    printk("[**] guest rsp:\t0x%lx\n", guest_rsp);
    EC_VMWRITE(guest_rsp, GUEST_RSP, lhf, error_code);
    printk("[**] host rsp:\t0x%lx\n", host_rsp);
    EC_VMWRITE(host_rsp, HOST_RSP, lhf, error_code);
    
    
    __asm__ __volatile__("pushf; pop %0":"=r"(reg)::"memory");
    printk("[**] rflags:\t0x%lx\n", reg);
    EC_VMWRITE(reg, GUEST_RFLAGS, lhf, error_code);
    
    
    dtr_t dtr;
    
    __asm__ __volatile__("sidt %0":"=m"(dtr)::"memory");
    printk("[**] idtr:\t0x%016lx\n", dtr.base);
    printk("[**]\tlim:\t0x%x\n", dtr.lim_val);
    EC_VMWRITE(dtr.lim_val, GUEST_IDTR_LIMIT, lhf, error_code);
    EC_VMWRITE(dtr.base, GUEST_IDTR_BASE, lhf, error_code);
    EC_VMWRITE(dtr.base, HOST_IDTR_BASE, lhf, error_code);
    
    __asm__ __volatile__("sgdt %0":"=m"(dtr)::"memory");
    printk("[**] gdtr:\t0x%016lx\n", dtr.base);
    printk("[**]\tlim:\t0x%x\n", dtr.lim_val);
    EC_VMWRITE(dtr.lim_val, GUEST_GDTR_LIMIT, lhf, error_code);
    EC_VMWRITE(dtr.base, GUEST_GDTR_BASE, lhf, error_code);
    EC_VMWRITE(dtr.base, HOST_GDTR_BASE, lhf, error_code);
    
    unsigned long base;
    unsigned int lim;
    access_rights_t access_rights;
    
    unsigned short tr=0;
    __asm__ __volatile__("str %0":"=m"(tr)::"memory");
    printk("[**] tr:\t0x%04x\n", tr);
    EC_VMWRITE(tr, GUEST_TR_SELECTOR, lhf, error_code);
    EC_VMWRITE(tr, HOST_TR_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, tr, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_TR_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, tr, dtr.base);
    EC_VMWRITE(lim, GUEST_TR_LIMIT, lhf, error_code);
    base=0
        | ((long)(((tssd_t *)(dtr.base+tr))->base_addr_0_15))
        | ((long)(((tssd_t *)(dtr.base+tr))->base_addr_16_23)<<16)
        | ((long)(((tssd_t *)(dtr.base+tr))->base_addr_24_31)<<24)
        | ((long)(((tssd_t *)(dtr.base+tr))->base_addr_32_63)<<32);
    printk("[**]\tbase:\t0x%lx\n", base);
    EC_VMWRITE(base, GUEST_TR_BASE, lhf, error_code);
    EC_VMWRITE(base, HOST_TR_BASE, lhf, error_code);

    
    __asm__ __volatile__("sldt %0":"=m"(tr)::"memory");
    printk("[**] ldtr:\t0x%04x\n", tr);
    EC_VMWRITE(tr, GUEST_LDTR_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, tr, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_LDTR_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, tr, dtr.base);
    EC_VMWRITE(lim, GUEST_LDTR_LIMIT, lhf, error_code);
    GET_BASE(base, tr, dtr.base);
    EC_VMWRITE(base, GUEST_LDTR_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%cs, %0":"=r"(reg)::"memory");
    printk("[**] cs:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_CS_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_CS_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_CS_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_CS_LIMIT, lhf, error_code);
    GET_BASE(base, reg, dtr.base);
    EC_VMWRITE(base, GUEST_CS_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%ss, %0":"=r"(reg)::"memory");
    printk("[**] ss:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_SS_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_SS_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_SS_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_SS_LIMIT, lhf, error_code);
    GET_BASE(base, reg, dtr.base);
    EC_VMWRITE(base, GUEST_SS_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%ds, %0":"=r"(reg)::"memory");
    printk("[**] ds:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_DS_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_DS_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_DS_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_DS_LIMIT, lhf, error_code);
    GET_BASE(base, reg, dtr.base);
    EC_VMWRITE(base, GUEST_DS_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%es, %0":"=r"(reg)::"memory");
    printk("[**] es:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_ES_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_ES_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_ES_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_ES_LIMIT, lhf, error_code);
    GET_BASE(base, reg, dtr.base);
    EC_VMWRITE(base, GUEST_ES_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%fs, %0":"=r"(reg)::"memory");
    printk("[**] fs:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_FS_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_FS_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_FS_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_FS_LIMIT, lhf, error_code);
    READ_MSR(msr, IA32_FS_BASE);
    base=msr.val;
    printk("[**]\tbase:\t0x%lx\n", base);
    EC_VMWRITE(base, GUEST_FS_BASE, lhf, error_code);
    EC_VMWRITE(base, HOST_FS_BASE, lhf, error_code);
    
    __asm__ __volatile__("mov %%gs, %0":"=r"(reg)::"memory");
    printk("[**] gs:\t0x%02lx\n", reg);
    EC_VMWRITE(reg, GUEST_GS_SELECTOR, lhf, error_code);
    EC_VMWRITE(reg, HOST_GS_SELECTOR, lhf, error_code);
    GET_ACCESS_RIGHTS(access_rights, reg, dtr.base);
    EC_VMWRITE(access_rights.val, GUEST_GS_ACCESS_RIGHTS, lhf, error_code);
    GET_LIM_VAL(lim, reg, dtr.base);
    EC_VMWRITE(lim, GUEST_GS_LIMIT, lhf, error_code);
    READ_MSR(msr, IA32_GS_BASE);
    base=msr.val;
    printk("[**]\tbase:\t0x%lx\n", base);
    EC_VMWRITE(base, GUEST_GS_BASE, lhf, error_code);
    EC_VMWRITE(base, HOST_GS_BASE, lhf, error_code);
    
    printk("[**] vmcs link:\t0x%lx\n", 0xffffffffffffffff);
    EC_VMWRITE(0xffffffffffffffff, VMCS_LINK_PTR_F, lhf, error_code);

    printk("[**] msrs:\n");
    READ_MSR(msr, IA32_DEBUGCTL);
    printk("[**]\tdbgctl:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_DEBUGCTL_F, lhf, error_code);
    READ_MSR(msr, IA32_PAT);
    printk("[**]\tpat:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_PAT_F, lhf, error_code);
    EC_VMWRITE(msr.val, HOST_IA32_PAT_F, lhf, error_code);
    READ_MSR(msr, IA32_EFER);
    printk("[**]\tefer:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_EFER_F, lhf, error_code);
    EC_VMWRITE(msr.val, HOST_IA32_EFER_F, lhf, error_code);
    printk("[**] sysenter msrs:\n");
    //read SYSENTER_CS right before use so msr.val does not still hold EFER
    READ_MSR(msr, IA32_SYSENTER_CS);
    printk("[**]\tcs:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_SYSENTER_CS, lhf, error_code);
    EC_VMWRITE(msr.val, HOST_IA32_SYSENTER_CS, lhf, error_code);
    READ_MSR(msr, IA32_SYSENTER_ESP);
    printk("[**]\tesp:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_SYSENTER_ESP, lhf, error_code);
    EC_VMWRITE(msr.val, HOST_IA32_SYSENTER_ESP, lhf, error_code);
    READ_MSR(msr, IA32_SYSENTER_EIP);
    printk("[**]\teip:\t0x%lx\n", msr.val);
    EC_VMWRITE(msr.val, GUEST_IA32_SYSENTER_EIP, lhf, error_code);
    EC_VMWRITE(msr.val, HOST_IA32_SYSENTER_EIP, lhf, error_code);
    
    printk("[*]  initialization complete\n\n");
    return 0; }

with a few relevant macros:

//lhf for lower half of rflags
//faster/better practice than
//pushf/popf when possible
#define EC_VMWRITE(src, code, lhf, error_code)                                      \
    __asm__ __volatile__(                                                           \
        "vmwrite %1, %2;"                                                           \
        "lahf;"                                                                     \
        "shr $8, %%rax;"                                                            \
        "movb %%al, %0;"                                                            \
        :"=r"(lhf.val)                                                              \
        :"r"((long)(src)),                                                          \
         "r"((long)(code))                                                          \
        :"rax", "memory");                                                          \
    if(!VMsucceed(lhf)) {                                                           \
        if(VMfailValid(lhf)) {                                                      \
            VMREAD(error_code, VM_INSTRUCTION_ERROR, lhf);                          \
            printk("[*]  vmwrite failed with error code %ld\n\n", error_code); }    \
        else if(VMfailInvalid(lhf)) {                                               \
            printk("[*]  vmwrite failed with invalid region\n\n"); }       \
        return -EINVAL; }

#define GET_ACCESS_RIGHTS(access_rights, selector, gdt_base)                                \
if(!selector) {                                                                             \
    access_rights.val=0;                                                                    \
    access_rights.unusable=1; }                                                             \
else {                                                                                      \
    __asm__ __volatile__("lar %%ax, %%eax":"=a"(access_rights.val):"a"(selector):"memory"); \
    access_rights.val>>=8;                                                                  \
    access_rights.rsv_8_11=0;                                                               \
    access_rights.rsv_17_31=0; }                                                            \
printk("[**]\trights:\t0x%x\n", access_rights.val)
    
#define GET_LIM_VAL(lim, selector, gdt_base)                                    \
if(!selector) {                                                                 \
    lim=0; }                                                                    \
else {                                                                          \
    __asm__ __volatile__("lsl %%ax, %%rax":"=a"(lim):"a"(selector):"memory"); } \
printk("[**]\tlim:\t0x%x\n", lim)
    
#define GET_BASE(base, selector, gdt_base)                          \
if(!selector) {                                                     \
    base=0; }                                                       \
else {                                                              \
    base=0                                                          \
        | (*(unsigned short *)(gdt_base+selector+2))                \
        | ((*(unsigned int *)(gdt_base+selector+4))&0xff)<<16       \
        | ((*(unsigned int *)(gdt_base+selector+4))&0xff000000); }  \
printk("[**]\tbase:\t0x%lx\n", base)

a few relevant structs:

typedef union __attribute__((packed)) {
    struct __attribute__((packed)) {
        unsigned int segment_type:4;
        unsigned int s:1;   //descriptor type. 0=system, 1=code_or_data
        unsigned int dpl:2;
        unsigned int p:1;
        unsigned int rsv_8_11:4;
        unsigned int avl:1;
        unsigned int l:1;   //64 bit mode active (for only CS)
        unsigned int db:1;  //default operation size: 0=16 bit segment, 1=32 bit segment
        unsigned int g:1;   //granularity
        unsigned int unusable:1;
        unsigned int rsv_17_31:15; };
    unsigned int val;
} access_rights_t;

typedef union __attribute__((packed)) {
    struct __attribute__((packed)) {
        unsigned int external_interrupt_exiting:1;
        unsigned int rsv_1_2:2;
        unsigned int nmi_exiting:1;
        unsigned int rsv_4:1;
        unsigned int virtual_nmis:1;
        unsigned int preemption_timer_active:1;
        unsigned int process_posted_interrupts:1;
        unsigned int rsv_8_31:24; };
    unsigned int val;
} pin_based_execution_controls_t;

typedef union __attribute__((packed)) {
    struct __attribute__((packed)) {
        unsigned int rsv_0_1:2;
        unsigned int interrupt_window_exiting:1;
        unsigned int use_tsc_offsetting:1;
        unsigned int rsv_4_6:3;
        unsigned int hlt_exiting:1;
        unsigned int rsv_8:1;
        unsigned int invlpg_exiting:1;
        unsigned int mwait_exiting:1;
        unsigned int rdpmc_exiting:1;
        unsigned int rdtsc_exiting:1;
        unsigned int rsv_13_14:2;
        unsigned int cr3_load_exiting:1;
        unsigned int cr3_store_exiting:1;
        unsigned int rsv_17_18:2;
        unsigned int cr8_load_exiting:1;
        unsigned int cr8_store_exiting:1;
        unsigned int use_tpr_shadow:1;
        unsigned int nmi_window_exiting:1;
        unsigned int mov_dr_exiting:1;
        unsigned int unconditional_io_exiting:1;
        unsigned int use_io_bitmaps:1;
        unsigned int rsv_26:1;
        unsigned int monitor_trap_flag:1;
        unsigned int use_msr_bitmaps:1;
        unsigned int monitor_exiting:1;
        unsigned int pause_exiting:1;
        unsigned int activate_secondary_controls:1; };
    unsigned int val;
} primary_cpu_based_execution_controls_t;

typedef union __attribute__((packed)) {
    struct __attribute__((packed)) {
        unsigned int rsv_0_1:2;
        unsigned int save_dbg_controls:1;
        unsigned int rsv_3_8:6;
        unsigned int host_addr_space_size:1;
        unsigned int rsv_10_11:2;
        unsigned int load_ia32_perf_global_ctrl:1;
        unsigned int rsv_13_14:2;
        unsigned int acknowledge_interrupt:1;
        unsigned int rsv_16_17:2;
        unsigned int save_ia32_pat:1;
        unsigned int load_ia32_pat:1;
        unsigned int save_ia32_efer:1;
        unsigned int load_ia32_efer:1;
        unsigned int save_preemption_timer:1;
        unsigned int clear_ia32_bndcfgs:1;
        unsigned int conceal_vm_exits:1;
        unsigned int rsv_25_31:7; };
    unsigned int val;
} vm_exit_controls_t;

typedef union __attribute__((packed)) {
    struct __attribute__((packed)) {
        unsigned int rsv_0_1:2;
        unsigned int load_dbg_controls:1;
        unsigned int rsv_3_8:6;
        unsigned int ia_32e_mode_guest:1;
        unsigned int entry_to_smm:1;
        unsigned int deactivate_dual_monitor_treatment:1;
        unsigned int rsv_12:1;
        unsigned int load_ia32_perf_global_ctrl:1;
        unsigned int load_ia32_pat:1;
        unsigned int load_ia32_efer:1;
        unsigned int load_ia32_bndcfgs:1;
        unsigned int conceal_vm_entries:1;
        unsigned int rsv_18_31:14; };
    unsigned int val;
} vm_entry_controls_t;

and an enum of VMCS field encodings (unused fields omitted due to Stack Overflow size constraints):

enum vmcs_encodings {
    GUEST_ES_SELECTOR =             0x00000800,
    GUEST_CS_SELECTOR =             0x00000802,
    GUEST_SS_SELECTOR =             0x00000804,
    GUEST_DS_SELECTOR =             0x00000806,
    GUEST_FS_SELECTOR =             0x00000808,
    GUEST_GS_SELECTOR =             0x0000080a,
    GUEST_LDTR_SELECTOR =           0x0000080c,
    GUEST_TR_SELECTOR =             0x0000080e,

    HOST_ES_SELECTOR =              0x00000c00,
    HOST_CS_SELECTOR =              0x00000c02,
    HOST_SS_SELECTOR =              0x00000c04,
    HOST_DS_SELECTOR =              0x00000c06,
    HOST_FS_SELECTOR =              0x00000c08,
    HOST_GS_SELECTOR =              0x00000c0a,
    HOST_TR_SELECTOR =              0x00000c0c,


    GUEST_IA32_DEBUGCTL_F =         0x00002802,
    GUEST_IA32_DEBUGCTL_H =         0x00002803,
    GUEST_IA32_PAT_F =              0x00002804,
    GUEST_IA32_PAT_H =              0x00002805,
    GUEST_IA32_EFER_F =             0x00002806,
    GUEST_IA32_EFER_H =             0x00002807,
    
    HOST_IA32_PAT_F =               0x00002c00,
    HOST_IA32_PAT_H =               0x00002c01,
    HOST_IA32_EFER_F =              0x00002c02,
    HOST_IA32_EFER_H =              0x00002c03,
    
    PIN_BASED_X_CTLS =              0x00004000,
    PRIMARY_CPU_BASED_X_CTLS =      0x00004002,
    EXIT_CTLS =                     0x0000400c,
    ENTRY_CTLS =                    0x00004012,
    SECONDARY_CPU_BASED_X_CTLS =    0x0000401e,
    
    VM_INSTRUCTION_ERROR =          0x00004400,
    EXIT_REASON =                   0x00004402,

    GUEST_ES_LIMIT =                0x00004800,
    GUEST_CS_LIMIT =                0x00004802,
    GUEST_SS_LIMIT =                0x00004804,
    GUEST_DS_LIMIT =                0x00004806,
    GUEST_FS_LIMIT =                0x00004808,
    GUEST_GS_LIMIT =                0x0000480a,
    GUEST_LDTR_LIMIT =              0x0000480c,
    GUEST_TR_LIMIT =                0x0000480e,
    GUEST_GDTR_LIMIT =              0x00004810,
    GUEST_IDTR_LIMIT =              0x00004812,
    
    GUEST_ES_ACCESS_RIGHTS =        0x00004814,
    GUEST_CS_ACCESS_RIGHTS =        0x00004816,
    GUEST_SS_ACCESS_RIGHTS =        0x00004818,
    GUEST_DS_ACCESS_RIGHTS =        0x0000481a,
    GUEST_FS_ACCESS_RIGHTS =        0x0000481c,
    GUEST_GS_ACCESS_RIGHTS =        0x0000481e,
    GUEST_LDTR_ACCESS_RIGHTS =      0x00004820,
    GUEST_TR_ACCESS_RIGHTS =        0x00004822,
    

    GUEST_IA32_SYSENTER_CS =        0x0000482a,
    HOST_IA32_SYSENTER_CS =         0x00004c00,

    EXIT_QUALIFICATION =            0x00006400,
    
    GUEST_CR0 =                     0x00006800,
    GUEST_CR3 =                     0x00006802,
    GUEST_CR4 =                     0x00006804,
    
    GUEST_ES_BASE =                 0x00006806,
    GUEST_CS_BASE =                 0x00006808,
    GUEST_SS_BASE =                 0x0000680a,
    GUEST_DS_BASE =                 0x0000680c,
    GUEST_FS_BASE =                 0x0000680e,
    GUEST_GS_BASE =                 0x00006810,
    GUEST_LDTR_BASE =               0x00006812,
    GUEST_TR_BASE =                 0x00006814,
    GUEST_GDTR_BASE =               0x00006816,
    GUEST_IDTR_BASE =               0x00006818,
    
    GUEST_DR7 =                     0x0000681a,
    GUEST_RSP =                     0x0000681c,
    GUEST_RIP =                     0x0000681e,
    GUEST_RFLAGS =                  0x00006820,
    GUEST_IA32_SYSENTER_ESP =       0x00006824,
    GUEST_IA32_SYSENTER_EIP =       0x00006826,

    HOST_CR0 =                      0x00006c00,
    HOST_CR3 =                      0x00006c02,
    HOST_CR4 =                      0x00006c04,
    
    HOST_FS_BASE =                  0x00006c06,
    HOST_GS_BASE =                  0x00006c08,
    HOST_TR_BASE =                  0x00006c0a,
    HOST_GDTR_BASE =                0x00006c0c,
    HOST_IDTR_BASE =                0x00006c0e,
    
    HOST_IA32_SYSENTER_ESP =        0x00006c10,
    HOST_IA32_SYSENTER_EIP =        0x00006c12,
    HOST_RSP =                      0x00006c14,
    HOST_RIP =                      0x00006c16 };
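
One way to cross-check a table like this is to derive the guest segment encodings instead of typing them. The sketch below assumes the layout in the SDM's field-encoding appendix, where each group of guest segment fields is listed in the order ES, CS, SS, DS, FS, GS, LDTR, TR at consecutive even encodings. The enum and helper names are hypothetical:

```c
#include <stdint.h>

/* Segment order used by the guest-state field encodings. */
enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_LDTR, SEG_TR };

/* Each group starts at the ES encoding and advances by 2 per register. */
uint32_t guest_selector_enc(enum seg s)      { return 0x0800u + 2u * (uint32_t)s; }
uint32_t guest_limit_enc(enum seg s)         { return 0x4800u + 2u * (uint32_t)s; }
uint32_t guest_access_rights_enc(enum seg s) { return 0x4814u + 2u * (uint32_t)s; }
uint32_t guest_base_enc(enum seg s)          { return 0x6806u + 2u * (uint32_t)s; }
```

Comparing each enum constant against the derived value (for example with a static assertion per field) would flag any single mistyped digit in the selector, limit, access rights, or base groups.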

I have omitted a great deal of code; if anyone would like to see it, let me know. In any case, for broader context, my other code ensures that all allocated memory is writeback cacheable, executes vmclear on the VMCS region before initializing it, and uses Linux's get_cpu() and put_cpu() utilities to ensure that all of the VMM code runs on a single processor core.

I am quite confident that all control fields and all host area fields are initialized correctly, both because the VM-exit handler is called and behaves correctly and because the Intel manual states that control field checks and host area field checks are performed prior to guest area field checks. Hence, I don't believe that the error lies in any data shared by the host and guest areas; for instance, I imagine the control registers are recorded correctly.
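
For the control fields, the adjustment against a capability MSR can be sketched as a pure function. This assumes the documented capability layout: bits 31:0 report the allowed 0-settings (a 1 there means the control must be 1) and bits 63:32 report the allowed 1-settings (a 0 there means the control must be 0). The function name and the MSR value used in the example are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Force the mandatory bits on, then reject any bit the CPU does not allow.
 * Returns false (leaving *out untouched) if desired asks for an
 * unsupported control. */
bool adjust_controls(uint32_t desired, uint64_t cap_msr, uint32_t *out)
{
    uint32_t must_be_one = (uint32_t)cap_msr;          /* allowed 0-settings */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);  /* allowed 1-settings */
    uint32_t val = desired | must_be_one;

    if (val & ~may_be_one)
        return false;
    *out = val;
    return true;
}
```

With a made-up capability value of 0x000000ff00000016, a desired value of 0 comes back as 0x16, which is the shape of the pin-based result in the dump, while a desired value of 0x100 is rejected.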

By this line of thinking, I suspected that the error might be in the segment access rights fields, and indeed I initially had the issue that the reserved access rights bits are all ones in the GDT rather than the zeroes required by the VMCS checks. However, even after I fixed this, the problem persisted, and I am at a loss.
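
The conversion from a LAR result to the VMCS access-rights format can be sketched as follows. It assumes the LAR result is the descriptor's second doubleword with the segment-limit bits 19:16 left undefined, which is why they are masked off here. The function name is hypothetical, and unlike the GET_ACCESS_RIGHTS macro it treats every selector with index 0 and TI clear as null, regardless of RPL:

```c
#include <stdint.h>

#define AR_UNUSABLE (1u << 16)  /* VMCS "segment unusable" bit */

/* Convert the value LAR produced for a selector into VMCS format. */
uint32_t lar_to_vmcs_ar(uint16_t selector, uint32_t lar_result)
{
    uint32_t ar;

    if ((selector & 0xfffcu) == 0)  /* null selector: mark unusable */
        return AR_UNUSABLE;

    ar = lar_result >> 8;  /* descriptor bits 8..23 move to bits 0..15 */
    ar &= 0xf0ffu;         /* clear reserved bits 11:8 and everything above 15 */
    return ar;
}
```

A typical flat 64-bit kernel code descriptor whose second doubleword is 0x00af9b00 converts to 0xa09b, matching the CS access rights in the dump.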

I have gone through line-by-line and verified (a) that every single VMCS field encoding is correct, (b) that each call of EC_VMWRITE has the correct data and field encoding as its arguments, and (c) that the data dumped above conforms to the checks given in section 26.3 of Volume 3C of the Intel Software Developer's Manual. Clearly, however, I am missing something, so I am hoping that a fresh pair of eyes will be able to spot some error in the data I have passed to the guest state. Any help, including just a pointer in a direction worth investigating, would be greatly appreciated.
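
Part of such a verification can be automated. The sketch below covers only a subset of the section 26.3 access-rights checks for DS, ES, FS and GS outside virtual-8086 mode; it omits the DPL/RPL comparison, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the guest-state checks on a data segment register's
 * access rights and limit. */
bool data_seg_ar_ok(uint32_t ar, uint32_t limit)
{
    uint32_t type = ar & 0xfu;
    bool g = (ar & (1u << 15)) != 0;

    if (ar & (1u << 16))
        return true;                    /* unusable: checks below are skipped */
    if (!(type & 1u))
        return false;                   /* type must have the accessed bit set */
    if ((type & 8u) && !(type & 2u))
        return false;                   /* a code segment must be readable */
    if (!(ar & (1u << 4)))
        return false;                   /* S must be 1 */
    if (!(ar & (1u << 7)))
        return false;                   /* P must be 1 */
    if (ar & 0xf00u)
        return false;                   /* bits 11:8 are reserved */
    if (ar & 0xfffe0000u)
        return false;                   /* bits 31:17 are reserved */
    if ((limit & 0xfffu) != 0xfffu && g)
        return false;                   /* G must be 0 if limit[11:0] is not all ones */
    if ((limit & 0xfff00000u) != 0 && !g)
        return false;                   /* G must be 1 if limit[31:20] is nonzero */
    return true;
}
```

Note that an access-rights field left at zero does not pass: it reads as a usable segment with type 0, so a guest segment whose access-rights write never reaches the VMCS would fail this check even when its selector is null.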

c
linux
assembly
x86-64
virtualization
asked on Stack Overflow Aug 10, 2020 by Atticus Stonestrom • edited Aug 10, 2020 by Atticus Stonestrom

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0