I am working on a Linux kernel module (a VMM) to test Intel VMX by running a self-made VM (the VM starts in real mode, then switches to 32-bit protected mode with paging enabled).
The VMM is configured not to use RDTSC exiting and to use TSC offsetting.
The VM then runs RDTSC in a loop to measure its cost, as below:
```c
static void cpuid(uint32_t code, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    __asm__ volatile(
        "cpuid"
        : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "a"(code)
        : "cc");
}

uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    // RDTSC copies the 64-bit TSC into EDX:EAX
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (uint64_t)hi << 32 | lo;
}

void i386mode_tests(void)
{
    u32 eax, ebx, ecx, edx;
    u32 i = 0;

    // Read CR0 into a register chosen by the compiler, so no register is
    // clobbered behind the compiler's back
    asm volatile("mov %%cr0, %0" : "=r"(eax));
    my_printf("Guest CR0 = 0x%x\n", eax);

    // CPUID is serializing: earlier instructions complete before timing starts
    cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
    vm_tsc[0] = rdtsc();    // vm_tsc: assumed to be a global uint64_t[2]
    for (i = 0; i < 100; i++) {
        rdtsc();
    }
    vm_tsc[1] = rdtsc();
    // The delta is 64-bit; cast it so it matches the %d conversion
    my_printf("Rdtsc takes %d\n", (int)(vm_tsc[1] - vm_tsc[0]));
}
```
The output looks like this:
```
Guest CR0 = 0x80050033
Rdtsc takes 2742
```
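As a sanity check on the `Guest CR0` line, a small decoder can list which architectural CR0 flags are set in the value the guest reads. This is only a sketch: the helper name `describe_cr0` is hypothetical, and with the CR0 guest/host mask listed below, bits 4 and above of that value come from the CR0 read shadow rather than the real CR0.

```c
#include <stdint.h>
#include <string.h>

/* Architectural CR0 bit positions (x86). */
#define CR0_PE (1u << 0)   /* Protection Enable */
#define CR0_MP (1u << 1)   /* Monitor Coprocessor */
#define CR0_ET (1u << 4)   /* Extension Type */
#define CR0_NE (1u << 5)   /* Numeric Error */
#define CR0_WP (1u << 16)  /* Write Protect */
#define CR0_AM (1u << 18)  /* Alignment Mask */
#define CR0_PG (1u << 31)  /* Paging */

/* Hypothetical helper: writes the names of the set flags, space-separated,
 * into buf (len is the total size of buf). */
static void describe_cr0(uint32_t cr0, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } flags[] = {
        { CR0_PE, "PE" }, { CR0_MP, "MP" }, { CR0_ET, "ET" }, { CR0_NE, "NE" },
        { CR0_WP, "WP" }, { CR0_AM, "AM" }, { CR0_PG, "PG" },
    };

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (cr0 & flags[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, flags[i].name, len - strlen(buf) - 1);
        }
    }
}
```

For 0x80050033 this yields "PE MP ET NE WP AM PG", i.e. protected mode with paging, which matches the intended guest mode.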
For comparison, I wrote a host application that does the same thing:
```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static void cpuid(uint32_t code, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    __asm__ volatile(
        "cpuid"
        : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "a"(code)
        : "cc");
}

static uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    // RDTSC copies the 64-bit TSC into EDX:EAX
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (uint64_t)hi << 32 | lo;
}

int main(void)
{
    uint64_t vm_tsc[2];
    uint32_t eax, ebx, ecx, edx, i;

    // CPUID is serializing: earlier instructions complete before timing starts
    cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
    vm_tsc[0] = rdtsc();
    for (i = 0; i < 100; i++) {
        rdtsc();
    }
    vm_tsc[1] = rdtsc();
    // PRIu64 is the portable conversion for uint64_t (%ld is not)
    printf("Rdtsc takes %" PRIu64 "\n", vm_tsc[1] - vm_tsc[0]);
    return 0;
}
```
It outputs the following:
```
Rdtsc takes 2325
```
Running each of the two programs 40 times and averaging the results gives:
```
avg(VM)   = 3188.000000
avg(host) = 2331.000000
```
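Since each program is run 40 times, it may help to report more than the average. A minimal sketch (the names `tsc_stats` and `summarize_tsc` are hypothetical) that summarizes an array of measured deltas as minimum, maximum and mean:

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of repeated TSC-delta measurements (hypothetical type). */
struct tsc_stats {
    uint64_t min;
    uint64_t max;
    double mean;
};

/* Summarizes n >= 1 samples. For short microbenchmarks the minimum is often
 * steadier than the mean, because interrupts and other disturbances usually
 * add time to a run rather than remove it. */
static struct tsc_stats summarize_tsc(const uint64_t *samples, size_t n)
{
    struct tsc_stats s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
        sum += (double)samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

Collecting the 40 values of `vm_tsc[1] - vm_tsc[0]` into an array and comparing the VM and host minimums as well as the averages could show whether the gap is a steady per-iteration cost or comes from occasional slow runs.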
The difference between the VM and the host (3188 vs. 2331 ticks on average, about 37% more in the VM) is too large to ignore, and I did not expect it.
My understanding is that with TSC offsetting enabled and RDTSC exiting disabled, RDTSC should cost about the same in the VM as on the host.
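One step worth making explicit: with TSC offsetting and (assumed here) no TSC scaling, RDTSC in the guest returns the host TSC plus the constant TSC_OFFSET VMCS field, modulo 2^64. A constant offset cancels in `vm_tsc[1] - vm_tsc[0]`, so offsetting by itself cannot change the measured delta; any difference must come from how long the loop actually runs in VMX non-root operation, including any VM exits that happen during it. A sketch of that arithmetic (function names are hypothetical):

```c
#include <stdint.h>

/* Guest-visible TSC with "use TSC offsetting" = 1, assuming TSC scaling
 * is not enabled: host TSC plus the 64-bit TSC offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;   /* unsigned addition wraps modulo 2^64 */
}

/* Elapsed ticks the guest computes from two reads taken at host TSC
 * values host_t0 and host_t1. */
static uint64_t guest_delta(uint64_t host_t0, uint64_t host_t1, uint64_t tsc_offset)
{
    return guest_tsc(host_t1, tsc_offset) - guest_tsc(host_t0, tsc_offset);
}
```

Whatever offset is programmed, `guest_delta` equals `host_t1 - host_t0`.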
Here are the relevant VMCS fields:
```
0xA501E97E = control_VMX_cpu_based
0xFFFFFFFFFFFFFFF0 = control_CR0_mask
0x0000000080050033 = control_CR0_shadow
```
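To double-check `control_VMX_cpu_based`, the relevant bits can be tested directly. In the Intel SDM's primary processor-based VM-execution controls, bit 3 is "use TSC offsetting" and bit 12 is "RDTSC exiting"; 0xA501E97E has bit 3 set and bit 12 clear, which matches the intended configuration. For the CR0 guest/host mask, a 1 bit means the host owns that bit and guest reads of it return the read-shadow value. The macro and function names below are illustrative, not taken from a specific header:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the primary processor-based VM-execution controls. */
#define CTL_USE_TSC_OFFSETTING  (1u << 3)
#define CTL_RDTSC_EXITING       (1u << 12)

static bool uses_tsc_offsetting(uint32_t ctls)
{
    return (ctls & CTL_USE_TSC_OFFSETTING) != 0;
}

static bool exits_on_rdtsc(uint32_t ctls)
{
    return (ctls & CTL_RDTSC_EXITING) != 0;
}

/* A CR0 bit is host-owned when its bit in the CR0 guest/host mask is 1;
 * guest reads of a host-owned bit return the CR0 read-shadow value. */
static bool cr0_bit_host_owned(uint64_t mask, unsigned bit)
{
    return ((mask >> bit) & 1u) != 0;
}
```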
In the last-level EPT entries, bits 5:3 = 6 (write-back memory type) and bit 6 = 1 (ignore PAT). EPTP bits 2:0 = 6 (write-back).
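For reference, a sketch of how those EPT fields can be composed. The helper names are hypothetical, and the code assumes 4 KiB pages, a 4-level EPT walk, and physical addresses below MAXPHYADDR:

```c
#include <stdint.h>

#define EPT_MT_WB 6ull   /* memory type 6 = write-back */

/* EPT pointer: bits 2:0 paging-structure memory type,
 * bits 5:3 page-walk length minus 1, bits 12 and up the PML4 address. */
static uint64_t make_eptp(uint64_t pml4_phys)
{
    return (pml4_phys & ~0xFFFull)  /* 4 KiB-aligned PML4 table address */
         | (3ull << 3)              /* 4-level walk: length - 1 = 3 */
         | EPT_MT_WB;               /* bits 2:0 */
}

/* Last-level (4 KiB) EPT entry: bits 0-2 read/write/execute,
 * bits 5:3 memory type, bit 6 ignore PAT, bits 12 and up the host frame. */
static uint64_t make_ept_pte(uint64_t host_phys)
{
    return (host_phys & ~0xFFFull)
         | (1ull << 6)              /* ignore PAT: use the EPT memory type */
         | (EPT_MT_WB << 3)         /* bits 5:3 */
         | 0x7ull;                  /* read, write, execute */
}
```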
I tested both on bare metal and inside VMware, and got similar results in both cases.
Is there anything I have missed?
User contributions licensed under CC BY-SA 3.0