ESXi :: vmxnet3 vNIC and Linux kernel errors

A long shot, but I figured I'd give it a try here (no solution on the VMware community forum).

In a Linux guest (CentOS 5.7 64-bit) with vmxnet3 vNICs, we are getting a few hundred kernel errors per day on the primary NIC, eth0 (the DMZ NIC), which handles the majority of network traffic (eth1 and eth2 perform backups and other infrequent network activity).

All 3 NICs have vmxnet3 as their adapter type, but the kernel errors only occur on eth0, the only NIC with public exposure (via Cisco ASA NAT'd public IPs).

Sample log entry:

Nov  2 17:49:40 localhost kernel: eth0: tq error 0x80000000
Nov  2 17:49:40 localhost kernel: eth0: resetting
Nov  2 17:49:40 localhost kernel: eth0: intr type 2, mode 0, 1 vectors allocated
Nov  2 17:49:40 localhost kernel: eth0: NIC Link is Up 10000 Mbps
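
To put a number on "a few hundred per day", the log can be tallied per day and per interface. The following is a minimal sketch, assuming the syslog line layout shown in the sample above; the function name `count_tq_errors` is made up for illustration, and the log path mentioned afterwards is the usual CentOS default rather than something stated in the question.

```python
import re
from collections import Counter

# Matches lines such as:
#   Nov  2 17:49:40 localhost kernel: eth0: tq error 0x80000000
TQ_ERROR = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<iface>\w+):\s+tq error\s+(?P<code>0x[0-9a-fA-F]+)"
)


def count_tq_errors(lines):
    """Return a Counter keyed by (month, day, interface) for 'tq error' lines."""
    counts = Counter()
    for line in lines:
        m = TQ_ERROR.match(line)
        if m:
            key = (m.group("month"), int(m.group("day")), m.group("iface"))
            counts[key] += 1
    return counts


# The four log lines quoted in the question.
sample = [
    "Nov  2 17:49:40 localhost kernel: eth0: tq error 0x80000000",
    "Nov  2 17:49:40 localhost kernel: eth0: resetting",
    "Nov  2 17:49:40 localhost kernel: eth0: intr type 2, mode 0, 1 vectors allocated",
    "Nov  2 17:49:40 localhost kernel: eth0: NIC Link is Up 10000 Mbps",
]

for key, n in sorted(count_tq_errors(sample).items()):
    print(key, n)
```

On a real system the same function could be fed an open file handle, for example `count_tq_errors(open("/var/log/messages"))`, assuming kernel messages are logged there.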

The entries are disconcerting given that eth0 went down yesterday and had to be ifup'd (although the new server has otherwise been up for 2 weeks without issue).

Going to downgrade to vmxnet2 in the AM and see if that resolves the issue, but for the sake of myself and future sufferers of this issue, I'll leave this out there -- every problem at some point has a solution ;-)

vmware-esxi
linux-kernel
asked on Server Fault Nov 2, 2011 by virtualeyes • edited Nov 2, 2011 by voretaq7

2 Answers

Just some guesses.

You might also try using the e1000 driver instead of vmxnet3. It is limited to 1 Gbit/s, but it might be a good fallback test.

Just a thought: are you on the current level of VMware Tools in the guest? You might have to reinstall VMware Tools after a kernel upgrade.

Is there the possibility of an actual ethernet h/w error in the ESX host itself?

Is the OS driver/kernel up-to-date?

 Linux hostname 2.6.18-274.7.1.el5 #1 SMP Thu Oct 20 16:21:01 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
answered on Server Fault Nov 2, 2011 by mdpc

**** Update 2 ****

The KB patch to Update 2 does work, but you have to disable TSO (the KB says that is only required for ESXi 4.1 Update 1 or earlier). So, OK, it works, but is it necessary in a host with 4x gigabit NICs and local SCSI disks? Probably not...
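
For anyone checking whether TSO is actually off in the guest: `ethtool -k eth0` reports the offload state, and `ethtool -K eth0 tso off` is the usual way to turn it off (the KB's exact procedure is not reproduced here). Below is a minimal sketch that parses the `ethtool -k` text; the helper name `tso_enabled` and the sample output are illustrative assumptions, not captured from the affected server.

```python
import re


def tso_enabled(ethtool_output):
    """Return True/False for the TSO state, or None if no TSO line is found."""
    # Older ethtool builds print "tcp segmentation offload: on";
    # newer ones print "tcp-segmentation-offload: on". Accept both spellings.
    m = re.search(r"tcp[- ]segmentation[- ]offload:\s*(on|off)", ethtool_output)
    return None if m is None else m.group(1) == "on"


# Illustrative output in the older format (an assumption, not real data).
sample = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

print(tso_enabled(sample))
```

Note that a setting made with `ethtool -K` does not survive a reboot on its own, so it would need to be reapplied at boot by whatever mechanism the system already uses.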

**** Update 1 ****

VMware released Update 2 for ESXi 4.1, which apparently solves this issue > esxi-update2

Just found it, and the start of the business day is already here; will try tomorrow in the early hours and post back results...

**** Original ****

As I mentioned, the ESXi host sits behind a Cisco ASA.

The affected Linux guest uses a Plesk-like control panel which has the APF software firewall enabled. Having already shut down APF, I assumed the software firewall was not the culprit. Turns out that shutting down APF does not flush the iptables rule sets.
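
One way to see whether rules are still loaded after a firewall front end has been stopped is to look at `iptables-save` output. The sketch below is a hedged illustration: `leftover_rules` is a hypothetical helper, and both sample dumps are invented for the example, not taken from this guest.

```python
def leftover_rules(iptables_save_output):
    """Return appended rules (-A ...) and chains whose policy is DROP."""
    leftovers = []
    for raw in iptables_save_output.splitlines():
        line = raw.strip()
        if line.startswith("-A "):
            # A rule that is still loaded in the kernel.
            leftovers.append(line)
        elif line.startswith(":"):
            # Chain declaration, e.g. ":INPUT DROP [0:0]".
            parts = line[1:].split()
            if len(parts) >= 2 and parts[1] == "DROP":
                leftovers.append(line)
    return leftovers


# Hypothetical dump with rules still present.
still_loaded = """*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT
"""

# Hypothetical dump after a full flush with ACCEPT policies.
flushed = """*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
"""

print(len(leftover_rules(still_loaded)), len(leftover_rules(flushed)))
```

An empty result suggests the filter table is effectively open; anything else means rules or a DROP policy are still in effect even though the firewall service reports itself as stopped.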

Restarted the VM after running chkconfig apf off (so APF no longer starts at boot) and voila, eth0 kernel errors gone ;-)

Would be nice to find the actual cause (i.e. I'd actually like APF enabled as the ASA lacks hardware resources [limited cpu/memory] to handle large deny lists). I'll do some more testing early AM tomorrow and see if I can find what APF does not like about inbound ASA NAT'd traffic.

In any case, having spent $5K on a virtualization server, taking advantage of the latest & greatest technology helps justify the expense (even if in reality there is likely zero performance gain between e1000 and vmxnet3 for this modestly loaded host).

To sum up: vmxnet3 vNIC works just fine on a Dell R610 host running a CentOS 5.7 64-bit guest. TBD is why ASA + ESXi + APF do not play well together...

answered on Server Fault Nov 3, 2011 by virtualeyes • edited Nov 9, 2011 by virtualeyes

User contributions licensed under CC BY-SA 3.0