Bonding not working properly (CentOS 5.4, Intel 10G, 802.3ad)

We've configured network bonding using an Intel Network Adapter X540 with two ports. Both ports are connected to a Brocade switch with a configured LACP trunk. Everything seems to work fine, but when we physically disconnect both ports, the status of the bonding interface is still up! The link status should be down, shouldn't it? We are not able to ping or connect to this interface; after reconnecting one or both slave ports, everything is fine.

Any ideas?

[root@er ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 2
    Actor Key: 0
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: down
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:37:41:28
Aggregator ID: 1

Slave Interface: eth3
MII Status: down
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:37:41:29
Aggregator ID: 1

[root@er ~]# ethtool bond0
Settings for bond0:
    Link detected: yes

[root@er ~]# ethtool eth2
Settings for eth2:
    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full
                            10000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown! (65535)
    Duplex: Unknown! (255)
    Port: FIBRE
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
    Link detected: no

[root@er ~]# ethtool eth3
Settings for eth3:
    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full
                            10000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown! (65535)
    Duplex: Unknown! (255)
    Port: FIBRE
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
    Link detected: no

[root@er ~]# cat /etc/modprobe.conf
...
alias eth2 ixgbe
alias eth3 ixgbe
alias bond0 bonding
options bond0 mode=4 miimon=100

[root@er ~]# modinfo ixgbe
filename:       /lib/modules/2.6.18-164.el5/kernel/drivers/net/ixgbe/ixgbe.ko
version:        3.11.33
license:        GPL
description:    Intel(R) 10 Gigabit PCI Express Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     739E88C99BF8038F878AE91
...

[root@er ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.100.23
NETMASK=255.255.255.0
NETWORK=192.168.100.0
GATEWAY=192.168.100.1
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
IPV6INIT=no
[root@er ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=90:e2:ba:37:41:28
[root@er ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=90:e2:ba:37:41:29

[root@er ~]# uname -a
Linux er.dve.intern 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@er ~]# tail /var/log/messages
Feb 25 18:11:42 er kernel: ixgbe 0000:0a:00.0: eth2: NIC Link is Down
Feb 25 18:11:42 er kernel: bonding: bond0: link status definitely down for interface eth2, disabling it
Feb 25 18:11:43 er kernel: ixgbe 0000:0a:00.1: eth3: NIC Link is Down
Feb 25 18:11:43 er kernel: bonding: bond0: link status definitely down for interface eth3, disabling it
Feb 25 18:11:43 er kernel: bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
asked on Server Fault Feb 25, 2013 by Martin Markert • edited Feb 25, 2013 by slm

2 Answers

Your output shows that ethtool reports both of your slave interfaces as down (as does the bonding module). Search for "bonding.txt" and read the kernel's bonding documentation, particularly section 7 (and 7.3) on link monitoring. Since it appears ethtool can't determine whether your interfaces are really up, and therefore the bonding driver can't either, perhaps trying 'use_carrier=0' will help? Or trying the ARP monitoring mode instead of miimon?
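
For reference, a minimal sketch of what the 'use_carrier=0' suggestion could look like in /etc/modprobe.conf, based on the options line from the question. Whether it changes the reported state of bond0 on this kernel is an assumption, not something confirmed here. The ARP monitoring variant is not shown; check bonding.txt for which modes support it before trying it with mode=4.

```
# /etc/modprobe.conf (sketch; only use_carrier=0 is added to the original line)
alias bond0 bonding
# use_carrier=0 makes miimon query link state via the MII/ethtool ioctls
# instead of relying on the driver's netif_carrier state
options bond0 mode=4 miimon=100 use_carrier=0
```

The bonding module has to be reloaded (or the host rebooted) for changed module options to take effect.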

EDIT: My apologies, I think I read your question incorrectly... it looks like your command output is from the time period that you had your NICs unplugged, and you're wondering why bond0 still says it's up? I don't know of anything helpful offhand for that. I still recommend bonding.txt as the bible for all things "linux+bonding" though :)

answered on Server Fault Feb 25, 2013 by Mark R • edited Feb 25, 2013 by Mark R

You're right, when both slaves are down, ethtool bond0 should report Link detected: no.
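
Until the reported link state can be trusted, one possible workaround is to derive it from the per-slave status in /proc/net/bonding/bond0. The following is a rough sketch, not part of the bonding tools: the function name count_up_slaves is made up for illustration, and it assumes the file layout shown in the question ("Slave Interface:" followed by a "MII Status:" line).

```shell
#!/bin/sh
# count_up_slaves FILE
# Hypothetical helper: prints how many slaves in a /proc/net/bonding/<bond>
# style file report "MII Status: up". The bond-level "MII Status" line near
# the top is ignored because it appears before any "Slave Interface:" line.
count_up_slaves() {
    awk '
        /^Slave Interface:/        { in_slave = 1; next }
        in_slave && /^MII Status:/ { if ($3 == "up") up++; in_slave = 0 }
        END                        { print up + 0 }
    ' "$1"
}
```

Calling `count_up_slaves /proc/net/bonding/bond0` on the output shown in the question should print 0, which a monitoring script could treat as "bond is effectively down" regardless of what ethtool bond0 says.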

Try the latest kernel. There have been countless updates to bonding and ixgbe since 2.6.18-164 was released way back in 2009. A basic link-state bug like this has probably been fixed by now.

I'd also suggest sticking with the in-kernel driver instead of compiling the upstream driver, unless you really have a specific reason to do so.

answered on Server Fault May 20, 2013 by suprjami

User contributions licensed under CC BY-SA 3.0