Bonded Gigabit Interfaces capped at ~500 Mbps each

3

This issue has been driving me nuts for days now! I recently bonded the eth0/eth1 interfaces on a few Linux servers into bond0 with the following configs (same on all systems):

DEVICE=bond0
ONBOOT=yes
BONDING_OPTS="miimon=100 mode=4 xmit_hash_policy=layer3+4 lacp_rate=1"
TYPE=Bond
BOOTPROTO=none

DEVICE=eth0
ONBOOT=yes
SLAVE=yes
MASTER=bond0
HOTPLUG=no
TYPE=Ethernet
BOOTPROTO=none

DEVICE=eth1
ONBOOT=yes
SLAVE=yes
MASTER=bond0
HOTPLUG=no
TYPE=Ethernet
BOOTPROTO=none

Here you can see the bonding status (from /proc/net/bonding/bond0):

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
    Aggregator ID: 3
    Number of ports: 2
    Actor Key: 17
    Partner Key: 686
    Partner Mac Address: d0:67:e5:df:9c:dc

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:c9:95:74
Aggregator ID: 3
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:c9:95:75
Aggregator ID: 3
Slave queue ID: 0

And the ethtool outputs (ethtool bond0, eth0, and eth1):

Settings for bond0:
Supported ports: [ ]
Supported link modes:   Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 2000Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Link detected: yes

Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes

Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pumbg
    Wake-on: d
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes

The servers are both connected to the same Dell PCT 7048 switch, with both ports for each server added to its own dynamic LAG and set to access mode. Everything looks OK, right? And yet, here are the results of attempting iperf tests from one server to the other, with 2 parallel streams:
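
The client command was a plain two-stream run, something like (give or take extra flags):

iperf -c 172.16.8.183 -P 2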

------------------------------------------------------------
Client connecting to 172.16.8.183, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 172.16.8.180 port 14773 connected with 172.16.8.183 port 5001
[  3] local 172.16.8.180 port 14772 connected with 172.16.8.183 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   561 MBytes   471 Mbits/sec
[  3]  0.0-10.0 sec   519 MBytes   434 Mbits/sec
[SUM]  0.0-10.0 sec  1.05 GBytes   904 Mbits/sec

Clearly both ports are being used, but not at anywhere close to 1 Gbps, which is what they ran at individually before being bonded. I've tried all sorts of different bonding modes, xmit hash types, MTU sizes, etc., but just cannot get the individual ports to exceed 500 Mbits/sec... it's almost as if the bond itself is being limited to 1 Gbps somewhere! Does anyone have any ideas?

Addition 1/19: Thanks for the comments and questions, I'll try to answer them here as I am still very interested in maximizing the performance of these servers. First, I cleared the interface counters on the Dell switch and then let it serve production traffic for a bit. Here are the counters for the 2 interfaces making up the bond of the sending server:

  Port      InTotalPkts      InUcastPkts      InMcastPkts      InBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi1/0/9           63113512         63113440               72                0

  Port      OutTotalPkts     OutUcastPkts     OutMcastPkts     OutBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi1/0/9           55453195         55437966             6075             9154

  Port      InTotalPkts      InUcastPkts      InMcastPkts      InBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi1/0/44          61904622         61904552               48               22

  Port      OutTotalPkts     OutUcastPkts     OutMcastPkts     OutBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi1/0/44          53780693         53747972               48            32673

It seems like the traffic is being perfectly load balanced, but the bandwidth graphs still show almost exactly 500 Mbps per interface when RX and TX are combined:

(Bandwidth graphs for Gi1/0/9, Gi1/0/44, and Port-Channel 11.)

I can also say with certainty that, when it is serving production traffic, it is constantly pushing for more bandwidth and is communicating with multiple other servers at the same time.

Edit #2 1/19: Zoredache, you made me think that maybe the iperf tests were being limited by the receiving side only using one port, and therefore only one interface. So I ran "iperf -s" on server2 and server3, then ran two concurrent iperf client instances from server1, one to each server, at the same time:

iperf -c 172.16.8.182 -P 2
------------------------------------------------------------
Client connecting to 172.16.8.182, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 172.16.8.225 port 2239 connected with 172.16.8.182 port 5001
[  3] local 172.16.8.225 port 2238 connected with 172.16.8.182 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   234 MBytes   196 Mbits/sec
[  3]  0.0-10.0 sec   232 MBytes   195 Mbits/sec
[SUM]  0.0-10.0 sec   466 MBytes   391 Mbits/sec

iperf -c 172.16.8.183 -P 2
------------------------------------------------------------
Client connecting to 172.16.8.183, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  3] local 172.16.8.225 port 5565 connected with 172.16.8.183 port 5001
[  4] local 172.16.8.225 port 5566 connected with 172.16.8.183 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   287 MBytes   241 Mbits/sec
[  4]  0.0-10.0 sec   292 MBytes   244 Mbits/sec
[SUM]  0.0-10.0 sec   579 MBytes   484 Mbits/sec

Both SUMs added together still won't go over 1 Gbps! As for your other question, my port-channels are set up with just the following 2 lines:

hashing-mode 7
switchport access vlan 60

Hashing-mode 7 is Dell's "Enhanced Hashing". The documentation doesn't say specifically what it does, but I have tried various combinations of the other 6 modes, which are:

Hash Algorithm Type
1 - Source MAC, VLAN, EtherType, source module and port Id
2 - Destination MAC, VLAN, EtherType, source module and port Id
3 - Source IP and source TCP/UDP port
4 - Destination IP and destination TCP/UDP port
5 - Source/Destination MAC, VLAN, EtherType, source MODID/port
6 - Source/Destination IP and source/destination TCP/UDP port
7 - Enhanced hashing mode

If you have any suggestions, I am happy to try the other modes again, or change the configurations on my port-channel.
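
For reference, changing the algorithm again should just be a matter of re-issuing that line under the port-channel, along these lines (exact syntax may vary slightly by firmware version):

interface port-channel 11
hashing-mode 6
exit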

linux
bonding
dell-powerconnect
lacp
asked on Server Fault Jan 19, 2018 by Jeremy • edited Jan 24, 2018 by JonathanDavidArndt

2 Answers

3

On the computer, your bond is using the transmit hash policy layer3+4 (Transmit Hash Policy: layer3+4 in your output), which basically means that the interface used for a given connection is chosen based on the IP addresses and ports involved.

Your iperf test is between two systems, and the iperf server listens on a single port, so each stream differs only in the client's ephemeral source port. Depending on how those IP/port combinations hash, the traffic can easily end up concentrated on a single member of the bonded interface, and a single TCP stream can never use more than one slave.
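
As a rough illustration, you can work out which slave a given flow lands on. This sketch uses the layer3+4 formula from the older bonding.txt documentation (the exact hash varies by kernel version), applied to the two streams in your first test:

ip2int() { local a b c d; IFS=. read -r a b c d <<< "$1"; echo $(( (a<<24)|(b<<16)|(c<<8)|d )); }
src=$(ip2int 172.16.8.180); dst=$(ip2int 172.16.8.183); dport=5001; slaves=2
for sport in 14772 14773; do
    # layer3+4 per the older docs: ((sport XOR dport) XOR ((srcIP XOR dstIP) AND 0xffff)) mod slave count
    echo "flow :$sport -> :$dport maps to slave $(( ((sport ^ dport) ^ ((src ^ dst) & 0xffff)) % slaves ))"
done

Whichever way the streams happen to land, each individual stream is pinned to a single slave, so no one connection can exceed one link's worth of bandwidth.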

I am not sure what you are seeing that makes you think both interfaces are being used, or that half of the traffic is being handled by each interface. iperf is just reporting the results per thread, not per interface. It would be more interesting to look at the interface counters on the switch.
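
On the Linux side you can get the same information by watching the per-slave byte counters while a test runs, for example:

ip -s link show eth0
ip -s link show eth1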

You mentioned playing around with different hash modes. Since you are connecting to a switch, the configuration on your computer only applies to the packets it transmits; you also need to configure the hashing mode on the switch (if that is even an option with your hardware).

Bonding just isn't that useful between two systems. It doesn't give you the full bandwidth of both interfaces; it just lets some connections use one interface and others use the other. There are some modes that can help a bit between two systems, but it is a 25-50% improvement at best. You almost never get the full capacity of both interfaces.

answered on Server Fault Jan 19, 2018 by Zoredache
1

The only bonding mode that can increase the throughput of a single TCP connection is balance-rr (mode 0). This bonding mode actually "stripes" your outgoing packets across 2 (or more) available interfaces. However, it has its own pitfalls:

  • correct packet ordering is not guaranteed;
  • it only affects outgoing packets;
  • it does not always play safely with switches (which can detect it as a form of MAC poisoning/flapping);
  • it is not a standard LACP mode.

From the Linux kernel bonding documentation:

balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments.

For an actual example of how to use balance-rr, read here.
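
As a minimal sketch, with the same ifcfg-style files used in your question it would look like the following (assuming the switch ports are regrouped as a static LAG, since balance-rr is not LACP):

DEVICE=bond0
ONBOOT=yes
BONDING_OPTS="miimon=100 mode=balance-rr"
TYPE=Bond
BOOTPROTO=none

The slave (eth0/eth1) files stay exactly as you already have them.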

Back to your setup: as you are using 802.3ad/mode 4 (LACP), your system cannot use multiple interfaces for a single connection. By opening a single TCP or UDP stream, iperf does not benefit from LACP at all. On the other hand, a multi-session-aware protocol (e.g., SMB 3.0+) can fully use your additional interfaces.

answered on Server Fault Jan 19, 2018 by shodanshok

User contributions licensed under CC BY-SA 3.0