Diagnosing a network connectivity issue with a new network switch, new network cards, one new and two old servers

I'm new to managing networks and have reached the limit of what I can figure out on my own with Google. I looked for similar questions but found nothing that matches what I'm dealing with at the moment.

Important background info: I'm setting up a 10 gigabit network between two existing servers and one new server. I've installed a new network switch (Netgear Prosafe XS712T) and new network cards (Intel X540T2 10Gb Adapter) in the two old servers. The new server has the same network card. All servers are running some flavor of Linux (one Debian, one Ubuntu, and one Fedora... not my fault...). All of this sits inside the department network at the university where I work, which runs at 1 gigabit.

What I am trying to do: The goal is to allow high-speed data transfers between our new machine and our old machines, which makes it practical for us to set up a shared directory on the new machine which will mirror to the old ones. We are working with large data sets which can be ~5-10 GB in size.

Current Status: Network cards are installed in all machines. All machines can access the internet, ping each other, and transfer files to and from each other using scp. The network card driver (ixgbe) was installed from Intel's website, replacing the native ixgbe driver. lsmod shows the driver module loaded on all machines.

The network switch appears to function properly* on the new machine: no appreciable delays in internet access, transfers, connections to other machines, etc. On the old machines, internet access is at times slow or intermittent.

*No high speed, though. Using scp to transfer files from any one of these machines to any other gives the same network speeds I saw before I installed the new switch and cards (~65-70 MB/s).
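To put the scp figure in context, here is a rough unit conversion. It is only a sketch: it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 1000 MB) and ignores all protocol overhead, and the function names are made up for illustration. At 8 bits per byte, 65-70 MB/s works out to about 0.52-0.56 Gb/s, which is below the ideal line rate of even a 1 Gb/s link.

```python
# Rough unit conversion; decimal units and zero protocol overhead are assumptions.

def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def seconds_to_transfer(size_gb: float, mb_per_s: float) -> float:
    """Seconds needed to move size_gb gigabytes at a sustained mb_per_s."""
    return size_gb * 1000 / mb_per_s


if __name__ == "__main__":
    # Observed scp throughput from the post.
    for rate in (65, 70):
        print(f"{rate} MB/s = {mb_per_s_to_gbit_per_s(rate):.2f} Gb/s, "
              f"10 GB in about {seconds_to_transfer(10, rate):.0f} s")

    # Ideal line rates for comparison (no overhead, so real transfers are slower).
    for link_gbit in (1, 10):
        line_rate_mb = link_gbit * 1000 / 8
        print(f"{link_gbit} Gb/s link = {line_rate_mb:.0f} MB/s, "
              f"10 GB in about {seconds_to_transfer(10, line_rate_mb):.0f} s")
```

Since the measured rate does not even reach 1 Gb/s, the scp number on its own may not show which link speed is in use. Something other than the link could be the limit.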

Problems: As stated above, connection speeds have not changed since installation of new network cards/switch. Connections/pings/transfers are often sluggish to start.

Example: a ping from one old server to the other... very sluggish:

mrfox:# ping darjeeling
PING darjeeling.xxxxx (xxx.xx.x.xx) 56(84) bytes of data.
64 bytes from darjeeling.xxxxx (xxx.xx.x.xx): icmp_req=1 ttl=64 time=0.124 ms
64 bytes from darjeeling.xxxxx (xxx.xx.x.xx): icmp_req=2 ttl=64 time=0.116 ms
64 bytes from darjeeling.xxxxx (xxx.xx.x.xx): icmp_req=3 ttl=64 time=0.116 ms
64 bytes from darjeeling.xxxxx (xxx.xx.x.xx): icmp_req=4 ttl=64 time=0.110 ms
^C64 bytes from darjeeling.xxxxx (xxx.xx.x.xx): icmp_req=5 ttl=64 time=0.116 ms

--- darjeeling.xxxxx ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, **time 32034ms**
rtt min/avg/max/mdev = 0.110/0.116/0.124/0.010 ms

The same but from the other direction... not sluggish at all:

Darjeeling:~$ ping mrfox
PING mrfox 56(84) bytes of data.
64 bytes from mrfox : icmp_req=1 ttl=64 time=0.103 ms
64 bytes from mrfox : icmp_req=2 ttl=64 time=0.097 ms
64 bytes from mrfox : icmp_req=3 ttl=64 time=0.099 ms
64 bytes from mrfox : icmp_req=4 ttl=64 time=0.100 ms
64 bytes from mrfox : icmp_req=5 ttl=64 time=0.078 ms
64 bytes from mrfox : icmp_req=6 ttl=64 time=0.099 ms
64 bytes from mrfox : icmp_req=7 ttl=64 time=0.095 ms
^C
--- mrfox  ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, **time 5998ms**
rtt min/avg/max/mdev = 0.078/0.095/0.103/0.014 ms
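As a sanity check on the two summaries above, the total `time` can be divided by the number of gaps between packets to get the average spacing between pings. This is a sketch: it assumes ping's default one-second send interval and that `time` roughly spans the first to the last packet. The helper name `mean_interval_ms` is made up for illustration.

```python
import re

# Matches the summary line ping prints, e.g.
# "5 packets transmitted, 5 received, 0% packet loss, time 32034ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received.*?time (\d+)ms")


def mean_interval_ms(summary_line: str) -> float:
    """Average gap in milliseconds between consecutive pings."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent = int(match.group(1))
    total_ms = int(match.group(3))
    if sent < 2:
        raise ValueError("need at least two packets to measure a gap")
    return total_ms / (sent - 1)


if __name__ == "__main__":
    slow = "5 packets transmitted, 5 received, 0% packet loss, time 32034ms"
    fast = "7 packets transmitted, 7 received, 0% packet loss, time 5998ms"
    print(f"mrfox -> darjeeling: {mean_interval_ms(slow):.1f} ms between pings")
    print(f"darjeeling -> mrfox: {mean_interval_ms(fast):.1f} ms between pings")
```

The slow run averages about 8 seconds between pings, while each round trip still takes about 0.1 ms. That suggests the delay is happening outside the packet exchange itself, though this calculation cannot say where.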

Ping from old server to new server... no sluggishness:

mrfox# ping xxx.xx.x.xxx
---  ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 4998ms

Traceroute behaves differently when connecting to one machine in particular:

[moonrise]# traceroute darjeeling
traceroute to darjeeling  30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
etc etc
30  * * *

But behaves normally (I think?) when connecting to any other machine:

[moonrise]# traceroute mrfox
traceroute to mrfox ( ), 30 hops max, 60 byte packets
 1  mrfox. ( )  1.799 ms  1.755 ms  1.673 ms

My Questions: It seems like the connection between our machines isn't running at 10Gb like it should. I think it is only running at 1Gb, like the rest of the department's network. What other commands can I use to test the connections between the servers?

If I'm right, what are the next steps in figuring out how to get the machines to interact properly?

How can I determine what is causing the intermittent sluggishness of HTTP connections that I've observed?

Thanks in advance for any responses. I apologize if I've left out any really crucial data here. I will watch the space below for any requests for additional info/terminal output.

Additional ethtool report for new machine (old machines report the same):

ethtool p8p1
Settings for p8p1:
    Supported ports: [ TP ]
    Supported link modes:   100baseT/Full 
                            1000baseT/Full 
                            10000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  100baseT/Full 
                            1000baseT/Full 
                            10000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 10000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes

ping -n report from new to old server:

[ moonrise ]# ping -n 137.82.xx.xx
PING 137.82.x.xx (137.82.xx.xx) 56(84) bytes of data.
64 bytes from 137.82.xx.xx: icmp_seq=1 ttl=64 time=0.205 ms
64 bytes from 137.82.xx.xx: icmp_seq=2 ttl=64 time=0.129 ms
64 bytes from 137.82.xx.xx: icmp_seq=3 ttl=64 time=0.131 ms
64 bytes from 137.82.xx.xx: icmp_seq=4 ttl=64 time=0.136 ms
64 bytes from 137.82.xx.xx: icmp_seq=5 ttl=64 time=0.157 ms
64 bytes from 137.82.xx.xx: icmp_seq=6 ttl=64 time=0.131 ms
^C
--- 137.82.xx.xx ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.129/0.148/0.205/0.028 ms

The previously sluggish ping command, repeated with the -n flag (no DNS lookups):

root@mrfox:# ping -n 137.82.xx.xx
PING 137.82.4.97 (137.82.xx.xx) 56(84) bytes of data.
64 bytes from 137.82.xx.xx: icmp_req=1 ttl=64 time=0.139 ms
64 bytes from 137.82.xx.xx: icmp_req=2 ttl=64 time=0.112 ms
64 bytes from 137.82.xx.xx: icmp_req=3 ttl=64 time=0.111 ms
64 bytes from 137.82.xx.xx: icmp_req=4 ttl=64 time=0.117 ms
64 bytes from 137.82.xx.xx: icmp_req=5 ttl=64 time=0.114 ms
^C
--- 137.82.xx.xx ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.111/0.118/0.139/0.015 ms
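Since the sluggishness disappears with -n, one possible follow-up is to time a reverse DNS lookup directly on each machine. The sketch below is one way to do that, not a definitive diagnosis. `time_reverse_lookup` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without touching the network. `socket.gethostbyaddr` is the standard-library call that performs the lookup.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for one reverse lookup.

    resolver defaults to socket.gethostbyaddr, which returns a tuple of
    (hostname, aliaslist, ipaddrlist) or raises an OSError subclass on failure.
    """
    start = time.perf_counter()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # Lookup failed (no PTR record, resolver unreachable, ...).
        hostname = None
    elapsed = time.perf_counter() - start
    return hostname, elapsed
```

Calling `time_reverse_lookup("<address of the other server>")` on mrfox and on darjeeling and comparing the elapsed times would show whether name resolution is slow on one of them. The address is left as a placeholder here.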
linux
networking
asked on Server Fault Sep 23, 2013 by ovon • edited Sep 23, 2013 by ovon

1 Answer

I am having the same issue as you (we have Windows servers with the Intel X540T2 10Gb Adapter and a Netgear Prosafe XS712T switch). I contacted Netgear and Intel support, and they told me it's a compatibility issue. I find that hard to believe, and I am still searching for a solution. If you can, please keep me updated if you find the answer. You can email me at co2427@yahoo.com. Thank you.

answered on Server Fault Sep 24, 2013 by John Hsu

User contributions licensed under CC BY-SA 3.0