We are facing IPv6 connection issues on our VPS; hopefully someone can help debug them.
Pings fail at first and succeed later:
2020-06-01 23:20:55 <user>@<host>:~# ping -6 google.com
PING google.com(ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e)) 56 data bytes
From <host>.com (<ip>) icmp_seq=1 Destination unreachable: Address unreachable
...
From <host>.com (<ip>) icmp_seq=6 Destination unreachable: Address unreachable
64 bytes from ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e): icmp_seq=7 ttl=54 time=14.0 ms
...
64 bytes from ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e): icmp_seq=13 ttl=54 time=12.1 ms
--- google.com ping statistics ---
13 packets transmitted, 7 received, +6 errors, 46% packet loss, time 12174ms
rtt min/avg/max/mdev = 12.151/12.683/14.069/0.767 ms
As can be seen, DNS resolution succeeds immediately, so that is not the problem. The first outgoing pings return an error; from the 7th on they succeed. How long it takes before the first ping succeeds varies.
curl switches to IPv4 immediately:
2020-06-01 23:21:16 <user>@<host>:~# curl -vIL google.com
* Rebuilt URL to: google.com/
* Trying 2a00:1450:400e:807::200e...
* TCP_NODELAY set
* Trying 172.217.17.142...
* TCP_NODELAY set
* Connected to google.com (172.217.17.142) port 80 (#0)
...
wget tries a bit longer to connect; sometimes it succeeds, and sometimes it fails and switches to IPv4 as well:
2020-06-02 00:49:11 <user>@<host>:~# wget --spider google.com
Spider mode enabled. Check if remote file exists.
--2020-06-02 00:51:01-- http://google.com/
Resolving google.com (google.com)... 2a00:1450:400e:807::200e, 172.217.17.142
Connecting to google.com (google.com)|2a00:1450:400e:807::200e|:80... failed: No route to host.
Connecting to google.com (google.com)|172.217.17.142|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/ [following]
Spider mode enabled. Check if remote file exists.
--2020-06-02 00:51:20-- http://www.google.com/
Resolving www.google.com (www.google.com)... 2a00:1450:400e:804::2004, 172.217.17.36
Connecting to www.google.com (www.google.com)|2a00:1450:400e:804::2004|:80... failed: No route to host.
Connecting to www.google.com (www.google.com)|172.217.17.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
This happens regardless of the target host/IP. The default route is there, and the interface has a link-local address and a global IPv6 address assigned via DHCPv6:
2020-06-02 00:58:25 <user>@<host>:~# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
::/64 dev eth0 proto kernel metric 256 expires 2590394sec pref medium
<ipv6> dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via <gateway> dev eth0 proto ra metric 1024 expires 194sec pref medium
2020-06-02 00:58:56 <user>@<host>:~# ip -6 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 <ipv6>/128 scope global
valid_lft forever preferred_lft forever
inet6 <LLA>/64 scope link
valid_lft forever preferred_lft forever
IPv4 connections always succeed immediately.
rdisc6 output:
2020-06-02 13:10:36 <user>@<host>:~# rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Hop limit : undefined ( 0x00)
Stateful address conf. : Yes
Stateful other conf. : No
Mobile home agent : No
Router preference : medium
Neighbor discovery proxy : No
Router lifetime : 1800 (0x00000708) seconds
Reachable time : unspecified (0x00000000)
Retransmit time : unspecified (0x00000000)
Source link-layer address: <MAC>
Prefix : ::/64
On-link : Yes
Autonomous address conf.: No
Valid time : 2592000 (0x00278d00) seconds
Pref. time : 604800 (0x00093a80) seconds
from fe80::<ipv6>
traceroute6 (this sometimes fails with 30 empty lines):
2020-06-02 13:14:18 <user>@<host>:~# traceroute6 google.com
traceroute to google.com (2a00:1450:400e:807::200e) from <ipv6>::142, port 33434, from port 54573, 30 hops max, 60 bytes packets
1 * * <ipv6>::1 (<ipv6>::1) 2055.792 ms
2 * 2a06:7f80::1 (2a06:7f80::1) 2055.700 ms 1.262 ms
3 ipv6.decix-dusseldorf.core1.dus1.he.net (2001:7f8:9e::1b1b:0:1) 2058.316 ms 2.655 ms 2.810 ms
4 100ge5-2.core1.ams1.he.net (2001:470:0:371::1) 4.658 ms 3.804 ms 3.865 ms
5 de-cix.fra.google.com (2001:7f8::3b41:0:1) 4.731 ms 12.465 ms 9.900 ms
6 2001:4860:0:11e1::e (2001:4860:0:11e1::e) 14.691 ms 10.691 ms 10.654 ms
7 2001:4860:0:1::1c7f (2001:4860:0:1::1c7f) 12.320 ms 11.433 ms 11.476 ms
8 2001:4860::c:4000:d9a9 (2001:4860::c:4000:d9a9) 15.681 ms 16.138 ms 14.906 ms
9 ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e) 15.327 ms 12.979 ms 12.162 ms
ip monitor / ip mon route show that the default route does not seem to be reliably reachable: it is regularly deleted after expiring and is not always recreated shortly afterwards. This is the output collected over a few hours:
fe80::<ipv6_1> dev eth0 lladdr <mac_1> PROBE
fe80::<ipv6_1> dev eth0 lladdr <mac_1> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_1> dev eth0 lladdr <mac_1> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_4> dev eth0 lladdr <mac_4> PROBE
fe80::<ipv6_4> dev eth0 lladdr <mac_4> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_4> dev eth0 lladdr <mac_4> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -4sec pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -11sec pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -3sec pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0 router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
<ipv4_1> dev eth0 lladdr <mac_1> PROBE
<ipv4_1> dev eth0 lladdr <mac_1> REACHABLE
<ipv4_1> dev eth0 lladdr <mac_1> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
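To make the pattern in the output above easier to see, the neighbour events for entries flagged as "router" can be tallied per state. The following is only a minimal Python sketch: the function name tally_router_states, the regular expression, and the sample addresses and MACs are illustrative assumptions, not part of iproute2, and the line format is assumed to match the ip monitor output shown above.

```python
import re
from collections import Counter

# Matches neighbour-cache events in the format shown above, e.g.
#   fe80::2 dev eth0 lladdr 02:00:00:00:00:02 router PROBE
#   fe80::2 dev eth0 router FAILED
# (assumed format; route and prefix lines do not match and are skipped)
NEIGH_EVENT = re.compile(
    r"^(?P<addr>\S+) dev (?P<dev>\S+)(?: lladdr \S+)?(?P<router> router)? (?P<state>[A-Z]+)$"
)

def tally_router_states(lines):
    """Count neighbour states per address, only for entries flagged 'router'."""
    counts = {}
    for line in lines:
        m = NEIGH_EVENT.match(line.strip())
        if m and m.group("router"):
            counts.setdefault(m.group("addr"), Counter())[m.group("state")] += 1
    return counts

# Hypothetical sample lines standing in for the redacted output above.
sample = [
    "fe80::2 dev eth0 lladdr 02:00:00:00:00:02 router PROBE",
    "fe80::2 dev eth0 router FAILED",
    "fe80::2 dev eth0 router FAILED",
    "fe80::2 dev eth0 lladdr 02:00:00:00:00:02 router REACHABLE",
    "fe80::3 dev eth0 lladdr 02:00:00:00:00:03 REACHABLE",
    "default via fe80::2 dev eth0 proto ra metric 1024 pref medium",
]

for addr, states in tally_router_states(sample).items():
    print(addr, dict(states))
```

A router entry whose FAILED count clearly outweighs its REACHABLE count would be consistent with the unreliable default gateway described here.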
The following shows that the router does not always send router advertisements often enough, so the default gateway entry expires after 1800 seconds. Note the timestamp of the last PS1 prompt when tcpdump was interrupted:
2020-06-03 12:26:31 <user>@<host>:/var/log# tcpdump -n -i eth0 icmp6 and ip6[40] == 134
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:45:41.290680 IP6 fe80::XXX > ff02::1: ICMP6, router advertisement, length 56
14:11:10.133781 IP6 fe80::XXX > ff02::1: ICMP6, router advertisement, length 56
^C
2 packets captured
5 packets received by filter
0 packets dropped by kernel
2020-06-03 14:58:07 <user>@<host>:/var/log#
While the first two RAs were close enough together to keep the default route (although the second arrived only about 4 minutes before expiry), the third RA stayed away for too long, so the default route was lost and no IPv6 connections were possible anymore.
Meanwhile I can see lots of neighbor solicitations from the router, so its ICMPv6 requests do arrive:
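The timing argument can be checked with a few lines of arithmetic. This is a rough Python sketch using the timestamps from the tcpdump capture and the 1800-second "Router lifetime" from the rdisc6 output; the helper name seconds_between is made up for illustration, and all times are assumed to fall on the same day and are truncated to whole seconds.

```python
from datetime import datetime

ROUTER_LIFETIME = 1800  # seconds, "Router lifetime" from the rdisc6 output

def seconds_between(earlier, later):
    """Difference in seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())

# Timestamps taken from the capture (fractions of a second dropped).
ra_times = ["13:45:41", "14:11:10"]
capture_end = "14:58:07"  # PS1 prompt after interrupting tcpdump

gap = seconds_between(ra_times[0], ra_times[1])
silence = seconds_between(ra_times[1], capture_end)

print(f"gap between RAs: {gap}s, margin before expiry: {ROUTER_LIFETIME - gap}s")
print(f"silence after last RA: {silence}s, route expired: {silence > ROUTER_LIFETIME}")
```

With these inputs the gap between the two RAs is 1529 seconds (271 seconds before expiry), while the silence after the second RA is 2817 seconds, well past the 1800-second lifetime.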
2020-06-03 14:56:03 <user>@<host>:/var/log# tcpdump icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:03:07.750318 IP6 fe80::XXX > ff02::YYY: ICMP6, neighbor solicitation, who has 2a06:ZZZ, length 32
15:03:08.356100 IP6 fe80::XXX > ff02::YYY: ICMP6, neighbor solicitation, who has 2a06:ZZZ, length 32
But currently no RAs arrive, not even when I try to solicit them:
2020-06-03 15:03:21 <user>@<host>:/var/log# rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Timed out.
Timed out.
Timed out.
No response.
This matches the ip monitor output above, where probing the router often simply fails. However, since I see the neighbor solicitations from the router, I guess it could answer me but for some reason does not, or it ignores my solicitations?
I am able to manually restore the default route permanently via:
ip -6 r add default dev eth0 via fe80::<ipv6>
While IPv6 connections are possible again with this, they usually still have a long delay or time out completely.
Note 1: You're only using DHCPv6 to obtain an address – it is not used for the default route. That's still done via SLAAC, i.e. ICMPv6 "Router Advertisement" packets.
Note 2: ip monitor shows several different kinds of events intermixed: addresses, routes, and neighbor cache entries. You can run ip mon route and ip mon neigh to see them separately.
I would guess that there is a problem in between your VPS and your nearest gateway, because:
The neighbour entry for your default gateway (the IPv6 equivalent of ARP cache entry) does not successfully go into REACHABLE state – it keeps going into FAILED state, meaning your host sent several ND requests (the equivalent of ARP queries) to renew the cache entry but didn't receive any response.
Neighbor discovery, just like ARP for IPv4, is the absolute bare minimum for a functioning IPv6 network.
Expiry for the default route ::/0 is reset according to "Router lifetime" every time a Router Advertisement is received. In your case, the advertised lifetime is 1800 seconds, so the router should repeat the advertisement at least every 900 seconds so the default route never goes below half its lifetime.
But as you can see from the ip -6 route output, your ::/0 route was only 194 seconds from expiry. This either means the router's timers are misconfigured, or its multicast RAs are just not reaching you for whatever reason – as a result, you keep losing the default route.
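Since the expiry is reset to the advertised lifetime on each RA, the "expires" field of the route gives a rough estimate of how long ago the last RA arrived. A minimal Python sketch, assuming the ip -6 route line format shown in the question; the function name seconds_since_last_ra and the gateway address fe80::1 are hypothetical placeholders.

```python
import re

def seconds_since_last_ra(route_line, router_lifetime):
    """Estimate how long ago the RA that last refreshed this route arrived.

    Returns None when the line carries no 'expires' field
    (for example, a manually added route).
    """
    m = re.search(r"\bexpires (-?\d+)sec\b", route_line)
    if m is None:
        return None
    return router_lifetime - int(m.group(1))

# fe80::1 is a placeholder for the redacted gateway address.
line = "default via fe80::1 dev eth0 proto ra metric 1024 expires 194sec pref medium"
print(seconds_since_last_ra(line, 1800))
```

For the route in the question this gives 1606 seconds since the last RA, far more than the interval one would expect from a router advertising an 1800-second lifetime.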
There's one thing common to both of the above issues: ND and SLAAC both use ICMPv6 multicasts, so check very carefully whether your firewall is imposing strict rate limits on incoming Router Advertisements or Neighbor Adverts, or on multicast packets in general.
(You can use tcpdump to check whether you're receiving packets; e.g. if an RA shows up in tcpdump but fails to renew the default route then it may be your firewall's problem.)