Windows 10 drops IKEv2 VPN connection to Cisco router in exactly 60 seconds after last data exchange

2

Update 3: 22 January 2017

I've narrowed down the update that causes this problem: KB3201845 OS Build 14393.479 from December 9 2016. I went through all these updates step by step.

Looking at the changes files manifest, these files have been changed:

  • agilevpn.sys
  • vpnike.dll

Other topics I've created:

Microsoft Technet Forums, 2 other people have confirmed the same problem there.

Reddit /r/networking

Microsoft Feedback Hub

Update

  • Windows 10 pre-anniversary (version 10.0.10586) works fine

  • Windows 10 post-anniversary (version 10.0.14393) drops, so the problem is with the anniversary update.


Update 2

I can connect using two pre-anniversary Windows hosts fine

If I connect 1 anniversary Windows host to the VPN and keep it alive artificially by pinging, I cannot connect a second host to the VPN. In the router's "debug crypto ikev2", this happens:

Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Session present in ID PAIR TREE, but absent in TUPLE TREE
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):: Failed to add new SA into session DB
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Queuing IKE SA delete request reason: unknown
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Sending DELETE INFO message for IPsec SA [SPI: 0x3DA95352]
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Check for existing IPSEC SA
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Delete all IKE SAs
Jan 17 19:20:35.811: IKEv2:(SESSION ID = 25,SA ID = 2):Sending DELETE INFO message for IKEv2 SA [ISPI: 0x3B52B7EBA8A353B1 RSPI: 0x16F36D79AAE675A5]

I have a Cisco IOS router, 892 model, which I'm setting up IKEv2 with EAP-MSCHAPv2 as remote authentication (backed by a Windows 2012 Server Network Policy Server) and local certificate authentication. Everything works, I can connect to the VPN and ping a loopback address on the router. Windows 7 and 8.1 work fine, Android with Strongswan too.

However, on Windows 10 (10.0.14393 - fully up to date 16 jan 2017), exactly 60 seconds after the last data exchange (like a ping), Windows drops the connection. So:

  • T+0 VPN connection opened
  • T+60 VPN connection drops

  • T+0 VPN connection opened
  • T+20 1 Ping to 172.16.0.5, reply received
  • T+80 VPN connection drops

The VPN is dropped with the following message in the System event log:

Source: RasClient

Event ID: 20226

CoId={43121588-861C-447A-A510-C44C2BA86639}: The user LAPTOP-GLENN\Glenn dialed a connection named ikev2-test which has terminated. The reason code returned on termination is 829.

So I started digging and enabled RAS debugging on the client:

netsh ras diagnostics * state=enabled

The only relevant stuff I could find was in C:\Windows\tracing\rasman.log (20:11:51 is when the disconnect happens - interesting bits prefixed with "***"):

[1384] 01-16 20:11:51:216: FreeInterfaceLuidIndex: Luid = 0
[1384] 01-16 20:11:51:216: RasUpdateVpnLuidCache: Luid: 17000000000000, fAddLuid:0
[1384] 01-16 20:11:51:216: RasUpdateVpnLuidCache: Removed Luid 17000000000000 from cache
[1384] 01-16 20:11:51:216: FreeInterfaceLuidIndex: done 0
[1384] 01-16 20:11:51:216: DeallocateRouteRequestCommon: pBundle=0xa50adde0, type=0x800
[1384] 01-16 20:11:51:232: DeActivated Route , bundlehandle 0x4, prottype = 2048
[1384] 01-16 20:11:51:232: DeAllocateRoute: PI_Type=0x800, PI_AdapterName=\DEVICE\{93A76D72-2010-45BB-9096-244B06735879}, PI_Allocated=-1
[3524] 01-16 20:11:51:248: SendProtocolResultToRasman: msgid=1, hPort: 6.
*** [3524] 01-16 20:11:51:248: Setting last error for port VPN2-1 to ppp error 0x3635
[3524] 01-16 20:11:51:248: SetProtocolResultAvailableEvent: Notification handle event for port 6 is not registered.
[2640] 01-16 20:11:51:248: WorkerThread: Disconnect event signaled on port: VPN2-1
[2640] 01-16 20:11:51:248: OVEVT_DEV_STATECHANGE. pOverlapped = 0xa4611940
[2640] 01-16 20:11:51:248: onecoreuap\net\rras\ras\rasman\rasman\worker.c, 2031: Disconnecting port 6, connection 0xa6af47e0, reason 1
[2640] 01-16 20:11:51:248: Disconnecting Port 0xVPN2-1, reason 1
[2640] 01-16 20:11:51:248: DisconnectPort: Saving Bundle stats for port VPN2-1
[2640] 01-16 20:11:51:263: RevertPostConnectionActions
[2640] 01-16 20:11:51:263: RasImpersonateUser. 0x0
[2640] 01-16 20:11:51:263: DeleteCredentialsFromCredMan
[2640] 01-16 20:11:51:263: DeleteCredentialsFromCredMan Done: 0
[2640] 01-16 20:11:51:263: RasRevertToSelf. 0x0
[2640] 01-16 20:11:51:263: QueueCloseConnections: no dependent connections
[2640] 01-16 20:11:51:263: 10. Throwing away handle 0x0!
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c, 2315:Setting port 6 for autoclosure...
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c 2327: Disconnected Port 6, reason 1. rc=0x0
[2640] 01-16 20:11:51:263: FreeBundle: freeing pBundle=0xa50adde0
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c: 2443: port 6 state chg: prev=2, new=3
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c: 2459: port 6 state chg: prev=3, new=4
[2640] 01-16 20:11:51:263: 5. Notifying of disconnect on port 6
[2640] 01-16 20:11:51:263: SignalPortDisconnect: Notification handle event for port 6 is not registered.
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c: 2573: port 6 async reqtype chg: prev=0, new=0
[2640] 01-16 20:11:51:263: ***** DisconnectType=1,DisconnectReason=4,pConn=0xa6af47e0,cbports=1,signaled=1,hEvent=0xffffffff,fRedial=0
[2640] 01-16 20:11:51:263: Calling DwQueueRedial
[2640] 01-16 20:11:51:263: DwQueueRedial
[2640] 01-16 20:11:51:263: DwQueueRedial returned 0x0
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\util.c, 2634: Autoclosing port 6
[2640] 01-16 20:11:51:263: PortClose: port (6). OpenInstances = 1
[2640] 01-16 20:11:51:263: Freeing the notifier list for port 6
[2640] 01-16 20:11:51:263: PortClose (6). OpenInstances = 0
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\request.c: 3845: port 6 async reqtype chg: prev=0, new=0
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\request.c: 3848: port 6 state chg: prev=4, new=4
[2640] 01-16 20:11:51:263: PortClose: Resetting PCB_OpenedUsage for port: 6.
[2640] 01-16 20:11:51:263: RemoveConnectionPort: port 6, fOwnerClose=0, pConn=0xa6af47e0, pConn->CB_Ports=0

*** [2640] 01-16 20:11:51:263: Completely disconnected connection: Reason: ERROR_LINK_FAILURE (829)
[2640] 01-16 20:11:51:263: SendSensNotification(_RAS_DISCONNECT) for 0x00040000 returns 0x00000000
[2640] 01-16 20:11:51:263: Successfully notified event(128, C:\Users\Glenn\AppData\Roaming\Microsoft\Network\Connections\Pbk\rasphone.pbk, ikev2-test) to the caller.
[2640] 01-16 20:11:51:263: SignalNetman: IRasEventNotify::RasEvent returned S_FALSE
[2640] 01-16 20:11:51:263: DwSendNotificationInternal(ENTRY_DISCONNECTED) rc=0x1
[2640] 01-16 20:11:51:263: RemoveConnectionPort: FreeConnection hconn=0x40000, pconn=0xa6af47e0, AutoClose=1
[2640] 01-16 20:11:51:263: FreeConnection: pConn=0xa6af47e0, 1
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\request.c, 3918: Clearing the autoclose flag for port 6
[2640] 01-16 20:11:51:263: fAnyConnectedPorts: 0
[2640] 01-16 20:11:51:263: SetRasmanServiceStopControl: Enabled 1
[2640] 01-16 20:11:51:263: PortClose: DisableAutoWPPTracing failed with error 0x2 
[2640] 01-16 20:11:51:263: DisconnectPort Complete
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\worker.c: 2077: port 6 state chg: prev=4, new=4
[2640] 01-16 20:11:51:263: onecoreuap\net\rras\ras\rasman\rasman\worker.c: 2081: port 6 async reqtype chg: prev=0, new=0
[1276] 01-16 20:11:51:263: The specified notification entry with cookie 2 found.
[1276] 01-16 20:11:51:263: Dequed notification entry: (128, C:\Users\Glenn\AppData\Roaming\Microsoft\Network\Connections\Pbk\rasphone.pbk, ikev2-test) from the queue.
  • Ran the same config on a Cisco CSR1000v router and the exact same thing happened.
  • Created a REG_DWORD named "InactivityIdleSeconds" under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class{4d36e972-e325-11ce-bfc1-08002be10318}\0012 (WAN Miniport (IKEv2)) and played with its value but that didn't change anything (see here why I tried that)
  • Played around with Dead Peer Detection but that didn't change anything either

I can't find what the criteria are for Windows 10 to consider an IKEv2 connection dead or what state changes cause Windows 10 to think the VPN has to be disconnected.

Soooo... any pointers?

Microsoft Technet: Error 829 means ERROR_LINK_FAILURE The modem (or other connecting device) was disconnected due to link failure.

Microsoft Technet: 0x3635 means 13877 ERROR_IPSEC_IKE_RPC_DELETE "Deleted via RPC call."

The Cisco IOS router configuration.

Cisco IOS router IKEv2 debug logs

Zipfile of the complete C:\Windows\tracing directory

networking
windows-10
vpn
asked on Super User Jan 17, 2017 by RedShift • edited Feb 8, 2017 by Spinner

2 Answers

0

I was able to work around this issue because I have setup the VPN server myself on my server. The StrongSwan VPN server allows to register updown script which is run whenever client connection changes.

conn %default
    ...
    leftupdown=/absolute/path/to/keepalive/script.sh
    ...

And the script.sh looks like this:

#!/bin/bash
/usr/lib/ipsec/_updown $* # call original updown script, just to be sure

# PLUTO_VERB - name of the event
# PLUTO_PEER_SOURCEIP - IP of the client

PLUTO_IPSTRING="${PLUTO_PEER_SOURCEIP//./}" # strip client IP of dots, to create unique name for pidfile
PID_PATH="/absolute/path/to/writable/location"

if [ "up-client" == "$PLUTO_VERB" ]; then # when new client connects
        ping $PLUTO_PEER_SOURCEIP -i 30 >/dev/null 2>/dev/null & echo $! > "$PID_PATH/pluto$PLUTO_IPSTRING.pid" # launch ping in background, send packets every 30 seconds. Store the PID in given file.
fi

if [ "down-client" == "$PLUTO_VERB" ]; then # when client disconnects
        kill -9 $(cat "$PID_PATH/pluto$PLUTO_IPSTRING.pid") # kill the ping process
        rm -rf "$PID_PATH/pluto$PLUTO_IPSTRING.pid" # remove the associated PID file
fi

You have to edit the paths to reflect you setup.

This approach is a bit of a hack, but works perfectly. Sadly it's only possible if you can create such a script. If you don't have access to the remote VPN server, this won't help you.

answered on Super User Jan 18, 2017 by jnovacho
0

This looks like it's the peer that's the deleting the session..;

*Jan 16 20:12:10.655: IKEv2:(SESSION ID = 12,SA ID = 1):Received Packet [From 192.168.0.107:4500/To 192.168.0.200:4500/VRF i0:f0] Initiator SPI : 598D998BFD1C9FFB - Responder SPI : 6ADF174B54D76AA1 Message id: 6 IKEv2 INFORMATIONAL Exchange REQUEST Payload contents: DELETE

so you need to t/shoot the windows client (sorry if you are already there..)

also you're fan looks like it's playing up..

*Jan 16 20:11:24.243: %ENVMON-3-FAN_FAILED: Fan is malfunctioning

answered on Super User Jan 27, 2017 by user690701

User contributions licensed under CC BY-SA 3.0