Why doesn't pppd over ssh work here? Why can't I kill pppd?

0

I'm trying to setup a simple ppp tunnel over ssh. It works on several machines just fine. But on one machine, pppd gets "stuck":

> pgrep pppd | xargs ps up
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4178  0.0  0.1   3020  1088 pts/1    Ds+  05:28   0:00 /usr/sbin/pppd

Any attempt to kill it (even sudo kill -9 4178) has no effect that I can see. strace -p 4178 also hangs similarly. After it has been started for a while, I start getting messages in dmesg like shown below.

It is started like so from another machine:

ssh -t root@server /usr/sbin/pppd passive noauth

When I do this to one of the machines that work, the remote end's pppd spits out garbage/binary data to the console (as expected). When I do it to the one that fails, I get no output from pppd, but the ssh session eventually times out. If I instead ssh to the machine, and then run /usr/sbin/pppd passive noauth in a separate step I also get the expected binary output.

I now have a couple of questions:

  • What could be up with the one machine where pppd fails? I don't even know where to start looking...
  • What could be the difference between ssh -t root@server /usr/sbin/pppd passive noauth in a single step and ssh root@server and /usr/sbin/pppd passive noauth in two steps?
  • How can it be that I can't kill the process even with sudo kill -9? The only way I know is to reboot.

(I've tried searching for something similar but didn't get anywhere so I'm sorry I don't have any more leads)

Any ideas?

The problem machine runs in debian on VMware "hardware" (as do the ones that work) and it exhibits the problem when cloned and on both debian lenny (original) and squeeze (after upgrade)

dmesg entries:

[ 1198.727248] INFO: task pppd:4178 blocked for more than 120 seconds.
[ 1198.727507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1198.727904] pppd            D ece2dc9c     0  4178   4174 0x00000004
[ 1198.727908]  00000098 00000082 f2503520 ece2dc9c 0000b1e7 00000000 c148d1c0 c148d1c0
[ 1198.727913]  f2a06100 f6e071c0 00000000 ece2dc18 f5cd07e0 00000000 ece2d400 ece2dc9d
[ 1198.727918]  00c52300 ece2dcbc f67bfef8 ec98e480 f291cec0 00000000 c10cf5b0 c10dfd21
[ 1198.727923] Call Trace:
[ 1198.727926]  [<c10cf5b0>] ? nameidata_to_filp+0x37/0x41
[ 1198.727929]  [<c10dfd21>] ? dput+0x21/0xb7
[ 1198.727932]  [<c11cfecc>] ? tty_ldisc_ref_wait+0x5f/0x76
[ 1198.727935]  [<c104de7a>] ? wake_up_bit+0x5c/0x5c
[ 1198.727938]  [<c11cb91b>] ? tty_ioctl+0x85f/0x8ba
[ 1198.727941]  [<c10fec18>] ? do_lock_file_wait+0x3d/0xd9
[ 1198.727944]  [<c1162c97>] ? _copy_from_user+0x2b/0x102
[ 1198.727946]  [<c11cb0bc>] ? tty_check_change+0xb9/0xb9
[ 1198.727949]  [<c10dbeb7>] ? do_vfs_ioctl+0x485/0x4c7
[ 1198.727952]  [<c10db59a>] ? do_fcntl+0x24f/0x3a2
[ 1198.727954]  [<c10dbf3a>] ? sys_ioctl+0x41/0x58
[ 1198.727957]  [<c12c6a1f>] ? sysenter_do_call+0x12/0x28
[ 1318.457225] INFO: task sshd:4174 blocked for more than 120 seconds.
[ 1318.457500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1318.457896] sshd            D f25024cc     0  4174   2393 0x00000000
[ 1318.457901]  00000098 00000086 f2a06940 f25024cc 0000b246 00000000 c148d1c0 c148d1c0
[ 1318.457906]  f2503520 f6e071c0 00000000 3f056585 0000000f ece2d4bc 3f056585 f2503520
[ 1318.457911]  ec98bb38 ec98bbdc 00000000 00000000 00000000 c12c09b5 f2503520 c10327cb
[ 1318.457916] Call Trace:
[ 1318.457926]  [<c12c09b5>] ? schedule_hrtimeout_range_clock+0x3c/0xd9
[ 1318.457931]  [<c10327cb>] ? try_to_wake_up+0x13f/0x13f
[ 1318.457935]  [<c11cfecc>] ? tty_ldisc_ref_wait+0x5f/0x76
[ 1318.457940]  [<c104de7a>] ? wake_up_bit+0x5c/0x5c
[ 1318.457943]  [<c11c9ad3>] ? tty_poll+0x32/0x5e
[ 1318.457947]  [<c10dd4d5>] ? do_select+0x2a1/0x42e
[ 1318.457950]  [<c10dcb83>] ? poll_freewait+0x69/0x69
[ 1318.457953]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457955]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457958]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457960]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457963]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457965]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457968]  [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457971]  [<c10429c2>] ? lock_timer_base+0x19/0x35
[ 1318.457974]  [<c1042eb5>] ? __mod_timer+0x10c/0x116
[ 1318.457977]  [<c1042f89>] ? mod_timer+0x69/0x6e
[ 1318.457981]  [<c121325d>] ? sk_reset_timer+0xc/0x16
[ 1318.457984]  [<c1252f57>] ? tcp_event_new_data_sent+0x66/0x6b
[ 1318.457987]  [<c1255b85>] ? tcp_write_xmit+0x7a7/0x86a
[ 1318.457990]  [<c121760d>] ? __alloc_skb+0x50/0xfd
[ 1318.457994]  [<c12c12bc>] ? _raw_spin_lock_bh+0x8/0x1e
[ 1318.457996]  [<c1212e98>] ? release_sock+0x10/0xc4
[ 1318.457999]  [<c124b543>] ? tcp_sendmsg+0x6dd/0x7b7
[ 1318.458003]  [<c1162c97>] ? _copy_from_user+0x2b/0x102
[ 1318.458006]  [<c10dd7a0>] ? core_sys_select+0x13e/0x1c3
[ 1318.458009]  [<c12102a3>] ? sock_aio_write+0xc0/0xd4
[ 1318.458012]  [<c10d0655>] ? do_sync_write+0xa0/0xe4
[ 1318.458016]  [<c10b141c>] ? handle_mm_fault+0x222/0x238
[ 1318.458019]  [<c10f6096>] ? fsnotify+0x1de/0x1f9
[ 1318.458022]  [<c10dd9e8>] ? sys_select+0x6e/0x8f
[ 1318.458024]  [<c10d105e>] ? sys_write+0x3c/0x63
[ 1318.458028]  [<c12c6a1f>] ? sysenter_do_call+0x12/0x28
linux
debian
point-to-point-protocol
asked on Server Fault Nov 15, 2011 by Peter V. Mørch

1 Answer

0

I since remembered that that machine is running a new 2.6.39 based custom kernel I compiled myself. When I reverted to the stock Ubuntu kernel, it worked fine. As far as I remember, I didn't apply any patches to the kernel, I just took a debian unstable .config and took it from there.

If anybody else runs into this, I've created a gist with the two kernel configs [working, broken] for future reference. But for now, I'll just stick with the Ubuntu stock kernel for pppd.

However, regardless of what config options that are active in the 2.6.39 kernel, a pppd process that can't be killed is still odd...

answered on Server Fault Nov 15, 2011 by Peter V. Mørch

User contributions licensed under CC BY-SA 3.0