We are experiencing very strange behaviour.
The platform is Red Hat Enterprise Linux Server release 6.7 (Santiago).
The kernel is:
Linux *** 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Calling ssh to execute a command directly on a server works very well:
# ssh server-name 'while true; do find /; done'
<output from find>
#
When executing the same loop from the console, it works very well.
However, when executing the same loop from an interactive SSH session, the connection hangs for 5 (or more) minutes:
# ssh server-name
<banner>
# while true; do find /; done
<output from find>
...hangs...have to wait until it is back
<output from find continues>
After a while, it simply hangs and there is no more output from find until it unhangs.
When running two sessions with the same loop, one in an interactive SSH session and the other running the loop from the ssh command line, the interactive session hangs and the non-interactive session continues to work.
When opening an SSH session, sometimes it hangs right after the banner and sometimes it does not. The hang can happen at any point in an interactive session, and we have to wait until it is back.
So, the problem is limited to interactive SSH sessions.
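One way to narrow this down further, sketched here under the assumption that the pseudo-terminal is what differs between the two cases, is to run the same remote command once without a tty (ssh -T) and once with a forced tty (ssh -tt). The run_mode helper and the SSH variable are hypothetical names used only for illustration; server-name is the placeholder host used above.

```shell
# Hypothetical helper: run a remote command with or without a pseudo-terminal.
# SSH can be overridden (e.g. SSH=echo) to print the arguments instead of connecting.
SSH="${SSH:-ssh}"

run_mode() {
    mode=$1
    host=$2
    shift 2
    case "$mode" in
        notty) "$SSH" -T "$host" "$@" ;;   # -T: do not allocate a pseudo-terminal
        tty)   "$SSH" -tt "$host" "$@" ;;  # -tt: force pseudo-terminal allocation
        *)     echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Example (not executed here):
#   run_mode notty server-name 'while true; do find /; done'
#   run_mode tty   server-name 'while true; do find /; done'
```

If only the tty variant stalls, that would suggest the terminal path is involved rather than the network or sshd as a whole.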
To be sure it is not a name resolution problem, I have set UseDNS no in sshd_config (yes, the service was restarted after changing the configuration file).
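To double-check that the setting is really in effect, a minimal sketch: sshd -T prints the effective configuration (it has to be run as root), and the helper below, whose name effective_usedns is made up for this example, pulls out the UseDNS value.

```shell
# Hypothetical helper: read "sshd -T" output on stdin and print the UseDNS value.
effective_usedns() {
    awk 'tolower($1) == "usedns" { print tolower($2) }'
}

# Example (run as root, not executed here):
#   sshd -T | effective_usedns
```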
top, in a console session, shows the system is about 80% idle.
I already checked /var/log/messages, /var/log/secure, dmesg, and the iLO logs, and nothing of relevance is shown.
Any clues what could be causing this problem?
[EDIT 4] Based on https://unix.stackexchange.com/questions/59325/program-stall-under-user-but-runs-under-root I decided to check the user's ulimit:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 516021
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 8192
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[EDIT 3] A simple network test with ping was left running the whole time to check for any network disconnection. Not a single packet was lost.
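For reference, a small sketch of how the loss figure can be pulled from the summary that ping prints when it exits. packet_loss is a hypothetical helper name, and it assumes the usual summary line containing "N% packet loss".

```shell
# Hypothetical helper: read ping output on stdin and print the loss percentage.
packet_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example (not executed here):
#   ping -c 60 server-name | packet_loss
```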
[EDIT 2] No errors on the console. Nothing.
[EDIT 1] Digging a little bit more, I found these error messages in dmesg:
INFO: task svn:48353 blocked for more than 120 seconds.
Not tainted 2.6.32-573.7.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
svn D 0000000000000009 0 48353 37647 0x00000080
ffff88012d5f7d28 0000000000000082 ffff881029094ae8 ffff8810789f5a28
ffff88012d5f7cb8 ffffffff8106f712 ffff88012d5f7c98 ffff881029094ae8
ffff8810789f5a28 ffff8810789f59c0 ffff88101decbad8 ffff88012d5f7fd8
Call Trace:
[<ffffffff8106f712>] ? enqueue_entity+0x112/0x440
[<ffffffff8105e646>] ? enqueue_task+0x66/0x80
[<ffffffff81539445>] schedule_timeout+0x215/0x2e0
[<ffffffff815390c3>] wait_for_common+0x123/0x180
[<ffffffff810672b0>] ? default_wake_function+0x0/0x20
....
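To list every task the kernel has reported as hung, instead of reading the traces one by one, a minimal sketch that filters dmesg output. hung_tasks is a hypothetical helper name, and it assumes the "blocked for more than N seconds" message format shown above.

```shell
# Hypothetical helper: read dmesg output on stdin and print
# "<task name> <pid> <seconds>" for every hung-task report.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Example (not executed here):
#   dmesg | hung_tasks | sort | uniq -c
```

Counting the reports per task name may show whether only svn is affected or whether other processes are stuck in the same state.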