My Apache frequently hangs with multiple threads. Each process gets stuck for hours. The backtrace looks like this:
(gdb) backtrace
#0 0x00002af60c22b2d7 in semop () from /lib64/libc.so.6
#1 0x00002af60bbf612c in ?? () from /usr/lib64/libapr-1.so.0
#2 0x000055555559e614 in ?? () from /usr/sbin/httpd2-prefork
#3 0x000055555559e9ea in ?? () from /usr/sbin/httpd2-prefork
#4 0x000055555559f25d in ap_mpm_run () from /usr/sbin/httpd2-prefork
#5 0x000055555557a080 in main () from /usr/sbin/httpd2-prefork
With strace I see they are waiting on a pipe that connects all Apache processes.
strace -p 3069
....
read(7, 0x7fff16a04df7, 1) = -1 EAGAIN (Resource temporarily unavailable)
semop(286162952, 0x2af60bd07dc0, 1 <unfinished ...>
What is Apache doing here?
How can I figure out what is causing this?
Update
Data as requested in comments
# ipcs -a
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x06347849 32768 root 666 65544 2
0x0c6629c9 21004289 root 640 1166952 2
0x3107040d 98306 root 666 131176 3
0x00000000 436994051 root 600 33554432 11 dest
0x01070756 191135748 root 664 4192 1
0x01070730 190349317 root 664 4192 1
0x01070736 190382086 root 664 4192 1
0x01070742 190414855 root 664 4192 1
0x01070746 190447624 root 664 4192 1
0x01070753 190545929 root 664 4192 1
0x0107075e 190611466 root 664 4192 1
0x01070750 191037451 root 664 4192 1
0x010706c8 21069838 root 664 4192 1
0x0107074d 191070223 root 664 4192 1
------ Semaphore Arrays --------
key semid owner perms nsems
0x0107000d 0 root 666 1
0x0107000e 32769 root 666 1
0x3107040d 98306 root 666 5
0x72070097 243433475 root 666 2
0x00000000 977469444 wwwrun 600 1
0x4d028007 262149 root 600 8
0x00000000 450166790 wwwrun 600 1
0x0107073f 1209401351 root 664 1
0x00000000 977502216 wwwrun 600 1
0x00000000 1208451083 root 600 1
0x01070751 1208582156 root 664 1
0x01070758 1208647693 root 664 1
0x00000000 1208680462 root 600 1
0x01070749 1209237519 root 664 1
0x0107074e 1209270289 root 664 1
0x00000000 1209303058 root 600 1
0x00000000 1209335827 root 600 1
0x00000000 1209434132 root 600 1
------ Message Queues --------
key msqid owner perms used-bytes messages
and
# ps auxwww | grep "apache"
wwwrun 2708 0.0 0.5 201576 11972 ? S Nov11 0:05 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
wwwrun 3607 0.0 0.6 202472 13388 ? S Nov11 0:06 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
root 5798 0.0 0.7 200828 14800 ? Ss Nov08 0:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
wwwrun 12926 0.0 0.5 201712 11768 ? S 08:19 0:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
wwwrun 13009 0.0 0.6 202196 13340 ? S 02:19 0:05 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
There are a few more processes, but you get the picture.
Also, it is a SUSE server:
# cat /proc/version
Linux version 2.6.16.60-0.74.7-default (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 Fri Nov 26 09:16:10 UTC 2010
httpd.conf
# grep ^[^#] /etc/apache2/httpd.conf
Include /etc/apache2/uid.conf
Include /etc/apache2/server-tuning.conf
ErrorLog /var/log/apache2/error_log
Include /etc/apache2/sysconfig.d/loadmodule.conf
Include /etc/apache2/listen.conf
Include /etc/apache2/mod_log_config.conf
Include /etc/apache2/sysconfig.d/global.conf
Include /etc/apache2/mod_status.conf
Include /etc/apache2/mod_info.conf
Include /etc/apache2/mod_usertrack.conf
Include /etc/apache2/mod_autoindex-defaults.conf
TypesConfig /etc/apache2/mime.types
DefaultType text/plain
Include /etc/apache2/mod_mime-defaults.conf
Include /etc/apache2/errors.conf
Include /etc/apache2/ssl-global.conf
<Directory />
Options None
AllowOverride None
Order deny,allow
Deny from all
</Directory>
AccessFileName .htaccess
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
DirectoryIndex index.html index.html.var
Include /etc/apache2/default-server.conf
Include /etc/apache2/sysconfig.d/include.conf
Include /etc/apache2/vhosts.d/*.conf
read(7, ...) points to a pipe:
# ls -la /proc/3069/fd/7
lr-x------ 1 root root 64 Nov 7 17:24 7 -> pipe:[157329520]
It connects all Apache processes:
# lsof | grep 157329520
httpd2-pr 2430 root 7r FIFO 0,5 157329520 pipe
httpd2-pr 2430 root 8w FIFO 0,5 157329520 pipe
httpd2-pr 3061 wwwrun 7r FIFO 0,5 157329520 pipe
httpd2-pr 3061 wwwrun 8w FIFO 0,5 157329520 pipe
...
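To see at a glance which processes hold the pipe and on which descriptors, the lsof output can be grouped by PID. This is only a minimal sketch: it assumes the lsof column layout shown above (PID in column 2, FD in column 4, TYPE in column 5), and the function name pipe_holders is made up for illustration.

```shell
# Group lsof output (read from stdin) by PID for one pipe inode.
# Assumes columns: COMMAND PID USER FD TYPE DEVICE ... NODE NAME.
pipe_holders() {
    awk -v inode="$1" '
        $5 == "FIFO" {
            # The inode column position differs between lsof versions,
            # so look for it in any field after TYPE.
            for (i = 6; i <= NF; i++)
                if ($i == inode) { fds[$2] = fds[$2] " " $4; break }
        }
        END { for (pid in fds) print pid ":" fds[pid] }
    ' | sort -n
}
```

It could be used as `lsof | pipe_holders 157329520`, which prints one line per PID followed by the descriptors that process has open on the pipe.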
About the semaphore
# ipcs -s -i 39452680
Semaphore Array semid=39452680
uid=30 gid=8 cuid=0 cgid=0
mode=0600, access_perms=0600
nsems = 1
otime = Mon Nov 19 09:47:05 2012
ctime = Sun Nov 18 11:15:04 2012
semnum value ncount zcount pid
0 0 5 0 14678
The ncount always matches the number of idle workers from apache2ctl status, so I believe the whole semop is just normal idle-worker behavior and has nothing to do with my problem...
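For comparing these numbers repeatedly against the idle-worker count, the per-semaphore row can be pulled out of the ipcs output. In that output, value is the current semaphore value and ncount is the number of processes waiting for it to increase. This is a rough sketch that assumes the `ipcs -s -i` layout shown above; sem_counts is a hypothetical helper name.

```shell
# Print value/ncount/zcount/pid for each semaphore in `ipcs -s -i` output
# read from stdin. Rows are taken to be the 5-field lines that follow the
# "semnum value ncount zcount pid" header.
sem_counts() {
    awk '
        seen && NF == 5 {
            printf "semnum=%s value=%s ncount=%s zcount=%s pid=%s\n",
                   $1, $2, $3, $4, $5
        }
        $1 == "semnum" { seen = 1 }
    '
}
```

For example, `ipcs -s -i 39452680 | sem_counts` would print a single line for the array shown above, which can be logged periodically next to the apache2ctl status output.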
I believe you're tripping over a little-known issue. It seems to be a bug in Linux where the semaphore count is already 0, but processes wait as if it were not. I do not understand the mechanics of this bug, but it apparently happens only on loaded machines.
Run ipcs -s -i $SEM_ID
where $SEM_ID is the first argument given to semop(). It should show the count to be 0, which would confirm the problem is in Linux, and not Apache. If the value is anything but 0, the problem would be in Apache's code.
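Picking the semaphore ID out of the strace output can be scripted. This is a minimal sketch that assumes strace prints the call as `semop(<id>, ...` like in the question; semid_from_strace is a name invented here.

```shell
# Read strace output from stdin and print each distinct semaphore ID
# that appears as the first argument of a semop() call.
semid_from_strace() {
    sed -n 's/^.*semop(\([0-9][0-9]*\),.*$/\1/p' | sort -u
}
```

Since strace writes to stderr and keeps running until interrupted, one way to feed it in would be `timeout 5 strace -p 3069 2>&1 | semid_from_strace`, then pass the printed ID to `ipcs -s -i`.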
It appears you haven't updated the kernel in about 2 years; there may have been a fix since then. Others have reported that the epoll path limit of 1000 prevents Apache from using a "max clients" setting above 1000.
In case anyone else stumbles upon this thread:
We encountered an issue in production with OCSP stapling, in which all child processes hung in semop after the TCP connection was established but before the TLS handshake finished. Apparently the main server was waiting for an OCSP staple from a non-responding OCSP server. Clients may also keep hanging in the TLS handshake while waiting for their own verification.
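One way to check from the outside whether the server currently delivers a staple is to look at what `openssl s_client -status` reports. The sketch below only classifies that output. The matched strings ("OCSP Response Status:" and "OCSP response: no response sent") are what common OpenSSL versions print and should be treated as an assumption, and staple_status is a made-up helper name.

```shell
# Classify `openssl s_client -status` output read from stdin.
staple_status() {
    awk '
        /OCSP Response Status:/ {
            status = $0
            sub(/^.*OCSP Response Status:[ \t]*/, "", status)
            found = 1
        }
        /OCSP response: no response sent/ { none = 1 }
        END {
            if (found)     print "stapled: " status
            else if (none) print "no staple sent"
            else           print "unknown"
        }
    '
}
```

A possible invocation, with www.example.com standing in for the real host, would be `echo | timeout 10 openssl s_client -connect www.example.com:443 -status 2>/dev/null | staple_status`. If the handshake does not complete within the timeout, the helper prints "unknown", which would be consistent with the hang described above.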