I was having database issues whereby commits were not completing (PostgreSQL) and it seems to be related to these messages that are filling up my server log:
Jan 30 02:29:45 server001 kernel: [3521062.240540] postgres D 0000000000000000 0 23220 8154 0x00000004
Jan 30 02:29:45 server001 kernel: [3521062.240550] ffff8800174c9d08 0000000000000082 ffff88041cd24728 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240559] ffff8806c678b110 0000000000015880 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240567] 0000000000015880 ffff8806c678b110 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240575] Call Trace:
Jan 30 02:29:45 server001 kernel: [3521062.240582] [<ffffffff810da010>] ? sync_page+0x0/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240590] [<ffffffff81528488>] io_schedule+0x28/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240596] [<ffffffff810da04d>] sync_page+0x3d/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240603] [<ffffffff815289a7>] __wait_on_bit+0x57/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240610] [<ffffffff810da1be>] wait_on_page_bit+0x6e/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240618] [<ffffffff81078540>] ? wake_bit_function+0x0/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240627] [<ffffffff810e4480>] ? pagevec_lookup_tag+0x20/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240634] [<ffffffff810da665>] wait_on_page_writeback_range+0xf5/0x190
Jan 30 02:29:45 server001 kernel: [3521062.240644] [<ffffffff81053668>] ? try_to_wake_up+0x118/0x340
Jan 30 02:29:45 server001 kernel: [3521062.240651] [<ffffffff810da727>] filemap_fdatawait+0x27/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240659] [<ffffffff811431b4>] vfs_fsync+0xa4/0xf0
Jan 30 02:29:45 server001 kernel: [3521062.240667] [<ffffffff81143239>] do_fsync+0x39/0x60
Jan 30 02:29:45 server001 kernel: [3521062.240674] [<ffffffff8114328b>] sys_fsync+0xb/0x10
Jan 30 02:29:45 server001 kernel: [3521062.240682] [<ffffffff81012042>] system_call_fastpath+0x16/0x1b
EDIT:
I found this further info in syslog:
Jan 30 02:21:45 server001 kernel: [3520582.242828] INFO: task postgres:8750 blocked for more than 120 seconds.
Jan 30 02:21:45 server001 kernel: [3520582.242873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 30 02:21:45 server001 kernel: [3520582.242933] postgres D 00000000ffffffff 0 8750 8154 0x00000004
Jan 30 02:21:45 server001 kernel: [3520582.242946] ffff880415cd7d08 0000000000000082 ffff880415cd7c88 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242957] ffff88040dc79a60 0000000000015880 0000000000015880 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242965] 0000000000015880 ffff88040dc79a60 0000000000015880 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242974] Call Trace:
Jan 30 02:21:45 server001 kernel: [3520582.242994] [<ffffffff810da010>] ? sync_page+0x0/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243006] [<ffffffff81528488>] io_schedule+0x28/0x40
Jan 30 02:21:45 server001 kernel: [3520582.243014] [<ffffffff810da04d>] sync_page+0x3d/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243022] [<ffffffff815289a7>] __wait_on_bit+0x57/0x80
Jan 30 02:21:45 server001 kernel: [3520582.243029] [<ffffffff810da1be>] wait_on_page_bit+0x6e/0x80
Jan 30 02:21:45 server001 kernel: [3520582.243039] [<ffffffff81078540>] ? wake_bit_function+0x0/0x40
Jan 30 02:21:45 server001 kernel: [3520582.243049] [<ffffffff810e4480>] ? pagevec_lookup_tag+0x20/0x30
Jan 30 02:21:45 server001 kernel: [3520582.243057] [<ffffffff810da665>] wait_on_page_writeback_range+0xf5/0x190
Jan 30 02:21:45 server001 kernel: [3520582.243070] [<ffffffff811f4e9c>] ? jbd2_log_start_commit+0x3c/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243078] [<ffffffff810da727>] filemap_fdatawait+0x27/0x30
Jan 30 02:21:45 server001 kernel: [3520582.243088] [<ffffffff811431b4>] vfs_fsync+0xa4/0xf0
Jan 30 02:21:45 server001 kernel: [3520582.243096] [<ffffffff81143239>] do_fsync+0x39/0x60
Jan 30 02:21:45 server001 kernel: [3520582.243103] [<ffffffff8114328b>] sys_fsync+0xb/0x10
Jan 30 02:21:45 server001 kernel: [3520582.243114] [<ffffffff81012042>] system_call_fastpath+0x16/0x1b
EDIT 2:
I realized that I don't have anything in the logs other than around 2AM. Yet, since then, my database server has connections that have been hanging up.
--
Running Ubuntu Linux 9.10, Kernel 2.6.31-20-server.
This happened in the past but at that time, the whole server was hung. This time, it is generally working.
I'm trying to find two things:
1) Is there something I can do to un-hang the current threads that are waiting on this. 2) What would cause this?
User contributions licensed under CC BY-SA 3.0