Kernel errors with 3.13.0-71-generic


After upgrading several hosts to Ubuntu 12.04.5 LTS with the LTS enablement stack (linux-generic-lts-trusty 3.13.0.40.35), we are seeing a sudden spike in kernel errors. They only begin to occur after a couple of days of use, and (to my untrained eyes) the traces don't seem to have much in common.

Was there a known issue in 3.13.0-71-generic? Is there anything we can do to fix this, or at least figure out what's happening? These errors have occurred in the field, but we have not yet been able to reproduce them in-house on identical hardware, so we haven't had the opportunity to see whether upgrading to the latest Trusty kernel fixes things.

The call traces are below:

Apr  4 23:35:37 hostname kernel: [319114.311718] INFO: task python2.7:5769 blocked for more than 300 seconds.
Apr  4 23:35:37 hostname kernel: [319114.311959]       Tainted: P           OX 3.13.0-71-generic #114~precise1-Ubuntu
Apr  4 23:35:37 hostname kernel: [319114.312201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  4 23:35:37 hostname kernel: [319114.312454] python2.7       D ffffffff81811520     0  5769   5767 0x00000000
Apr  4 23:35:37 hostname kernel: [319114.312457]  ffff8800023c3be8 0000000000000082 ffff8800023c3ba8 ffff8800023c3fd8
Apr  4 23:35:37 hostname kernel: [319114.312459]  0000000000013180 0000000000013180 ffffffff81c144a0 ffff88000238b000
Apr  4 23:35:37 hostname kernel: [319114.312460]  ffff8800023c3bc8 ffff8805ae5374a8 ffff8805ae5374ac 00000000ffffffff
Apr  4 23:35:37 hostname kernel: [319114.312462] Call Trace:
Apr  4 23:35:37 hostname kernel: [319114.312467]  [<ffffffff81764799>] schedule+0x29/0x70
Apr  4 23:35:38 hostname kernel: [319114.312469]  [<ffffffff81764abe>] schedule_preempt_disabled+0xe/0x10
Apr  4 23:35:38 hostname kernel: [319114.312470]  [<ffffffff817668f4>] __mutex_lock_slowpath+0x114/0x1b0
Apr  4 23:35:38 hostname kernel: [319114.312472]  [<ffffffff817669b3>] mutex_lock+0x23/0x37
Apr  4 23:35:38 hostname kernel: [319114.312474]  [<ffffffff811da631>] do_last+0x281/0x7d0
Apr  4 23:35:38 hostname kernel: [319114.312475]  [<ffffffff811dac44>] path_openat+0xc4/0x4c0
Apr  4 23:35:38 hostname kernel: [319114.312477]  [<ffffffff811855eb>] ? __handle_mm_fault+0x1db/0x360
Apr  4 23:35:38 hostname kernel: [319114.312478]  [<ffffffff81185823>] ? handle_mm_fault+0xb3/0x160
Apr  4 23:35:38 hostname kernel: [319114.312480]  [<ffffffff811dbed3>] do_filp_open+0x43/0xa0
Apr  4 23:35:38 hostname kernel: [319114.312483]  [<ffffffff811e900e>] ? __alloc_fd+0xce/0x120
Apr  4 23:35:38 hostname kernel: [319114.312486]  [<ffffffff811ca786>] do_sys_open+0x136/0x2a0
Apr  4 23:35:38 hostname kernel: [319114.312488]  [<ffffffff811ca90e>] SyS_open+0x1e/0x20
Apr  4 23:35:38 hostname kernel: [319114.312491]  [<ffffffff8177145d>] system_call_fastpath+0x1a/0x1f
Apr  4 23:35:38 hostname kernel: [319114.312496] INFO: task python2.7:6320 blocked for more than 300 seconds.
Apr  4 23:35:38 hostname kernel: [319114.312758]       Tainted: P           OX 3.13.0-71-generic #114~precise1-Ubuntu
Apr  4 23:35:38 hostname kernel: [319114.313031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  4 23:35:38 hostname kernel: [319114.313319] python2.7       D ffffffff81811520     0  6320   6314 0x00000000
Apr  4 23:35:38 hostname kernel: [319114.313320]  ffff880021ebdbe8 0000000000000086 0000000000000286 ffff880021ebdfd8
Apr  4 23:35:40 hostname kernel: [319114.313322]  0000000000013180 0000000000013180 ffffffff81c144a0 ffff880002393000
Apr  4 23:35:40 hostname kernel: [319114.313323]  ffff880021ebdbc8 ffff8805ae5374a8 ffff8805ae5374ac 00000000ffffffff

Apr 4 15:00:41 hostname kernel: [191113.073832] INFO: task python2.7:8525 blocked for more than 300 seconds.
Apr 4 15:01:00 hostname kernel: [191113.073859] Tainted: P OX 3.13.0-71-generic #114~precise1-Ubuntu
Apr 4 15:01:15 hostname kernel: [191113.073882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 4 15:01:15 hostname kernel: [191113.073906] python2.7 D 0000000000000000 0 8525 8517 0x00000000
Apr 4 15:01:15 hostname kernel: [191113.073909] ffff880212b3dbe8 0000000000000082 ffff880212b3dba8 ffff880212b3dfd8
Apr 4 15:01:15 hostname kernel: [191113.073911] 0000000000013180 0000000000013180 ffff88000e04e000 ffff8803251de000
Apr 4 15:01:15 hostname kernel: [191113.073913] ffff880212b3dbd8 ffff8802888190a8 ffff8802888190ac 00000000ffffffff
Apr 4 15:01:15 hostname kernel: [191113.073915] Call Trace:
Apr 4 15:01:15 hostname kernel: [191113.073921] [<ffffffff81764799>] schedule+0x29/0x70
Apr 4 15:01:15 hostname kernel: [191113.073923] [<ffffffff81764abe>] schedule_preempt_disabled+0xe/0x10
Apr 4 15:01:15 hostname kernel: [191113.073926] [<ffffffff817668f4>] __mutex_lock_slowpath+0x114/0x1b0
Apr 4 15:01:15 hostname kernel: [191113.073927] [<ffffffff817669b3>] mutex_lock+0x23/0x37
Apr 4 15:01:15 hostname kernel: [191113.073930] [<ffffffff811da631>] do_last+0x281/0x7d0
Apr 4 15:01:15 hostname kernel: [191113.073931] [<ffffffff811dac44>] path_openat+0xc4/0x4c0
Apr 4 15:01:15 hostname kernel: [191113.073934] [<ffffffff811855eb>] ? __handle_mm_fault+0x1db/0x360
Apr 4 15:01:15 hostname kernel: [191113.073935] [<ffffffff81185823>] ? handle_mm_fault+0xb3/0x160

Apr  6 19:56:45 hostname kernel: [450264.877269] Out of memory: Kill process 26196 (python2.7) score 14 or sacrifice child
Apr  6 19:56:45 hostname kernel: [450264.877307] Killed process 26196 (python2.7) total-vm:76966004kB, anon-rss:88036kB, file-rss:170036kB
Apr  6 20:12:01 hostname kernel: [451123.424257] INFO: task cron:32543 blocked for more than 300 seconds.
Apr  6 20:12:01 hostname kernel: [451123.424286]       Tainted: P           OX 3.13.0-71-generic #114~precise1-Ubuntu
Apr  6 20:12:01 hostname kernel: [451123.424312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  6 20:12:01 hostname kernel: [451123.424339] cron            D ffffffff81811520     0 32543   1398 0x00000000
Apr  6 20:12:01 hostname kernel: [451123.424343]  ffff880050453be8 0000000000000086 ffff880050453bd8 ffff880050453fd8
Apr  6 20:12:01 hostname kernel: [451123.424346]  0000000000013180 0000000000013180 ffff880873f20000 ffff88086e8d6000
Apr  6 20:12:01 hostname kernel: [451123.424348]  0000000000000286 ffff88086dedbb00 ffff88086dedbb04 00000000ffffffff
Apr  6 20:12:01 hostname kernel: [451123.424350] Call Trace:
Apr  6 20:12:01 hostname kernel: [451123.424356]  [<ffffffff81764799>] schedule+0x29/0x70
Apr  6 20:12:01 hostname kernel: [451123.424359]  [<ffffffff81764abe>] schedule_preempt_disabled+0xe/0x10
Apr  6 20:12:01 hostname kernel: [451123.424362]  [<ffffffff817668f4>] __mutex_lock_slowpath+0x114/0x1b0
Apr  6 20:12:01 hostname kernel: [451123.424364]  [<ffffffff817669b3>] mutex_lock+0x23/0x37
Apr  6 20:12:01 hostname kernel: [451123.424366]  [<ffffffff811da631>] do_last+0x281/0x7d0
Apr  6 20:12:01 hostname kernel: [451123.424368]  [<ffffffff811dac44>] path_openat+0xc4/0x4c0
Apr  6 20:12:01 hostname kernel: [451123.424371]  [<ffffffff811855eb>] ? __handle_mm_fault+0x1db/0x360
Apr  6 20:12:01 hostname kernel: [451123.424373]  [<ffffffff81185823>] ? handle_mm_fault+0xb3/0x160
Apr  6 20:12:01 hostname kernel: [451123.424375]  [<ffffffff811dbed3>] do_filp_open+0x43/0xa0
Apr  6 20:12:01 hostname kernel: [451123.424378]  [<ffffffff811e900e>] ? __alloc_fd+0xce/0x120
Apr  6 20:12:31 hostname kernel: [451123.424381]  [<ffffffff811ca786>] do_sys_open+0x136/0x2a0
Apr  6 20:12:31 hostname kernel: [451123.424383]  [<ffffffff811ca90e>] SyS_open+0x1e/0x20
Apr  6 20:12:31 hostname kernel: [451123.424387]  [<ffffffff8177145d>] system_call_fastpath+0x1a/0x1f

This one may be bad memory (here apache2 invoked the OOM killer):

Apr 5 19:58:53 hostname kernel: [462034.034881] apache2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Apr 5 19:58:53 hostname kernel: [462034.034885] apache2 cpuset=/ mems_allowed=0
Apr 5 19:58:53 hostname kernel: [462034.034888] CPU: 6 PID: 19720 Comm: apache2 Tainted: P OX 3.13.0-71-generic #114~precise1-Ubuntu
Apr 5 19:58:53 hostname kernel: [462034.034889] Hardware name: Supermicro C7Z87-OCE/C7Z87-OCE, BIOS 2.2 01/30/2015
Apr 5 19:58:53 hostname kernel: [462034.034890] 0000000000000000 ffff88089e46b888 ffffffff8175bca1 0000000000000007
Apr 5 19:58:53 hostname kernel: [462034.034893] ffff880203b91800 ffff88089e46b8d8 ffffffff8175172b ffff880800000000
Apr 5 19:58:53 hostname kernel: [462034.034895] 000201da81381898 ffff88001e730000 ffff880003f28000 0000000000000000
Apr 5 19:58:53 hostname kernel: [462034.034897] Call Trace:
Apr 5 19:58:53 hostname kernel: [462034.034902] [<ffffffff8175bca1>] dump_stack+0x46/0x58
Apr 5 19:58:53 hostname kernel: [462034.034905] [<ffffffff8175172b>] dump_header+0x7e/0xbd
Apr 5 19:58:53 hostname kernel: [462034.034907] [<ffffffff817517c1>] oom_kill_process.part.5+0x57/0x2d7
Apr 5 19:58:53 hostname kernel: [462034.034910] [<ffffffff8115cb27>] oom_kill_process+0x47/0x50
Apr 5 19:58:53 hostname kernel: [462034.034912] [<ffffffff8115ce65>] out_of_memory+0x145/0x1d0
Apr 5 19:58:53 hostname kernel: [462034.034915] [<ffffffff81162e17>] __alloc_pages_nodemask+0xab7/0xbb0
Apr 5 19:58:53 hostname kernel: [462034.034919] [<ffffffff811a4102>] alloc_pages_current+0xb2/0x170
Apr 5 19:58:53 hostname kernel: [462034.034921] [<ffffffff811591c7>] __page_cache_alloc+0xb7/0xd0
Apr 5 19:58:53 hostname kernel: [462034.034923] [<ffffffff8115afbd>] filemap_fault+0x28d/0x440
Apr 5 19:58:53 hostname kernel: [462034.034926] [<ffffffff811811ef>] __do_fault+0x6f/0x530
Apr 5 19:58:53 hostname kernel: [462034.034928] [<ffffffff81185046>] handle_pte_fault+0x96/0x230
Apr 5 19:58:53 hostname kernel: [462034.034930] [<ffffffff81764799>] ? schedule+0x29/0x70
Apr 5 19:58:53 hostname kernel: [462034.034932] [<ffffffff811855eb>] __handle_mm_fault+0x1db/0x360
Apr 5 19:58:53 hostname kernel: [462034.034934] [<ffffffff81185823>] handle_mm_fault+0xb3/0x160
Apr 5 19:58:53 hostname kernel: [462034.034937] [<ffffffff8176c720>] __do_page_fault+0x1b0/0x580
Apr 5 19:58:53 hostname kernel: [462034.034940] [<ffffffff8101ce89>] ? read_tsc+0x9/0x20
Apr 5 19:58:53 hostname kernel: [462034.034943] [<ffffffff810d329c>] ? ktime_get_ts+0x4c/0xe0
Apr 5 19:58:53 hostname kernel: [462034.034946] [<ffffffff811deb4d>] ? poll_select_copy_remaining+0xed/0x140
Apr 5 19:58:53 hostname kernel: [462034.034948] [<ffffffff8176cb0a>] do_page_fault+0x1a/0x70
Apr 5 19:58:53 hostname kernel: [462034.034950] [<ffffffff81768b28>] page_fault+0x28/0x30
linux
ubuntu
kernel
asked on Server Fault Apr 11, 2016 by geordan

1 Answer


Since I have to answer this question to close it: per Michael Hampton's comment, updating the kernel (to .85) resolved the issue.
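
For anyone triaging similar logs before an upgrade is possible: every complete hung-task trace in the question shows the task waiting on a mutex inside do_last during an open() call. Below is a minimal sketch, not part of the original troubleshooting, that summarizes where hung tasks are blocked in a syslog excerpt. The function name summarize_hung_tasks, the WAIT_FRAMES set, and the sample lines are my own choices for illustration; the regular expressions assume the log format shown in the question.

```python
import re

# Matches e.g. "INFO: task cron:32543 blocked for more than 300 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Matches a call-trace frame such as "[<ffffffff811da631>] do_last+0x281/0x7d0".
# Group 1 is set when the frame carries the "?" marker (an unreliable guess).
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")

# Assumption: these generic scheduler/lock frames only say *that* a task
# is waiting, so the first frame after them is the more interesting one.
WAIT_FRAMES = {"schedule", "schedule_preempt_disabled",
               "__mutex_lock_slowpath", "mutex_lock"}


def summarize_hung_tasks(lines):
    """Return one dict per hung-task report: task name, pid, and the first
    reliable non-wait frame of the call trace (None if the trace is missing)."""
    reports = []
    current = None    # report whose call trace we are still reading
    in_trace = False  # True once "Call Trace:" was seen for that report
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            current = {"task": hung.group(1), "pid": int(hung.group(2)),
                       "blocked_in": None}
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue  # header lines (Tainted, stack dump) before the trace
        frame = FRAME_RE.search(line)
        if frame is None:
            current, in_trace = None, False  # trace ended without a match
            continue
        if frame.group(1) or frame.group(2) in WAIT_FRAMES:
            continue
        current["blocked_in"] = frame.group(2)
        current, in_trace = None, False
    return reports


# A few lines copied from the traces in the question.
SAMPLE = [
    "Apr  6 20:12:01 hostname kernel: [451123.424257] INFO: task cron:32543 blocked for more than 300 seconds.",
    "Apr  6 20:12:01 hostname kernel: [451123.424350] Call Trace:",
    "Apr  6 20:12:01 hostname kernel: [451123.424356]  [<ffffffff81764799>] schedule+0x29/0x70",
    "Apr  6 20:12:01 hostname kernel: [451123.424364]  [<ffffffff817669b3>] mutex_lock+0x23/0x37",
    "Apr  6 20:12:01 hostname kernel: [451123.424366]  [<ffffffff811da631>] do_last+0x281/0x7d0",
]

if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        print(report["task"], report["pid"], report["blocked_in"])
```

One known limitation of this sketch: a report whose trace was cut off, followed directly by an unrelated "Call Trace:" (for example from the OOM killer), would be attributed to the wrong trace.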

answered on Server Fault May 13, 2016 by geordan

User contributions licensed under CC BY-SA 3.0