Weird dmesg output and high CPU load


I have about 50 Linux servers with the same hardware and kernel, but I recently found that some of them suffer from high CPU load and run very slowly. top and ps show impossibly large numbers in the TIME column, and ps aux shows many processes at 99% CPU.

The kernel is Linux 3.0.13 (a customized build):
Linux 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux

ps output for the processes at high CPU:

#ps u |grep 99\.9
root      1604 99.9  0.0 13760 2132 pts/1    Ss+  Oct29 38443218:17 -bash
root     13011 99.9  0.0  1532  588 tty1     Ss+  Oct28 20833538:06 /sbin/mingetty --noclear tty1
root     13014 99.9  0.0  1532  572 tty4     Ss+  Oct28 600517:28 /sbin/mingetty tty4
root     13016 99.9  0.0  1532  576 tty6     Ss+  Oct28 20833538:06 /sbin/mingetty tty6
root     14501 99.9  0.0 13760 2124 pts/2    Ss   18:30 1501293:42 -bash
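
To put these TIME values in perspective, here is a small Python sketch that converts them to seconds and compares them with the uptime top reports below ("up 2 days, 7:39"). It assumes, as far as I know from procps, that BSD-style ps u output prints cumulative CPU time as minutes:seconds with unbounded minutes, and that ps and top were run at roughly the same time; the helper name bsd_time_to_seconds is hypothetical.

```python
def bsd_time_to_seconds(field: str) -> int:
    """Convert a BSD-style `ps u` TIME field ("MMM:SS", minutes unbounded) to seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + int(seconds)

# Uptime from top's header "up 2 days, 7:39" (assumed close to when ps ran).
UPTIME_S = 2 * 86400 + 7 * 3600 + 39 * 60

# TIME values copied from the `ps u` output above.
samples = {
    "1604 -bash": "38443218:17",
    "13011 mingetty tty1": "20833538:06",
    "13014 mingetty tty4": "600517:28",
    "14501 -bash": "1501293:42",
}

for name, field in samples.items():
    secs = bsd_time_to_seconds(field)
    # A single-threaded process cannot accumulate more CPU time than wall-clock uptime.
    print(f"{name:20s} {secs / 86400:10.1f} days CPU, {secs / UPTIME_S:8.0f}x uptime")
```

Every sample comes out far larger than the uptime, which is why these numbers look impossible rather than merely high.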

And top, with very large TIME+ values:

#top 
top - 18:34:20 up 2 days,  7:39,  2 users,  load average: 7.08, 7.36, 7.96
Tasks: 158 total,  11 running, 111 sleeping,   2 stopped,  34 zombie
Cpu0 : 71.2% us, 17.2% sy,  0.7% ni, 10.9% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1 :  0.6% us,  1.3% sy,  0.0% ni, 98.1% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2 : 15.2% us,  8.9% sy,  0.0% ni, 65.2% id,  0.0% wa,  0.3% hi, 10.3% si
Cpu3 : 16.3% us, 10.0% sy,  0.0% ni, 64.8% id,  0.0% wa,  0.0% hi,  9.0% si
Cpu4 : 25.9% us, 12.3% sy,  0.0% ni, 56.1% id,  0.0% wa,  0.0% hi,  5.6% si
Cpu5 :  1.0% us,  3.6% sy,  0.0% ni, 86.9% id,  0.0% wa,  0.0% hi,  8.5% si
Cpu6 :  0.3% us,  0.0% sy,  0.0% ni, 99.7% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu7 : 10.6% us,  7.9% sy,  0.0% ni, 81.5% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:  16444164k total, 15097904k used,  1346260k free,   107516k buffers
Swap:        0k total,        0k used,        0k free,  5322840k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  337 root      20   0 11392  732  400 S 99.9  0.0 324224:12 keepalive.sh
14499 root      20   0  7564 1836 1476 S 99.9  0.0 369896:40 sshd
13624 root      20   0  1348  212  172 S 99.9  0.0 486338:00 sleep.out
32713 root      20   0  1908  956  696 R  0.3  0.0  74529:31 top
    1 root      20   0   720  224  188 S  0.0  0.0 436089:26 init
    2 root      20   0     0    0    0 S  0.0  0.0 369896:40 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0 385563:56 ksoftirqd/0
    6 root      RT   0     0    0    0 S  0.0  0.0 303560:22 migration/0
    7 root      RT   0     0    0    0 R  0.0  0.0 300258:44 watchdog/0
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S  0.0  0.0 300258:44 kworker/1:0
   10 root      20   0     0    0    0 S  0.0  0.0  23965:30 ksoftirqd/1
   12 root      RT   0     0    0    0 R  0.0  0.0 600517:28 watchdog/1
   13 root      RT   0     0    0    0 S  0.0  0.0 303408:44 migration/2
   15 root      20   0     0    0    0 S  0.0  0.0 700896:24 ksoftirqd/2
   16 root      RT   0     0    0    0 R  0.0  0.0 600517:28 watchdog/2

Defunct processes:

root      5381 32001 99 11:42 ?        208-12:18:44 [sh] <defunct>
root      5383 32001 99 11:42 ?        208-12:18:44 [sh] <defunct>
root      5385 32001  0 11:42 ?        00:00:00 [sh] <defunct>
root      5387 32001  0 11:42 ?        00:00:00 [sh] <defunct> 
root     32162 31998 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32164 32000 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32166 32000  0 11:42 ?        00:00:00 [sh] <defunct>
root     32168 31999 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32170 31999  0 11:42 ?        00:00:00 [sh] <defunct>
root     32172 31999 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32174 32004  0 11:42 ?        00:00:00 [sh] <defunct>
root     32175 31997 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32177 32004 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32179 31997 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32181 32004  0 11:42 ?        00:00:00 [sh] <defunct>
root     32183 32004 99 11:42 ?        208-12:18:44 [sh] <defunct>
root     32185 31997  0 11:42 ?        00:00:00 [sh] <defunct>
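
Every non-zero entry in this list shows the same TIME, 208-12:18:44, although all of them were started at 11:42. The command that produced the list isn't shown, so I'm assuming it's the [DD-]hh:mm:ss TIME format procps uses for ps -ef style output. A minimal Python sketch (hypothetical helper ps_time_to_seconds) to convert that field to seconds:

```python
def ps_time_to_seconds(field: str) -> int:
    """Convert a `ps -ef` style TIME field ([DD-]hh:mm:ss) to seconds."""
    days = 0
    if "-" in field:
        # Optional day count before the dash, e.g. "208-12:18:44".
        day_part, field = field.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in field.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

cpu_time = ps_time_to_seconds("208-12:18:44")
print(f"{cpu_time} s = {cpu_time / 86400:.2f} days")
```

So each of these short-lived sh processes reports roughly 208.5 days of CPU time.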

All eight CPUs look alike (only processor 0 is shown):

#cat /proc/cpuinfo
processor       : 0 
vendor_id       : GenuineIntel
cpu family      : 6 
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
stepping        : 7 
cpu MHz         : 2400.236
cache size      : 10240 KB
physical id     : 0 
siblings        : 4 
core id         : 0 
cpu cores       : 4 
apicid          : 0 
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4800.47
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
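
To check the "all eight CPUs alike" observation on each machine, and to diff a bad server against a good one, a Python sketch could split /proc/cpuinfo into per-processor records and report which fields are not identical. The list of fields compared is my own assumption about what is worth checking, and the helper names are hypothetical.

```python
from pathlib import Path

def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            # Each line looks like "key<tabs>: value"; split on the first colon only.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            cpus.append(entry)
    return cpus

def differing_fields(cpus, fields=("vendor_id", "model name", "stepping", "flags")):
    """Return the fields whose values are not identical on every processor."""
    return [f for f in fields if len({cpu.get(f) for cpu in cpus}) > 1]

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        cpus = parse_cpuinfo(cpuinfo.read_text())
        print(f"{len(cpus)} processors; differing fields: {differing_fields(cpus) or 'none'}")
```

Running it on each server and comparing the flags line between a healthy and an affected machine would show whether the "same hardware" assumption holds at the CPU-feature level.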

The dmesg output from one of the abnormal servers (note the first column, the timestamps, a little way past the beginning):

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.0.13-0.27-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b)
[    0.000000] Command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 0000000000093400 (usable)
[    0.000000]  BIOS-e820: 0000000000093400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007e69b000 (usable)
[    0.000000]  BIOS-e820: 000000007e69b000 - 000000007e7a9000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007e7a9000 - 000000007f3a9000 (reserved)
[    0.000000]  BIOS-e820: 000000007f3a9000 - 000000007f423000 (ACPI data)
[    0.000000]  BIOS-e820: 000000007f423000 - 000000007f4af000 (reserved)
[    0.000000]  BIOS-e820: 000000007f4af000 - 000000007f4b1000 (usable)
[    0.000000]  BIOS-e820: 000000007f4b1000 - 000000007f4b2000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007f4b2000 - 000000007f4bb000 (reserved)
[    0.000000]  BIOS-e820: 000000007f4bb000 - 000000007f4c2000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007f4c2000 - 000000007f4e4000 (reserved)
[    0.000000]  BIOS-e820: 000000007f4e4000 - 000000007f56a000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007f56a000 - 000000007f7e0000 (usable)
[    0.000000]  BIOS-e820: 000000007f7e0000 - 000000007f7e1000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007f7e1000 - 000000007f7e6000 (reserved)
[    0.000000]  BIOS-e820: 000000007f7e6000 - 000000007f800000 (usable)
[    0.000000]  BIOS-e820: 0000000080000000 - 0000000090000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed20000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000480000000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.7 present.
[    0.000000] DMI: empty empty/ S7057 , BIOS V1.01B 06/23/2014
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x480000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask 3FFC00000000 write-back
[    0.000000]   1 base 000400000000 mask 3FFF80000000 write-back
[    0.000000]   2 base 000080000000 mask 3FFF80000000 uncachable
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000]   8 disabled
[    0.000000]   9 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 0000000080000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] last_pfn = 0x7f800 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000fd930] fd930
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Base memory trampoline at [ffff88000008e000] 8e000 size 20480
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-000000007f800000
[    0.000000]  0000000000 - 0040000000 page 1G
[    0.000000]  0040000000 - 007f800000 page 2M
[    0.000000] kernel direct mapping tables up to 7f800000 @ 1fffe000-20000000
[    0.000000] init_memory_mapping: 0000000100000000-0000000480000000
[    0.000000]  0100000000 - 0480000000 page 1G
[    0.000000] kernel direct mapping tables up to 480000000 @ 7f7ff000-7f800000
[    0.000000] RAMDISK: 37653000 - 37ff0000
[    0.000000] crashkernel reservation failed - memory is in use.
[    0.000000] ACPI: RSDP 00000000000f0490 00024 (v02 ALASKA)
[    0.000000] ACPI: XSDT 000000007f3a9080 00084 (v01 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: FACP 000000007f3b21a8 000F4 (v04 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: DSDT 000000007f3a9198 0900C (v02 ALASKA    A M I 00000001 INTL 20051117)
[    0.000000] ACPI: FACS 000000007f4c0f80 00040
[    0.000000] ACPI: APIC 000000007f3b22a0 000AA (v03 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: MCFG 000000007f3b2350 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097)
[    0.000000] ACPI: SRAT 000000007f3b2390 00330 (v01 A M I  AMI SRAT 00000001 AMI. 00000000)
[    0.000000] ACPI: SLIT 000000007f3b26c0 00030 (v01 A M I  AMI SLIT 00000000 AMI. 00000000)
[    0.000000] ACPI: HPET 000000007f3b26f0 00038 (v01 ALASKA    A M I 01072009 AMI. 00000004)
[    0.000000] ACPI: SSDT 000000007f3b2728 70104 (v02  INTEL    CpuPm 00004000 INTL 20051117)
[    0.000000] ACPI: EINJ 000000007f422830 00130 (v01    AMI AMI EINJ 00000000      00000000)
[    0.000000] ACPI: ERST 000000007f422960 00230 (v01  AMIER AMI ERST 00000000      00000000)
[    0.000000] ACPI: HEST 000000007f422b90 000A8 (v01    AMI AMI HEST 00000000      00000000)
[    0.000000] ACPI: BERT 000000007f422c38 00030 (v01    AMI AMI BERT 00000000      00000000)
[    0.000000] ACPI: BGRT 000000007f422c68 00038 (v00 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[    0.000000] SRAT: PXM 1 -> APIC 0x20 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x22 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x24 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x26 -> Node 1
[    0.000000] SRAT: Node 0 PXM 0 0-80000000
[    0.000000] SRAT: Node 0 PXM 0 100000000-280000000
[    0.000000] SRAT: Node 1 PXM 1 280000000-480000000
[    0.000000] NUMA: Initialized distance table, cnt=2
[    0.000000] NUMA: Node 0 [0,80000000) + [100000000,280000000) -> [0,280000000)
[    0.000000] Initmem setup node 0 0000000000000000-0000000280000000
[    0.000000]   NODE_DATA [000000027ffd9000 - 000000027fffffff]
[    0.000000] Initmem setup node 1 0000000280000000-0000000480000000
[    0.000000]   NODE_DATA [000000047ffd8080 - 000000047ffff07f]
[    0.000000]  [ffffea0000000000-ffffea0008bfffff] PMD -> [ffff880277e00000-ffff88027edfffff] on node 0
[    0.000000]  [ffffea0008c00000-ffffea000fbfffff] PMD -> [ffff880477600000-ffff88047e5fffff] on node 1
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00480000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[7] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000093
[    0.000000]     0: 0x00000100 -> 0x0007e69b
[    0.000000]     0: 0x0007f4af -> 0x0007f4b1
[    0.000000]     0: 0x0007f56a -> 0x0007f7e0
[    0.000000]     0: 0x0007f7e6 -> 0x0007f800
[    0.000000]     0: 0x00100000 -> 0x00280000
[    0.000000]     1: 0x00280000 -> 0x00480000
[    0.000000] On node 0 totalpages: 2091184
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 5 pages reserved
[    0.000000]   DMA zone: 3910 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 500069 pages, LIFO batch:31
[    0.000000]   Normal zone: 21504 pages used for memmap
[    0.000000]   Normal zone: 1551360 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 2097152
[    0.000000]   Normal zone: 28672 pages used for memmap
[    0.000000]   Normal zone: 2068480 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x20] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x22] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x24] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x26] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47
[    0.000000] ACPI: IOAPIC (id[0x03] address[0xfec40000] gsi_base[48])
[    0.000000] IOAPIC[2]: apic_id 3, version 32, address 0xfec40000, GSI 48-71
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] SMP: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 88
[    0.000000] PM: Registered nosave memory: 0000000000093000 - 0000000000094000
[    0.000000] PM: Registered nosave memory: 0000000000094000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
[    0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 000000007e69b000 - 000000007e7a9000
[    0.000000] PM: Registered nosave memory: 000000007e7a9000 - 000000007f3a9000
[    0.000000] PM: Registered nosave memory: 000000007f3a9000 - 000000007f423000
[    0.000000] PM: Registered nosave memory: 000000007f423000 - 000000007f4af000
[    0.000000] PM: Registered nosave memory: 000000007f4b1000 - 000000007f4b2000
[    0.000000] PM: Registered nosave memory: 000000007f4b2000 - 000000007f4bb000
[    0.000000] PM: Registered nosave memory: 000000007f4bb000 - 000000007f4c2000
[    0.000000] PM: Registered nosave memory: 000000007f4c2000 - 000000007f4e4000
[    0.000000] PM: Registered nosave memory: 000000007f4e4000 - 000000007f56a000
[    0.000000] PM: Registered nosave memory: 000000007f7e0000 - 000000007f7e1000
[    0.000000] PM: Registered nosave memory: 000000007f7e1000 - 000000007f7e6000
[    0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000
[    0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000
[    0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
[    0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed20000
[    0.000000] PM: Registered nosave memory: 00000000fed20000 - 00000000ff000000
[    0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
[    0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:2
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff88027fc00000 s74880 r8192 d23424 u524288
[    0.000000] pcpu-alloc: s74880 r8192 d23424 u524288 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
[    0.000000] Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 4123819
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 16430212k/18874368k available (4405k kernel code, 2121024k absent, 323132k reserved, 7781k data, 1356k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] NR_IRQS:262400 nr_irqs:1560 16
[    0.000000] Extended CMOS year: 2000
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] allocated 134217728 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] Fast TSC calibration using PIT
[    0.004000] Detected 2400.236 MHz processor.
[18014398.509486] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.47 BogoMIPS (lpj=9600944)
[18014398.509491] pid_max: default: 32768 minimum: 301
[18014398.696649] kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
[18014398.696878] Security Framework initialized
[18014398.696895] AppArmor: AppArmor initialized
[18014398.698294] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[18014398.701889] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[18014398.703370] Mount-cache hash table entries: 256
[18014398.703558] Initializing cgroup subsys cpuacct
[18014398.703564] Initializing cgroup subsys memory
[18014398.703580] Initializing cgroup subsys devices
[18014398.703583] Initializing cgroup subsys freezer
[18014398.703585] Initializing cgroup subsys net_cls
[18014398.703588] Initializing cgroup subsys blkio
[18014398.703595] Initializing cgroup subsys perf_event
[18014398.703668] CPU: Physical Processor ID: 0
[18014398.703670] CPU: Processor Core ID: 0
[18014398.703677] mce: CPU supports 16 MCE banks
[18014398.703704] CPU0: Thermal monitoring enabled (TM1)
[18014398.703722] using mwait in idle threads.
[18014398.704892] ACPI: Core revision 20110413
[18014398.729594] x2apic not enabled, IRQ remapping init failed
[18014398.730201] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[18014398.769828] CPU0: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz stepping 07
[18014398.877280] Performance Events: PEBS fmt1+, SandyBridge events, Intel PMU driver.
[18014398.877288] ... version:                3
[18014398.877289] ... bit width:              48
[18014398.877291] ... generic registers:      8
[18014398.877293] ... value mask:             0000ffffffffffff
[18014398.877296] ... max period:             000000007fffffff
[18014398.877298] ... fixed-purpose events:   3
[18014398.877300] ... event mask:             00000007000000ff
[18014398.877514] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.877651] Booting Node   0, Processors  #1
[18014398.877654] smpboot cpu 1: start_ip = 8e000
[18014398.909186] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.909344]  #2
[18014398.909346] smpboot cpu 2: start_ip = 8e000
[18014398.940474] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.940625]  #3
[18014398.940627] smpboot cpu 3: start_ip = 8e000
[18014398.971752] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.971970]  Ok.
[18014398.971972] Booting Node   1, Processors  #4
[18014398.971975] smpboot cpu 4: start_ip = 8e000
[18014399.081093] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.081258]  #5
[18014399.081259] smpboot cpu 5: start_ip = 8e000
[18014399.112362] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.112528]  #6
[18014399.112530] smpboot cpu 6: start_ip = 8e000
[18014399.143633] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.143795]  #7 Ok.
[18014399.143797] smpboot cpu 7: start_ip = 8e000
[18014399.174900] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.174924] Brought up 8 CPUs
[18014399.174927] Total of 8 processors activated (38401.60 BogoMIPS).
[18014399.555854] devtmpfs: initialized
[18014399.559405] PM: Registering ACPI NVS region at 7e69b000 (1105920 bytes)
[18014399.559459] PM: Registering ACPI NVS region at 7f4b1000 (4096 bytes)
[18014399.559462] PM: Registering ACPI NVS region at 7f4bb000 (28672 bytes)
[18014399.559465] PM: Registering ACPI NVS region at 7f4e4000 (548864 bytes)
[18014399.559498] PM: Registering ACPI NVS region at 7f7e0000 (4096 bytes)
[18014399.559645] print_constraints: dummy: 
[18014399.559675] Time: 10:55:14  Date: 10/28/15
[18014399.559772] NET: Registered protocol family 16
[18014399.559933] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[18014399.559938] ACPI: bus type pci registered
[18014399.560001] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[18014399.560006] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[18014399.605298] PCI: Using configuration type 1 for base access
[18014399.606398] bio: create slab <bio-0> at 0
[18014399.612428] ACPI: EC: Look up EC in DSDT
[18014399.618314] ACPI: Executed 1 blocks of module-level executable AML code
[18014399.758896] ACPI: Interpreter enabled
[18014399.758903] ACPI: (supports S0 S1 S4 S5)
[18014399.758926] ACPI: Using IOAPIC for interrupt routing
[18014399.759318] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[18014399.837279] ACPI: No dock devices found.
[18014399.837285] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[18014399.837640] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
[18014399.837989] pci_root PNP0A08:00: host bridge window [io  0x0000-0x03af]
[18014399.837993] pci_root PNP0A08:00: host bridge window [io  0x03e0-0x0cf7]
[18014399.837996] pci_root PNP0A08:00: host bridge window [io  0x03b0-0x03df]
[18014399.837998] pci_root PNP0A08:00: host bridge window [io  0x0d00-0x9fff]
[18014399.838001] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[18014399.838005] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff]
[18014399.838008] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xdfffffff]
[18014399.838026] pci 0000:00:00.0: [8086:3c00] type 0 class 0x000600
[18014399.838072] pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
[18014399.838075] pci 0000:00:00.0: PME# disabled
[18014399.838099] pci 0000:00:01.0: [8086:3c02] type 1 class 0x000604
[18014399.838148] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[18014399.838151] pci 0000:00:01.0: PME# disabled
[18014399.838177] pci 0000:00:01.1: [8086:3c03] type 1 class 0x000604
[18014399.838225] pci 0000:00:01.1: PME# supported from D0 D3hot D3cold
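
The timestamps jump from 0.004000 to 18014398.509486 right after the "Detected 2400.236 MHz processor." line, whereas the normal server's log continues from 0.000004 at the same point. A Python sketch that flags such jumps in dmesg text is below; the 60-second threshold is an arbitrary assumption. It also prints how far the new value is from 2^54 ns and what 2^54 ns is in days; that the numbers line up (including with the 208-12:18:44 seen in ps) is plain arithmetic, not a diagnosis.

```python
import re

# Matches the "[ seconds.micro]" prefix that dmesg prints on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def find_timestamp_jumps(lines, threshold=60.0):
    """Return (line_index, previous_ts, ts) wherever the timestamp rises by more than threshold seconds."""
    jumps = []
    prev = None
    for i, line in enumerate(lines):
        m = TIMESTAMP.match(line)
        if not m:
            continue
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            jumps.append((i, prev, ts))
        prev = ts
    return jumps

# A few lines copied from the abnormal server's dmesg above.
sample = """\
[    0.000000] Fast TSC calibration using PIT
[    0.004000] Detected 2400.236 MHz processor.
[18014398.509486] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.47 BogoMIPS (lpj=9600944)
[18014398.509491] pid_max: default: 32768 minimum: 301
""".splitlines()

TWO_POW_54_NS_IN_S = 2**54 / 1e9  # exactly 18014398.509481984 s, about 208.5 days

for i, prev, ts in find_timestamp_jumps(sample):
    print(f"line {i}: {prev:.6f} -> {ts:.6f} s")
    print(f"  offset from 2^54 ns: {(ts - TWO_POW_54_NS_IN_S) * 1e6:.1f} us")
    print(f"  2^54 ns = {TWO_POW_54_NS_IN_S / 86400:.2f} days")
```

The same function can be fed the complete dmesg text of every server to see which ones show the jump.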

A dmesg snippet from one of the normal servers, for comparison:

[    0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:2
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff88027fc00000 s74880 r8192 d23424 u524288
[    0.000000] pcpu-alloc: s74880 r8192 d23424 u524288 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
[    0.000000] Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 4123819
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 16430212k/18874368k available (4405k kernel code, 2121024k absent, 323132k reserved, 7781k data, 1356k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] NR_IRQS:262400 nr_irqs:1560 16
[    0.000000] Extended CMOS year: 2000
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] allocated 134217728 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] Fast TSC calibration using PIT
[    0.004000] Detected 2399.963 MHz processor.
[    0.000004] Calibrating delay loop (skipped), value calculated using timer frequency.. 4799.92 BogoMIPS (lpj=9599852)
[    0.000009] pid_max: default: 32768 minimum: 301
[    0.000384] kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
[    0.000616] Security Framework initialized
[    0.000632] AppArmor: AppArmor initialized
[    0.002018] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)

In case it is relevant:

#cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
#    

My manager says that even a reboot does not solve the problem (high CPU load, slow response, etc.), but shutting the server down and starting it again half an hour later may solve it.

Only a few of the servers behave like this; the others seem normal, and the problem is hard to reproduce.

Is this more likely a software problem or a hardware problem? How should I go about debugging it? (I don't have the kernel source code.)

If you need more information, please let me know!

Tags: high-load, dmesg
asked on Server Fault Oct 30, 2015 by user319514



User contributions licensed under CC BY-SA 3.0