Recently my 9-year-old Apple G4 file server has been crashing at random. Often it's a kernel panic, but sometimes the system just locks up. It almost always seems to happen when I'm out of my office... but even when I'm in, the system sits in a separate server room and almost never has anyone at the console. Suspecting bad RAM, I ran memtest, but after 20 passes it found no problems. (I ran 10 passes, rebooted, and ran 10 more, in single-user mode both times.) Apple Hardware Test also reports no problems (after running in loop mode for over 100 loops).
I suspect the hardware is just going bad... it is 9 years old after all. But we don't have the budget to replace the server at this point in time. Until our next upgrade, what would be my best options? Any way to troubleshoot what's crashing? Or at the very least, any way to have the system automatically reboot after a kernel panic or lockup so that it can resume serving?
panic.log shows:
Mon Jun 29 12:52:23 2009
panic(cpu 1 caller 0x00040180): zalloc: "socket" (751876 elements) retry fail 3
Latest stack backtrace for cpu 1:
Backtrace:
0x000954F8 0x00095A10 0x00026898 0x00040180 0x0026B868 0x00290E10 0x00290F1C 0x00296B40
0x002ABDB8 0x000ABD30 0x00000000
Proceeding back via exception chain:
Exception state (sv=0x32288780)
PC=0x9001B08C; MSR=0x0000F030; DAR=0x12555000; DSISR=0x42000000; LR=0x8EF88A00; R1=0xBFFFF700; XCP=0x00000030 (0xC00 - System call)
Kernel version:
Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC
*********
Fri Jul 3 10:15:24 2009
panic(cpu 1 caller 0x00040180): zalloc: "socket" (762004 elements) retry fail 3
Latest stack backtrace for cpu 1:
Backtrace:
0x000954F8 0x00095A10 0x00026898 0x00040180 0x0026B868 0x00290E10 0x00290F1C 0x00296B40
0x002ABDB8 0x000ABD30 0x00000000
Proceeding back via exception chain:
Exception state (sv=0x2C543000)
PC=0x9001B08C; MSR=0x0000F030; DAR=0x11A41000; DSISR=0x42000000; LR=0x8EF88A00; R1=0xBFFFF700; XCP=0x00000030 (0xC00 - System call)
Kernel version:
Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC
*********
Tue Jul 21 20:44:47 2009
panic(cpu 1 caller 0x00040180): zalloc: "socket" (762004 elements) retry fail 3
Latest stack backtrace for cpu 1:
Backtrace:
0x000954F8 0x00095A10 0x00026898 0x00040180 0x0026B868 0x00290E10 0x00290F1C 0x00296B40
0x002ABDB8 0x000ABD30 0x00000000
Proceeding back via exception chain:
Exception state (sv=0x2C543000)
PC=0x9001B08C; MSR=0x0000F030; DAR=0x11A41000; DSISR=0x42000000; LR=0x8EF88A00; R1=0xBFFFF700; XCP=0x00000030 (0xC00 - System call)
Kernel version:
Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC
*********
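With only three entries you can compare them by eye: all three carry the same panic string and differ only in the element count. If panic.log keeps growing, a small script can group entries for you. This is a minimal sketch that assumes the layout shown above (entries separated by a row of asterisks, a timestamp on the first line, one "panic(...)" line each); the function names are my own.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, based on the excerpt above: entries are separated by a line of
# asterisks; each begins with a timestamp line and contains one "panic(...)" line.
SEPARATOR_RE = re.compile(r"^\*+[ \t]*$", re.MULTILINE)
PANIC_RE = re.compile(r"^panic\(cpu \d+ caller 0x[0-9A-Fa-f]+\): (.*)$", re.MULTILINE)
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Jun 29 12:52:23 2009"


def parse_panic_log(text):
    """Return a list of (datetime, panic message) tuples, one per entry."""
    entries = []
    for block in SEPARATOR_RE.split(text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        try:
            when = datetime.strptime(lines[0], TIME_FORMAT)
        except ValueError:
            continue  # skip blocks that don't start with a timestamp
        match = PANIC_RE.search("\n".join(lines))
        entries.append((when, match.group(1) if match else None))
    return entries


def summarize(entries):
    """Count entries by panic message, masking the zone element count that varies."""
    return Counter(
        re.sub(r"\(\d+ elements\)", "(N elements)", message or "unknown")
        for _, message in entries
    )
```

Feeding it the full contents of panic.log and printing summarize(...) shows at a glance whether every crash is the same panic or whether several different ones are mixed in.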
I'm assuming that since this is a file server, it's running Mac OS X Server, correct? Automatic restart after a kernel panic is the default on Server, so if it isn't happening, your hardware is probably old enough that the feature isn't supported on it.
Obviously, Server won't try to reboot if the machine has simply hung rather than panicked, but I've found Sophisticated Circuits' Kick-Off! to be an excellent solution to that problem. Basically, their software periodically pings their hardware, which sits in the power pass-through; if the box locks up and stops pinging, the hardware cycles the power. Presto! Auto reboot, kernel panic or no!
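The idea behind a device like that is a heartbeat watchdog. Here's a toy model of the logic, just to show why it catches hangs as well as panics: as long as pings keep arriving within the timeout, nothing happens; once they stop for any reason, the power gets cycled. The class, the timeout, and the "fresh window after a cycle" behavior are illustrative assumptions, not Sophisticated Circuits' actual protocol or settings.

```python
class HeartbeatWatchdog:
    """Hypothetical model of a hardware watchdog: the host must ping at least
    once every `timeout` seconds, otherwise the watchdog cycles the power."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_ping = now
        self.power_cycles = 0

    def ping(self, now):
        """Called by host-side software for as long as the machine is healthy."""
        self.last_ping = now

    def check(self, now):
        """Called periodically by the watchdog; returns True if it cycled power."""
        if now - self.last_ping > self.timeout:
            self.power_cycles += 1
            # Assumption: after a power cycle the host gets a fresh window to boot and ping.
            self.last_ping = now
            return True
        return False
```

For example, with a 300-second timeout and a last ping at t=60, a check at t=200 does nothing, but a check at t=400 cycles the power. A real unit would also need a longer grace period while the server boots, which this sketch ignores.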
I'm going to assume that you've already looked at the CrashReporter and other system logs to see whether anything interesting has shown up there.
But when trying to squeeze some extra time out of old machines, one of the first things I check is the cooling: get all of the dust out of the box, then make sure the fans are spinning freely.
... and if you're running the client version and take morgant's power-cycling suggestion, look in 'Energy Saver -> Options' for 'Restart automatically after a power failure'. That's also where you'll find the setting for 'Restart automatically if the server "freezes"' if you're running OS X Server.
Do you know what is causing the kernel panic? Which specific kernel extension is the panic occurring in?
I posted a bit about how to read a kernel panic log in an answer to an unrelated question on Super User; hopefully it helps:
If it's not a package, you can find the name of the kext from the kernel panic. You can find this information at
~/Library/Logs/panic.log
or, when you restart your computer after the panic, it will ask whether you want to report the error to Apple. Press Report and then click the centre tab to see the crash details. An example would be:
panic(cpu 0 caller 0x0035C330): freeing free mbuf
Backtrace, Format - Frame : Return Address (4 potential args on stack)
0x2545bc08 : 0x128d08 (0x3c9afc 0x2545bc2c 0x131de5 0x0)
0x2545bc48 : 0x35c330 (0x3ea258 0x3ae65000 0x23935100 0x493e0)
0x2545bc88 : 0x7424a4 (0x36f19300 0x493e0 0x0 0x134b11)
0x2545bca8 : 0x9f1458 (0x23935000 0x36f19300 0x0 0x0)
0x2545bcd8 : 0x9ef6d6 (0x23935000 0x36f19300 0x0 0x0)
0x2545bcf8 : 0x9fa0ce (0x23935000 0x36f15f00 0x1000000 0x0)
0x2545bea8 : 0x9f375a (0x23935000 0x3a14880 0x40000000 0x34fb8b)
0x2545bf08 : 0x398f79 (0x23935000 0x3a14880 0x1 0x13becf)
0x2545bf58 : 0x39814b (0x3a14880 0x4121d48 0x4121d8c 0x0)
0x2545bf88 : 0x397e81 (0x3a184c0 0x5d3734 0x452084 0x40431f4)
0x2545bfc8 : 0x19a77c (0x3a184c0 0x0 0x19d0b5 0x696543c)
Backtrace terminated-invalid frame pointer 0x0
Kernel loadable modules in backtrace (with dependencies):
com.apple.iokit.AppleYukon(1.0.9b3)@0x9ed000
dependency: com.apple.iokit.IONetworkingFamily(1.5.1)@0x73b000
dependency: com.apple.iokit.IOPCIFamily(2.2)@0x60a000
dependency: com.apple.iokit.IOACPIFamily(1.2.0)@0x6b6000
com.apple.iokit.IONetworkingFamily(1.5.1)@0x73b000
Kernel version:
Darwin Kernel Version 8.8.2: Thu Sep 28 20:43:26 PDT 2006; root:xnu-792.14.14.obj~1/RELEASE_I386
The relevant section is the one starting with "Kernel loadable modules in backtrace". Specifically, you're looking for the first line after that heading. In this case the item is com.apple.iokit.AppleYukon (the Ethernet driver kernel extension), so the file name would be com.apple.iokit.AppleYukon.kext.
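The same rule can be written as a small Python sketch: take the lines under "Kernel loadable modules in backtrace", skip the "dependency:" lines, and treat the first remaining entry as the prime suspect. The function names are my own, and the sketch assumes the section ends at a blank line or at "Kernel version:", as it does in the example above.

```python
def kexts_in_backtrace(panic_text):
    """Return the bundle IDs listed under "Kernel loadable modules in backtrace",
    skipping the "dependency:" lines."""
    kexts = []
    in_section = False
    for raw in panic_text.splitlines():
        line = raw.strip()
        if line.startswith("Kernel loadable modules in backtrace"):
            in_section = True
            continue
        if not in_section:
            continue
        # Assumption: the section ends at a blank line or the "Kernel version:" line.
        if not line or line.startswith("Kernel version:"):
            break
        if line.startswith("dependency:"):
            continue
        # Entries look like "com.apple.iokit.AppleYukon(1.0.9b3)@0x9ed000".
        kexts.append(line.split("(", 1)[0])
    return kexts


def suspect_kext_file(panic_text):
    """File name of the first kext in the backtrace, or None if none is listed."""
    kexts = kexts_in_backtrace(panic_text)
    return kexts[0] + ".kext" if kexts else None
```

On the example above this gives com.apple.iokit.AppleYukon.kext. The panic entries quoted in the question don't include a "Kernel loadable modules" section at all, so for those it would return None.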
On your server, you can run the last command to see when it has booted up (it lists normal restarts as well). Is there anything interesting in the system logs around those times?
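Once you have a crash time (from panic.log or from last), pulling out the log lines just before it is easy to automate. This is a minimal sketch assuming classic BSD syslog lines that start with a 15-character timestamp like "Jul 21 20:40:01" with no year; the function name and the 30-minute default window are my own choices.

```python
from datetime import datetime, timedelta


def lines_before(log_lines, panic_time, window_minutes=30):
    """Yield log lines stamped within `window_minutes` before `panic_time`.

    Assumes each line begins with a 15-character BSD syslog timestamp with no
    year, so the year is borrowed from `panic_time` (year rollover is ignored).
    """
    start = panic_time - timedelta(minutes=window_minutes)
    for line in log_lines:
        try:
            stamp = datetime.strptime(f"{panic_time.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line (e.g. a wrapped continuation line)
        if start <= stamp <= panic_time:
            yield line
```

For example, iterate over the lines of /var/log/system.log (the usual system log on Mac OS X) with panic_time set to datetime(2009, 7, 21, 20, 44, 47) to see what was logged in the half hour before the July 21 panic.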
Also, since it's a file server, be sure to check the hard drives. We have a G5 (?) connected to a RAID, and it doesn't function properly when the RAID is unhappy.
Also look at memory: corruption or other memory faults are quite likely to cause that sort of random crash.
If you can swap the DIMMs into an x86 PC, try Memtest86+ to see if there are any obvious errors, although it can still come up clean if the errors are intermittent or obscure enough.
User contributions licensed under CC BY-SA 3.0