We have a hardware problem with the disks that has made all the mount points read-only. Output of dmesg:
end_request: I/O error, dev sda, sector 15574609
sd 0:0:0:0: SCSI error: return code = 0x00040000
We want to analyze a program that is currently running, because it should have died when it couldn't write to the file system. So we would like to use strace to debug the system calls.
But the output of strace is:
Bus error
It seems some resources are not available to the machine, or there is some low-level error. I am stuck on how to analyze the program before the sysadmins repair the disk.
Your disk is (probably, in fact almost certainly) dying. It sounds like your sysadmins have already reached this conclusion.
Prepare for the funeral by dressing your backups in black and performing a restore test.
Re: the bus error - this should have been immediately lethal to the program in question. It's the signal equivalent of "WTF? That's unpossible!" (See this SO question - they're talking about memory, but the same thing can happen with disks, or any addressable component). I don't recall if you can catch SIGBUS, but if your program is doing so it shouldn't.
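For what it's worth, SIGBUS can be caught (only SIGKILL and SIGSTOP cannot), and on Linux you can check whether a running process has a handler installed without strace: the `SigCgt` line in `/proc/<pid>/status` is a hex bitmask of caught signals. Here is a minimal sketch in Python, assuming the interpreter can still be loaded from the failing disk (it may not be, for the same reason strace dies). The helper names `caught_signals` and `catches_sigbus` are made up for illustration.

```python
import signal
import sys


def caught_signals(status_text):
    """Return the set of signal numbers set in the SigCgt mask.

    status_text is the content of /proc/<pid>/status. Bit (n - 1) of the
    hex mask corresponds to signal number n.
    """
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return {n for n in range(1, 65) if mask & (1 << (n - 1))}
    return set()


def catches_sigbus(pid):
    """True if the process has a SIGBUS handler installed (Linux only)."""
    # /proc is a virtual filesystem, so this read does not touch the disk.
    with open(f"/proc/{pid}/status") as f:
        return int(signal.SIGBUS) in caught_signals(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    print(f"pid {pid} catches SIGBUS: {catches_sigbus(pid)}")
```

If Python won't start either, the same information is available by reading `/proc/<pid>/status` from a shell that is already running and decoding the mask by hand (SIGBUS is signal 7 on x86 and ARM Linux, so look for bit value 0x40).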
Further questions on how to trace/debug your software should really be asked over on StackOverflow or Programmers.
Sounds like your system can't even load the utilities/libraries needed to do the tracing.
The correct thing to do here is: