I get random crashes during mvn compilation.
The problem seems to be related to heavy I/O, and kern.log shows entries like:
kernel: [158430.895045] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
kernel: [158430.951331] blk_update_request: I/O error, dev nvme0n1, sector 819134096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
kernel: [158430.995307] nvme nvme1: Removing after probe failure status: -19
kernel: [158431.035065] blk_update_request: I/O error, dev nvme0n1, sector 253382656 op 0x1:(WRITE) flags 0x4000 phys_seg 127 prio class 0
kernel: [158431.035083] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:309: I/O error 10 writing to inode 3933601 (offset 16777216 size 2101248 starting block 31672832)
kernel: [158431.035085] Buffer I/O error on device nvme0n1p1, logical block 31672320
kernel: [158431.035090] ecryptfs_write_inode_size_to_header: Error writing file size to header; rc = [-5]
To replicate the error, I use:
stress-ng --all 8  --timeout 60s --metrics-brief --tz
I've tried some boot options, such as adding acpiphp.disable=1 pcie_aspm=off to /etc/default/grub. This seemed to help with the stress-ng test, but not with my compilation.
nvme list shows:
Node          SN            Model                             Namespace  Usage                  Format       FW Rev
------------  ------------  --------------------------------  ---------  ---------------------  -----------  --------
/dev/nvme0n1  28FF72PTFQAS  KXG50ZNV256G NVMe TOSHIBA 256GB   1          256,06 GB / 256,06 GB  512 B + 0 B  AADA4102
/dev/nvme1n1  37DS103NTEQT  THNSN5512GPU7 NVMe TOSHIBA 512GB  1          512,11 GB / 512,11 GB  512 B + 0 B  57DC4102
I can't tell you exactly where the problem is, as this is just a "generic failure" somewhere in the NVMe subsystem, but I can suggest what you can try in order to pinpoint it.
I noticed that the errors only occurred on one of the SSDs, the one containing /home.
I moved /home to the other disk in the machine, and so far it seems to be working much better.
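To check which device the errors land on without reading the log by eye, something like the following might help. It is only a rough sketch: the regular expression is written against the `blk_update_request` lines quoted in the question, and other kernel versions may word the message differently. The function name `count_io_errors` is made up for this example.

```python
import re
from collections import Counter

# Matches block-layer lines such as:
#   blk_update_request: I/O error, dev nvme0n1, sector 819134096 op 0x0:(READ) ...
# The pattern is an assumption based on the log excerpt in the question.
IO_ERROR = re.compile(r"I/O error, dev (\S+?), sector \d+ op 0x[0-9a-f]+:\((\w+)\)")


def count_io_errors(lines):
    """Count block-layer I/O errors, keyed by (device, operation)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample input: three lines copied from the kern.log excerpt in the question.
sample = [
    "kernel: [158430.951331] blk_update_request: I/O error, dev nvme0n1, "
    "sector 819134096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0",
    "kernel: [158431.035065] blk_update_request: I/O error, dev nvme0n1, "
    "sector 253382656 op 0x1:(WRITE) flags 0x4000 phys_seg 127 prio class 0",
    "kernel: [158431.035085] Buffer I/O error on device nvme0n1p1, "
    "logical block 31672320",
]

for (device, op), n in sorted(count_io_errors(sample).items()):
    print(f"{device} {op}: {n}")
```

To run it on a real log, pass an open file instead of `sample`, for example `count_io_errors(open("/var/log/kern.log", errors="replace"))`. That path is an assumption and differs between distributions.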
A quick thing to try is swapping out the hard drive driver.
For high-performance I/O, though, you can't go cheap either. Check the maximum latency and see by how much you are exceeding it. Maybe you are just attempting something that demands a better driver in the kernel.
Look for a CMake setting or a compiler argument that limits the build to one thread or otherwise reduces I/O, to slow it down somehow. If you are very desperate, you could pause the process manually from the terminal to approximate a slower compile.
The only other thing that can be done quickly is to make a virtual machine of that machine, compile inside the VM, and debug it live.
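The "pause the process manually" idea can be automated instead of done by hand in the terminal. Below is a minimal sketch, assuming a Linux system: it starts a command in its own process group and alternates between letting it run and suspending it with SIGSTOP/SIGCONT. The name `run_throttled` and the default timings are invented for illustration. This only lowers the average I/O rate, so it is a diagnostic trick and not a fix.

```python
import os
import signal
import subprocess
import time


def run_throttled(cmd, run_s=2.0, pause_s=1.0):
    """Run cmd, repeatedly suspending it to lower its average I/O rate.

    Returns the exit code of the command.
    """
    # start_new_session puts the child in its own process group, so the
    # signals below also reach any helper processes it spawns.
    proc = subprocess.Popen(cmd, start_new_session=True)
    pgid = os.getpgid(proc.pid)
    while True:
        try:
            # Let the command run for up to run_s seconds.
            return proc.wait(timeout=run_s)
        except subprocess.TimeoutExpired:
            pass
        # Still running: suspend the whole group, wait, then resume it.
        os.killpg(pgid, signal.SIGSTOP)
        time.sleep(pause_s)
        os.killpg(pgid, signal.SIGCONT)
```

For the build in the question this would be called as `run_throttled(["mvn", "compile"])`. If the crashes stop once the build is slowed down like this, that would point toward the drive or controller failing under sustained load.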