PCI error handler in kernel driver never called, even when device is turned off


I'm writing a Linux kernel device driver for a custom PCIe device. A user-space application mmaps the device's memory and accesses it frequently (reads and writes). The PCIe device is powered by an external power supply, which may be turned off at runtime.

Whenever the device is reset, all memory reads in my user-space application return 0xFFFFFFFF. I want to detect device resets as early as possible in the kernel driver, so I implemented an error_detected callback as described in https://www.kernel.org/doc/html/latest/PCI/pci-error-recovery.html.

/* Invoked by the PCI core when it reports an error for this device. */
static pci_ers_result_t mydevice_error_detected(struct pci_dev *dev, pci_channel_state_t state) {
   printk(KERN_ALERT "mydevice PCI error detected\n");
   return PCI_ERS_RESULT_DISCONNECT;
}

static struct pci_error_handlers mydevice_error_handlers = {
   .error_detected = mydevice_error_detected,
   .slot_reset = mydevice_slot_reset,
   .resume = mydevice_resume /* note: this hook returns void, unlike pci_driver.resume, which returns int */
};

static struct pci_driver mydevice_driver = {
   .name = "mydevice",
   .id_table = mydevice_ids,
   .probe = mydevice_probe,
   .remove = mydevice_remove,
   .suspend = mydevice_suspend,
   .resume = mydevice_resume,
   .err_handler = &mydevice_error_handlers
};

However, mydevice_error_detected is never called during a device reset, even though the user-space application keeps trying, unsuccessfully, to read device memory (and gets 0xFFFFFFFF as the result).
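For illustration, here is one way the all-ones pattern could be checked in software. This is only a minimal sketch in plain C, not code from my driver: `mydevice_looks_gone` and `MYDEVICE_ID_REG_OFFSET` are hypothetical names, and it assumes the device exposes a register at that offset whose valid contents are never 0xFFFFFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the failing reads described above return. */
#define ALL_ONES_32 0xFFFFFFFFu

/*
 * Hypothetical register offset (assumption): a register whose valid
 * contents are never all ones, for example an ID or version register.
 */
#define MYDEVICE_ID_REG_OFFSET 0x0u

/* Returns true if the mapped window reads back like a device that is gone. */
static bool mydevice_looks_gone(const volatile uint32_t *bar)
{
    return bar[MYDEVICE_ID_REG_OFFSET / sizeof(uint32_t)] == ALL_ONES_32;
}
```

A check like this only reports what a read returned; it does not by itself make the kernel invoke error_detected.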

Also, lspci still lists the device after a PCI rescan, even when it is turned off:

01:00.0 Unassigned class [ff00]: MyVendorId Device 5a00 (rev ff)

The only difference is that "(rev ff)" appears at the end of the line while the device is turned off. Otherwise lspci prints:

01:00.0 Unassigned class [ff00]: MyVendorId Device 5a00
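The "rev ff" suffix is consistent with configuration reads returning all ones: the Revision ID is the low byte of the configuration dword at offset 0x08, and the class code occupies the upper 24 bits. A small sketch of that decoding follows; the helper names are hypothetical, and the powered-on value 0xFF000000 used in the note after the code is an assumption based on lspci showing class ff00 and no revision.

```c
#include <stdint.h>

/* Revision ID: bits 7:0 of the config dword at offset 0x08. */
static uint8_t pci_revision_from_dword(uint32_t class_rev)
{
    return (uint8_t)(class_rev & 0xFFu);
}

/* Class code (class, subclass, prog-if): bits 31:8 of the same dword. */
static uint32_t pci_class_from_dword(uint32_t class_rev)
{
    return class_rev >> 8;
}
```

With the assumed powered-on value 0xFF000000, the revision decodes as 0x00, which lspci does not print. With an all-ones read, 0xFFFFFFFF, it decodes as 0xff.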

I'm pretty sure the device is completely turned off, since its configuration space cannot be accessed during the reset. I'd expect the kernel to call the error detection callback as soon as the first memory read request to the device fails or times out. Is my assumption correct?

c
linux
linux-kernel
pci-e
pci-bus
asked on Stack Overflow Mar 24, 2020 by visapi • edited Mar 26, 2020 by visapi

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0