Can I create a udev rule to keep track of events for a device that is not under /dev/<device>?


I need to call a script whenever an EDAC error is reported by the kernel/system.

I created the following udev rule for this purpose: if ce_count changes, I would like to execute /var/tmp/test.sh. I then ran "udevadm control --reload-rules && udevadm trigger" and "udevadm monitor", and induced errors using mce-inject, but the script didn't execute.

cat /etc/udev/rules.d/98-edac.rules

ACTION=="change", ATTR{ce_count}, KERNEL=="mc0", RUN+="/var/tmp/test.sh"
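For comparison, here is a minimal sketch of the intended behavior ("run the script when ce_count changes") written as a polling loop rather than a udev rule. It assumes that EDAC counter updates in sysfs might not generate a uevent, which is not confirmed anywhere in this thread. The sysfs path and script path are taken from the question; the `watch` helper and its `max_polls` and `sleep` parameters are hypothetical names introduced only for illustration.

```python
import subprocess
import time
from pathlib import Path

# Paths taken from the question; adjust for other memory controllers.
CE_COUNT = Path("/sys/devices/system/edac/mc/mc0/ce_count")
SCRIPT = "/var/tmp/test.sh"


def read_count(path):
    """Return the integer stored in a sysfs counter file."""
    return int(Path(path).read_text().strip())


def watch(path, on_change, interval=1.0, max_polls=None, sleep=time.sleep):
    """Poll `path` and call on_change(old, new) whenever the value differs.

    max_polls bounds the loop (None means run forever). It and `sleep`
    are hypothetical knobs so the loop can be exercised without real hardware.
    """
    last = read_count(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = read_count(path)
        if current != last:
            on_change(last, current)
            last = current
        polls += 1


def run_script(old, new):
    # Hand the old and new counts to the script as arguments.
    subprocess.run([SCRIPT, str(old), str(new)], check=False)


if __name__ == "__main__":
    watch(CE_COUNT, run_script)
```

Polling trades latency for simplicity: an error is noticed at most `interval` seconds after the counter changes, and several errors within one interval are reported as a single change.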

root@host0:/var/tmp# udevadm info -ap /sys/devices/system/edac/mc/mc0                             
Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

 looking at device '/devices/system/edac/mc/mc0':
   KERNEL=="mc0"
   SUBSYSTEM=="mc0"
   DRIVER==""
   ATTR{ce_count}=="21"
   ATTR{ce_noinfo_count}=="0"
   ATTR{max_location}=="channel 7 slot 2 "
   ATTR{mc_name}=="Broadwell Socket#0"
   ATTR{seconds_since_reset}=="5223"
   ATTR{size_mb}=="65536"
   ATTR{ue_count}=="0"
   ATTR{ue_noinfo_count}=="0"

  looking at parent device '/devices/system/edac/mc':
   KERNELS=="mc"
   SUBSYSTEMS=="edac"
   DRIVERS==""

  looking at parent device '/devices/system/edac':
  KERNELS=="edac"
  SUBSYSTEMS==""
  DRIVERS==""

I induce EDAC/MCE faults using mce-inject: ./mce-inject ./basic-inject.txt

root@host0:/var/tmp# cat basic-inject.txt 
#
CPU 0 BANK 8 
STATUS corrected 
ADDR 0x12345125 
MCGCAP 0x7000c16 
APICID 0  
MCGSTATUS 0 
SOCKETID 0 
MISC 0x50683286 
STATUS 0x8c00004000010090 
root@host0:/var/tmp# 

The kernel syslog/dmesg shows these log entries after the error is injected:

[  +4.436747] Starting machine check poll CPU 0
[  +0.000013] mce: [Hardware Error]: Machine check events logged
[  +0.000008] Machine check poll done on CPU 0
[  +0.000030] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[  +0.000002] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 8: 8c00004000010090
[  +0.000001] EDAC sbridge MC0: TSC 0 
[  +0.000002] EDAC sbridge MC0: ADDR 12345100 
[  +0.000000] EDAC sbridge MC0: MISC 50683286 
[  +0.000002] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1593625089 SOCKET 0 APIC 0
[  +0.000005] EDAC DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 1
[  +0.000005] EDAC DEBUG: get_memory_error_data: TAD#0: address 0x0000000012345100 < 0x000000007fffffff, socket interleave 0, channel interleave 2 (offset 0x00000000), index 0, base ch: 2, ch mask: 0x04
[  +0.000007] EDAC DEBUG: get_memory_error_data: RIR#0, limit: 31.999 GB (0x00000007ffffffff), way: 4
[  +0.000002] EDAC DEBUG: get_memory_error_data: RIR#0: channel address 0x091a2880 < 0x7ffffffff, RIR interleave 2, index 1
[  +0.000002] EDAC DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:4 rank:4
[  +0.000007] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#1 (channel:2 slot:1 page:0x12345 offset:0x100 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:4 rank:4)
[Jul 1 17:41] perf: interrupt took too long (3923 > 3920), lowering kernel.perf_event_max_sample_rate to 50000
Tags: linux, memory, cpu, kernel, udev
asked on Super User Jul 1, 2020 by user1457958 • edited Jul 1, 2020 by user1457958

