How to determine why the processor frequency scales down to ~200 MHz due to ThermStatus

I am trying to determine what is causing an embedded industrial computer (ARK-1550-S9A1E) with a 4th Gen Intel Core i5-4300U dual-core processor to scale all of its cores down from 1.90 GHz to around 200 MHz.

Several tools (turbostat and the MSR readings) indicate that the cores have scaled down because of ThermStatus, and "Digital Readout" shows 65 C/149 F.

The device itself is running Ubuntu 18.04 LTS server (no GUI, headless application) and the applications running on it are at most taking 20% of the CPU. There is nothing really to spike up this CPU utilization, so it is incredibly surprising that it is overheating. It is an industrial fan-less PC, so it does have a lot of hardware to dissipate heat.

Below is the output from rdmsr and turbostat, with all the details of the register readings.

user1@ubuntu-18.04_64:~$ cat /proc/cpuinfo | grep "MHz"
cpu MHz     : 230.404
cpu MHz     : 227.324
cpu MHz     : 217.117
cpu MHz     : 174.135
user1@ubuntu-18.04_64:~$ 

user1@ubuntu-18.04_64:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
user1@ubuntu-18.04_64:~$ 

user1@ubuntu-18.04_64:~$ sudo rdmsr 0x770 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000770
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x771 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000771
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x772 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000772
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x773 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000773
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x775 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000775
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x777 -f 63:0
rdmsr: CPU 0 cannot read MSR 0x00000777
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x19C -f 63:0
88410800
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x64E -f 63:0
rdmsr: CPU 0 cannot read MSR 0x0000064e
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x64F -f 63:0
rdmsr: CPU 0 cannot read MSR 0x0000064f
user1@ubuntu-18.04_64:~$ sudo rdmsr 0x19B -f 63:0
13
user1@ubuntu-18.04_64:~$ 

decs@ubuntu-18.04_64$ ./intel-reg-pp.out 
hello from intel_reg_pp!

[19CH] IA32_THERM_STATUS Register With HWP Feedback
  Command to read: sudo rdmsr 0x19c - f 63:0
Value of register is: 88410800
  64  60        50        40        30        20        10
  43210987654321098765432109876543210987654321098765432109876543210
0b00000000000000000000000000000000010001000010000010000100000000000
  └───────────────┬───────────────┘│└─┬┘└─┬┘└──┬──┘││││││││││││││││
              Reserved             │  │   │    │   ││││││││││││││││
Reading Valid ─────────────────────┘  │   │    │   ││││││││││││││││
Reading in Deg. Celcius ──────────────┘   │    │   ││││││││││││││││
Reserved ─────────────────────────────────┘    │   ││││││││││││││││
Digital Readout ───────────────────────────────┘   ││││││││││││││││ 65 C -> 149 F
Cross-domain Limit Log ────────────────────────────┘│││││││││││││││
Cross-domain Limit Status ──────────────────────────┘││││││││││││││
Current Limit Log ───────────────────────────────────┘│││││││││││││
Current Limit Status ─────────────────────────────────┘││││││││││││
Power Limit Notification Log ──────────────────────────┘│││││││││││
Power Limit Notification Status ────────────────────────┘││││││││││
Thermal Threshold #2 Log ────────────────────────────────┘│││││││││
Thermal Threshold #2 Status ──────────────────────────────┘││││││││
Thermal Threshold #1 Log ──────────────────────────────────┘│││││││
Thermal Threshold #1 Status ────────────────────────────────┘││││││
Critical Temperature Log ────────────────────────────────────┘│││││
Critical Temperature Status ──────────────────────────────────┘││││
PROCHOT# or FORCEPR# Log ──────────────────────────────────────┘│││
PROCHOT# or FORCEPR# Event ─────────────────────────────────────┘││
Thermal Status Log ──────────────────────────────────────────────┘│
Thermal Status ───────────────────────────────────────────────────┘


[64FH] MSR_CORE_PERF_LIMIT_REASONS
  Command to read: sudo rdmsr 0x64f - f 63:0
Value of register is: 1c220002
  64  60        50        40        30        20        10
  43210987654321098765432109876543210987654321098765432109876543210
0b00000000000000000000000000000000000011100001000100000000000000010
   └───────────────┬───────────────┘││││││└─┬─┘│││││││││││└─┬─┘││││
              Reserved              ││││││  │  │││││││││││  │  ││││
Maximum Efficiency Frequency Log ───┘│││││  │  │││││││││││  │  ││││
Turbo Transistion Attenuation Log ───┘││││  │  │││││││││││  │  ││││
Electical Design Point Log ───────────┘│││  │  │││││││││││  │  ││││
Max Turbo Limit Log ───────────────────┘││  │  │││││││││││  │  ││││
VR Them Alert Log ──────────────────────┘│  │  │││││││││││  │  ││││
Core Power Limiting Log ─────────────────┘  │  │││││││││││  │  ││││
Reserved ───────────────────────────────────┘  │││││││││││  │  ││││
Package-Level PL2 Power Limiting Log ──────────┘││││││││││  │  ││││
Package-Level PL1 Power Limiting Log ───────────┘│││││││││  │  ││││
Thermal Log ─────────────────────────────────────┘││││││││  │  ││││
PROCHOT Log ──────────────────────────────────────┘│││││││  │  ││││
Reserved ──────────────────────────────────────────┘││││││  │  ││││
Maximum Efficiency Frequency Status (R0)────────────┘│││││  │  ││││
Turbo Transition Attenuation Status (R0)─────────────┘││││  │  ││││
Electrical Design Point Status (R0)───────────────────┘│││  │  ││││
Max Turbo Limit Status (R0) ───────────────────────────┘││  │  ││││
VR Therm Alert Status (R0)──────────────────────────────┘│  │  ││││
Core Power Limiting Status (R0)──────────────────────────┘  │  ││││
Reserved ───────────────────────────────────────────────────┘  ││││
Package-Level PL2 Power Limiting Status (R0) ──────────────────┘│││
Package-Level Power Limiting PL1 Status (R0)────────────────────┘││
Thermal Status (R0) ─────────────────────────────────────────────┘│
PROCHOT Status (R0) ──────────────────────────────────────────────┘


[19BH] IA32_THERM_INTERRUPT
  Command to read: sudo rdmsr 0x64f - f 63:0
Value of register is: 00000013
  64  60        50        40        30        20        10
  43210987654321098765432109876543210987654321098765432109876543210
0b10000000000000000000000000000000000000000000000000000000000010011
   └───────────────┬──────────────────────┘│└──┬──┘│└──┬──┘└┬┘│││││
              Reserved                     │   │   │   │    │ │││││
Threshold #2 INT Enable ───────────────────┘   │   │   │    │ │││││
Threshold #2 Value ────────────────────────────┘   │   │    │ │││││
Threshold #1 INT Enable ───────────────────────────┘   │    │ │││││
Threshold #1 Value ────────────────────────────────────┘    │ │││││
Reserved ───────────────────────────────────────────────────┘ │││││
Critical Temperature Enable ──────────────────────────────────┘││││
FORCEPR# INT Enable ───────────────────────────────────────────┘│││
PROCHOT# INT enable ────────────────────────────────────────────┘││
Low-Temperature INT enable ──────────────────────────────────────┘│
High-Temperature INT Enable ──────────────────────────────────────┘
decs@ubuntu:~/projects/intel-reg-pp/bin/x86/Debug$ 


user1@ubuntu-18.04_64:~$ sudo turbostat
turbostat version 17.06.23 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 13 CPUID levels; family:model:stepping 0x6:45:1 (6:69:1)
CPUID(1): SSE3 MONITOR SMX EIST TM2 TSC MSR ACPI-TM TM
CPUID(6): APERF, TURBO, DTS, PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu3: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX
cpu3: MSR_MISC_PWR_MGMT: 0x00400000 (ENable-EIST_Coordination DISable-EPB DISable-OOB)
RAPL: 17476 sec. Joule Counter Range, at 15 Watts
cpu3: MSR_PLATFORM_INFO: 0x8083df3011900
8 * 100.0 = 800.0 MHz max efficiency frequency
25 * 100.0 = 2500.0 MHz base frequency
cpu3: MSR_IA32_POWER_CTL: 0x0004005d (C1E auto-promotion: DISabled)
cpu3: MSR_TURBO_RATIO_LIMIT: 0x1a1a1a1d
26 * 100.0 = 2600.0 MHz max turbo 4 active cores
26 * 100.0 = 2600.0 MHz max turbo 3 active cores
26 * 100.0 = 2600.0 MHz max turbo 2 active cores
29 * 100.0 = 2900.0 MHz max turbo 1 active cores
cpu3: MSR_CONFIG_TDP_NOMINAL: 0x00000013 (base_ratio=19)
cpu3: MSR_CONFIG_TDP_LEVEL_1: 0x0008005c (PKG_MIN_PWR_LVL1=0 PKG_MAX_PWR_LVL1=0 LVL1_RATIO=8 PKG_TDP_LVL1=92)
cpu3: MSR_CONFIG_TDP_LEVEL_2: 0x001900c8 (PKG_MIN_PWR_LVL2=0 PKG_MAX_PWR_LVL2=0 LVL2_RATIO=25 PKG_TDP_LVL2=200)
cpu3: MSR_CONFIG_TDP_CONTROL: 0x00000000 ( lock=0)
cpu3: MSR_TURBO_ACTIVATION_RATIO: 0x00000012 (MAX_NON_TURBO_RATIO=18 lock=0)
cpu3: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008408 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked: pkg-cstate-limit=8: unlimited)
cpu3: POLL: CPUIDLE CORE POLL IDLE
cpu3: C1: MWAIT 0x00
cpu3: C1E: MWAIT 0x01
cpu3: C3: MWAIT 0x10
cpu3: C6: MWAIT 0x20
cpu3: C7s: MWAIT 0x32
cpu3: C8: MWAIT 0x40
cpu3: C9: MWAIT 0x50
cpu3: C10: MWAIT 0x60
cpu3: cpufreq driver: intel_pstate
cpu3: cpufreq governor: performance
cpufreq intel_pstate no_turbo: 0
cpu3: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_CORE_PERF_LIMIT_REASONS, 0x1c220002 (Active: ThermStatus, ) (Logged: MultiCoreTurbo, PkgPwrL2, PkgPwrL1, Auto-HWP, ThermStatus, )
cpu0: MSR_GFX_PERF_LIMIT_REASONS, 0x14020002 (Active: ThermStatus, ) (Logged: ThermStatus, PkgPwrL1, )
cpu0: MSR_RING_PERF_LIMIT_REASONS, 0x0c020000 (Active: ) (Logged: ThermStatus, PkgPwrL1, PkgPwrL2, )
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000078 (15 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x804280c800dd80c8 (locked)
cpu0: PKG Limit #1: ENabled (25.000000 Watts, 28.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (25.000000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00640000 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88400800 (36 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
cpu3: MSR_PKGC3_IRTL: 0x00008842 (valid, 67584 ns)
cpu3: MSR_PKGC6_IRTL: 0x00008873 (valid, 117760 ns)
cpu3: MSR_PKGC7_IRTL: 0x00008891 (valid, 148480 ns)
cpu3: MSR_PKGC8_IRTL: 0x000088e4 (valid, 233472 ns)
cpu3: MSR_PKGC9_IRTL: 0x00008945 (valid, 332800 ns)
cpu3: MSR_PKGC10_IRTL: 0x000089ef (valid, 506880 ns)
Core    CPU Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ SMI C1  C1E C3  C6  C7s C8  C9  C10 C1% C1E%    C3% C6% C7s%    C8% C9% C10%    CPU%c1  CPU%c3  CPU%c6  CPU%c7  CoreTmp PkgTmp  GFX%rc6 Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt CorWatt GFXWatt
-   -   157 69.94   225 2494    22821   0   447 1810    8751    389 1496    971 329 5   0.09    0.73    11.99   1.14    6.28    7.17    3.16    0.00    20.58   6.78    0.25    2.46    35  36  99.38   0.00    0.00    0.00    0.00    0.00    0.00    0.00    2.67    0.22    0.00
0   0   151 64.78   233 2494    6150    0   139 547 2166    145 501 335 80  0   0.11    0.94    11.59   1.74    8.75    9.61    3.02    0.00    22.16   9.01    0.30    3.75    35  36  99.38   0.00    0.00    0.00    0.00    0.00    0.00    0.00    2.67    0.22    0.00
0   2   146 68.06   216 2494    6206    0   120 418 2532    82  362 229 96  2   0.09    0.66    13.98   0.88    5.84    7.01    4.02    0.00    18.88
1   1   202 87.77   231 2494    3457    0   68  206 876 35  153 104 34  2   0.07    0.34    4.57    0.41    2.46    3.30    1.27    0.00    6.32    4.55    0.19    1.17    35
1   3   128 59.14   217 2494    7008    0   120 639 3177    127 480 303 119 1   0.09    1.00    17.82   1.52    8.09    8.76    4.33    0.00    34.95
^C
user1@ubuntu-18.04_64:~$ 

What would be a good way to determine what is causing the frequency to scale down from 1.9 GHz to 200 MHz?

linux
ubuntu
cpu
intel-core-i5
cpufreq
asked on Super User Jul 29, 2019 by Kris

1 Answer

65 C is usually not high enough to trigger thermal protection, but for some reason it looks like it is here. Protection usually does not kick in until around 95-98 C.

I would dig up more documentation on the MSR bits you are examining. As the name implies (Model Specific Register), these registers are specialized and can have different meanings on different systems. Chapter 14.7 of the Intel Software Developer's Manual describes the conditions under which the thermal management MSR bits are set. For example, many of the 'log' bits are sticky: once set, a bit stays at 1 until software clears it, so a set log bit only means that the event occurred at some point since the system was booted. Depending on what software is running, these bits may never be cleared, so a set log bit isn't by itself indicative of a major issue.

I would also dump your thermal limits, because they are probably set too aggressively by your computer's manufacturer. You can probably change the values set at boot (probably by the BIOS) after booting into the OS.
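To make the status/log distinction concrete, here is a minimal Python sketch that decodes the IA32_THERM_STATUS value from the question (0x88410800). It assumes the bit layout printed in the question's own register dump, with each log bit sitting one position above its status bit. It also assumes, as I read the SDM's description of this register (worth double-checking for this model), that Digital Readout is the number of degrees below the TCC activation temperature rather than an absolute temperature. The function names are made up for illustration.

```python
# Event names taken from the register dump in the question.
# Key = status bit position; the sticky log bit is assumed to be at key + 1.
EVENTS = {
    0: "Thermal",
    2: "PROCHOT# or FORCEPR#",
    4: "Critical Temperature",
    6: "Thermal Threshold #1",
    8: "Thermal Threshold #2",
    10: "Power Limit Notification",
    12: "Current Limit",
    14: "Cross-domain Limit",
}


def tj_max_from_target(target_value):
    """Extract the temperature target (bits 23:16) from MSR_IA32_TEMPERATURE_TARGET."""
    return (target_value >> 16) & 0xFF


def decode_therm_status(value, tj_max_c):
    """Split IA32_THERM_STATUS into currently active events, logged events and temperature."""
    readout = (value >> 16) & 0x7F           # Digital Readout, bits 22:16
    valid = bool((value >> 31) & 1)          # Reading Valid, bit 31
    active = [name for bit, name in EVENTS.items() if (value >> bit) & 1]
    logged = [name for bit, name in EVENTS.items() if (value >> (bit + 1)) & 1]
    return {
        "valid": valid,
        "readout_below_tj_max": readout,
        # Assumption: the readout counts down towards the throttle point.
        "temperature_c": tj_max_c - readout if valid else None,
        "active": active,
        "logged": logged,
    }


if __name__ == "__main__":
    tj_max = tj_max_from_target(0x00640000)  # value turbostat printed in the question
    print(decode_therm_status(0x88410800, tj_max))
```

With the 100 C target that turbostat reports, a readout of 65 works out to about 35 C, which agrees with the CoreTmp column in the turbostat table. In this value no status bit is set and the only log bit set is Power Limit Notification, so this particular register does not by itself show thermal throttling in progress.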

It is completely safe for your processor to have the thermal limits backed off to 98C. However, you mentioned industrial applications. Is there a reason you don't want things getting very hot in your environment? Flammability, reactivity?
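Before changing any limit, it may help to confirm which limiter is active right now and which is only logged. The sketch below does that for the MSR_CORE_PERF_LIMIT_REASONS value turbostat printed (0x1c220002). The short names are the ones turbostat used in the question's output; the bit positions are my assumption of how turbostat maps them, with each log bit taken to sit 16 positions above its status bit, so check them against the documentation for this CPU model.

```python
# Names as printed by turbostat in the question. Bit positions are assumed;
# only the reasons that appear in the question's output are listed here.
REASONS = {
    0: "PROCHOT",
    1: "ThermStatus",
    5: "Auto-HWP",
    10: "PkgPwrL1",
    11: "PkgPwrL2",
    12: "MultiCoreTurbo",
}

LOG_OFFSET = 16  # assumed distance between a status bit and its sticky log bit


def decode_limit_reasons(value):
    """Return (active, logged) lists of limit reasons for one register value."""
    active = [name for bit, name in REASONS.items() if (value >> bit) & 1]
    logged = [
        name for bit, name in REASONS.items()
        if (value >> (bit + LOG_OFFSET)) & 1
    ]
    return active, logged


if __name__ == "__main__":
    active, logged = decode_limit_reasons(0x1C220002)
    print("Active:", ", ".join(active))
    print("Logged:", ", ".join(logged))
```

For this value the sketch reports ThermStatus as the only active reason, the same as turbostat's "Active:" list. Reading the register several times while the cores sit at ~200 MHz would show whether that status bit stays set or whether only the sticky log bits remain.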

answered on Super User Jul 30, 2019 by Andy • edited Jul 30, 2019 by Andy

User contributions licensed under CC BY-SA 3.0