VM becomes slow after some days of runtime with 48 GB of RAM, not with 6 GB


I've been dealing with a problem for some weeks now that results in a very slow VM guest after the VM has been running for some days.

"slow" means that CPU-bound operations take more time than before and as well that those operations seem to accumulate over time. Reloading ClamD-signatures for example takes ~35 seconds and 100 % on one core normally, which increases to 1 minute and more without any other load, but can easily take 10 or 15 minutes with some other load. That other load might be database queries by some web app, creating 100 % load on a core in itself already. It seems that without the problem both operations simply process as fast as the CPU is capable to, while with the problem both CPU-bound tasks get slower in itself and at the same time raise the overall load on the system. Every other little operation like htop or such creates an unnormal high load as well then. Additionally, processes like ClamD with 100 % load on one core normally are now show as creating 150 % load or more. Which in theory, and as ClamAV-people said, is impossible for reloading signatures because that is simply not multi-threaded. So it seems that some overhead is introduced which reduces overall system performance heavily. At the same time, neither the VM host itself or other VMs on the same host suffer from any performance problems.

This happened with a guest OS of Ubuntu 14.04 LTS in the past and happens with 16.04 LTS as well, after a completely fresh install including recreating the VM. I think I was able to track this down to one difference: if the VM runs with 48 GB of RAM the problem occurs after some days of runtime; with only 6 GB of RAM it doesn't. I'm very sure that the amount of RAM really is the only difference in both cases; the tested workload is the same and is provided by automatically running tests in Jenkins and signature updates by ClamD. The problem most likely doesn't occur with 8 GB of RAM either, because I have another VM with that much memory that doesn't show the problem, but I don't currently know the upper limit of RAM before the problem occurs. Testing this is pretty time-consuming, because the problem doesn't exist right from the start; it starts happening at some point.

My server is an HP DL380 G7 with two Intel Xeon X5675 @ 3.07 GHz and 144 GB of RAM, evenly distributed across both sockets and all RAM slots. It runs Ubuntu 16.04 LTS, hosts the VMs on ZFS, and the tested VM has 8 vCPUs and either 48 GB or 6 GB of RAM assigned. The server's resources should be more than enough for my needs; the formerly used G6 was a bit slower with a bit less RAM and didn't show these problems. And when the problem doesn't occur with 48 GB of RAM, the VM behaves as expected as well. I'm pretty certain that there's no swapping or memory overcommitment on the host:

top - 11:49:38 up 28 days, 13:54,  1 user,  load average: 0.26, 0.33, 0.35
Tasks: 904 total,   1 running, 899 sleeping,   0 stopped,   4 zombie
%Cpu(s):  0.1 us,  0.5 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 14853158+total,  5032192 free, 13115475+used, 12344644 buff/cache
KiB Swap:  5852156 total,  5852144 free,       12 used. 11533812 avail Mem
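
For completeness, this is how I double-check that the host really isn't swapping while the guest is slow; vmstat and free are standard tools here:

# si/so should stay at 0 if the host isn't swapping at all
vmstat 1 5

# overall memory and swap usage in human-readable form
free -h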

I'm currently looking at NUMA vs. "node interleaving", but I'm fairly sure NUMA is enabled. Additionally, from what I've read, the performance impact of NUMA might be around 20 % or even 40 %, but not so heavy that some operations, like connecting to the database, time out entirely. I've read as well that in most cases one should simply not deal with NUMA specifics at all, but keep the OS defaults and let the kernel decide where to schedule which thread etc. I don't need the last bit of performance anyway; it's just that currently things get unacceptably slow after some time.

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 72477 MB
node 0 free: 14758 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 72572 MB
node 1 free: 11046 MB
node distances:
node  0   1
      0:  10  20
      1:  20  10
$ dmesg | grep -i numa
[    0.000000] NUMA: Node 0 [mem 0x00000000-0xdfffffff] + [mem 0x100000000-0x121fffffff] -> [mem 0x00000000-0x121fffffff]
[    0.000000] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
$ sysctl -a | grep numa_
kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
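
Since automatic NUMA balancing is enabled above, one simple test would be to switch it off at runtime and see whether the behaviour changes; a sketch only, assuming the VM runs as the usual VBoxHeadless process (I haven't verified yet whether this helps in my case):

# disable automatic NUMA balancing at runtime; revert with =1 or a reboot
sysctl -w kernel.numa_balancing=0

# check on which NUMA nodes the memory of the VM process actually lives
numastat -p "$(pgrep -f VBoxHeadless | head -n 1)"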

Besides NUMA, I've read about huge pages in Linux and large pages in VirtualBox, but from my understanding not using either of them shouldn't have such a dramatic negative impact as I'm seeing. VirtualBox talks about a ~5 % performance benefit from using large pages, and while huge pages are not set explicitly on my host, they are used and available via "transparent huge pages", from what I see in /proc/vmstat.
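
For reference, these are the counters I look at to see whether THP is in use at all; the exact counter names may differ slightly between kernel versions:

# non-zero thp_* counters indicate that transparent huge pages are being used
grep -i thp /proc/vmstat

# system-wide amount of anonymous memory currently backed by huge pages
grep AnonHugePages /proc/meminfo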

What makes me wonder is that 48 GB of RAM isn't that much memory at all; I've read of other users running into problems only after more than 128 GB had been assigned, and of developers saying they successfully tested with 1 TB of RAM. Additionally, amounts of up to 24 GB work as well: the problematic VM used that before without any problem and does so again at the time of this writing.

Do you have any idea what could be causing the problem here?

linux
hp-proliant
virtualbox
ubuntu-16.04
asked on Server Fault May 25, 2018 by Thorsten Schöning • edited Dec 19, 2018 by Thorsten Schöning

2 Answers


This happens when a guest uses lots of memory on a NUMA machine. KSM might merge similar memory pages of different VMs sitting on different NUMA memory regions, causing the affected processes to crawl.

Disable KSM merge_across_nodes:

echo 2 > /sys/kernel/mm/ksm/run && sleep 300 && cat /sys/kernel/mm/ksm/pages_shared

If there are no pages shared:

echo 0 > /sys/kernel/mm/ksm/merge_across_nodes && echo 1 > /sys/kernel/mm/ksm/run

Make sure merge_across_nodes gets set again at boot so the change survives reboots; it is a sysfs knob under /sys/kernel/mm/ksm/, not a sysctl, so an /etc/sysctl.d entry alone won't cover it.
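
A minimal sketch of one way to persist this, e.g. from /etc/rc.local or the ExecStart of a small systemd oneshot unit; note that merge_across_nodes can only be changed while KSM has no shared pages, which is why it should run before KSM starts merging:

# run at boot, before KSM has merged anything across NUMA nodes
echo 0 > /sys/kernel/mm/ksm/merge_across_nodes
echo 1 > /sys/kernel/mm/ksm/run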

answered on Server Fault Feb 6, 2019 by Arie Skliarouk

The behaviour I see fits pretty well with the following problem discussed for the Linux kernel:

Dueling memory-management performance regressions

Even though it mostly talks about swapping, the author of the patch fixing this ran into heavy CPU load alone as well:

vfio is a good test because by pinning all memory it avoids the swapping and reclaim only wastes CPU, a memhog based test would created swapout storms and supposedly show a bigger stddev.

The one thing I'm not sure about is the influence of Transparent Huge Pages: while the feature is enabled by default in my system, VirtualBox doesn't seem to use them, and according to the OS settings they are opt-in (madvise) in general anyway:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
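
To check whether the VirtualBox process itself gets any THP-backed memory, summing AnonHugePages over its mappings works; a sketch assuming the VM runs as VBoxHeadless:

# sum AnonHugePages across all mappings of the first VBoxHeadless process
awk '/AnonHugePages/ {sum += $2} END {print sum " kB"}' /proc/"$(pgrep -f VBoxHeadless | head -n 1)"/smaps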

Everything else fits perfectly well with what I saw.

answered on Server Fault Nov 21, 2019 by Thorsten Schöning

User contributions licensed under CC BY-SA 3.0