Xen VM continuously restarts and kernel fails to start init


I am running a VM on the Xen hypervisor with the libvirt management layer. After upgrading Debian from version 8 to 9, I couldn't connect to the VM, so I logged into the physical machine it runs on and restarted it with virsh. However, after stopping and starting the VM, it seems to be stuck in a reboot loop: the domain ID keeps increasing. It started at 8 and is now above 400 after a few hours. (shell1 is the domain name.)

$ virsh list --all
 Id    Name                           State
----------------------------------------------------
 412   shell1                         running
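To confirm that libvirt really is recreating the domain, one option is to sample the domain ID several times and count how often it changes. The helper below, `count_id_changes`, is a hypothetical sketch: it runs any command that prints a domain ID (for example `virsh domid shell1`) a given number of times at a given interval. The sample count and interval are arbitrary assumptions.

```shell
#!/bin/sh
# count_id_changes SAMPLES INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD (which should print a domain ID, e.g.
# `virsh domid shell1`) SAMPLES times, INTERVAL seconds apart, and prints
# how many times the ID changed between consecutive samples.
count_id_changes() {
  local samples="$1" interval="$2" prev="" cur="" changes=0 i=0
  shift 2
  while [ "$i" -lt "$samples" ]; do
    # Strip whitespace and newlines so "412\n\n" compares equal to "412".
    cur=$("$@" 2>/dev/null | tr -d '[:space:]')
    if [ -n "$prev" ] && [ "$cur" != "$prev" ]; then
      changes=$((changes + 1))
    fi
    prev=$cur
    i=$((i + 1))
    # Sleep between samples, but not after the last one.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
  echo "$changes"
}
```

For example, `count_id_changes 10 30 virsh domid shell1` samples for about four and a half minutes. A non-zero result that grows with longer runs points to a restart loop rather than a single reboot. A domain caught between restarts may briefly print nothing, and this sketch does not treat that case specially.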

This is the kernel log shown by virsh console shell1:

Escape character is ^]
0 000024 (v02 Xen   )
[    0.000000] ACPI: XSDT 0x00000000FC00A550 000054 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: FACP 0x00000000FC00A280 0000F4 (v04 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: DSDT 0x00000000FC001250 008FAC (v02 Xen    HVM      00000000 INTL 20160831)
[    0.000000] ACPI: FACS 0x00000000FC001210 000040
[    0.000000] ACPI: FACS 0x00000000FC001210 000040
[    0.000000] ACPI: APIC 0x00000000FC00A380 000068 (v02 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: HPET 0x00000000FC00A460 000038 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: WAET 0x00000000FC00A4A0 000028 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: SSDT 0x00000000FC00A4D0 000031 (v02 Xen    HVM      00000000 INTL 20160831)
[    0.000000] ACPI: SSDT 0x00000000FC00A510 000031 (v02 Xen    HVM      00000000 INTL 20160831)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000018f7fffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x18f7fffff]
[    0.000000]   NODE_DATA [mem 0x18f7f9000-0x18f7fdfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x18f7fffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0xefffefff]
[    0.000000]   node   0: [mem 0x100000000-0x18f7fffff]
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0xeffff000-0xefffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xf0000000-0xfbffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfc000000-0xffffffff]
[    0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen HVM
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 27 pages/cpu @ffff88018f400000 s80960 r8192 d21440 u2097152
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1549220
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=/dev/xvda2 ro elevator=noop console=ttyS0
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] AGP: Checking aperture...
[    0.000000] AGP: No AGP bridge found
[    0.000000] Memory: 6086104K/6282868K available (5247K kernel code, 947K rwdata, 1832K rodata, 1208K init, 840K bss, 196764K reserved)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000]  RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS:33024 nr_irqs:256 16
[    0.000000] xen:events: Using FIFO-based ABI
[    0.000000] xen:events: Xen HVM callback vector for event delivery is enabled
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] tsc: Detected 3598.992 MHz processor
[    0.012000] Calibrating delay loop (skipped), value calculated using timer frequency.. 7197.98 BogoMIPS (lpj=14395968)
[    0.028004] pid_max: default: 32768 minimum: 301
[    0.036012] ACPI: Core revision 20140424
[    0.051837] ACPI: All ACPI Tables successfully acquired
[    0.060540] Security Framework initialized
[    0.068009] AppArmor: AppArmor disabled by boot time parameter
[    0.080003] Yama: disabled by default; enable with sysctl kernel.yama.*
[    0.092393] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.101268] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.108550] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.120032] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.136390] Initializing cgroup subsys memory
[    0.144026] Initializing cgroup subsys devices
[    0.152026] Initializing cgroup subsys freezer
[    0.160016] Initializing cgroup subsys net_cls
[    0.168021] Initializing cgroup subsys blkio
[    0.176016] Initializing cgroup subsys perf_event
[    0.188011] Initializing cgroup subsys net_prio
[    0.196145] CPU: Physical Processor ID: 0
[    0.204006] CPU: Processor Core ID: 0
[    0.212011] mce: CPU supports 2 MCE banks
[    0.220060] Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7
[    0.220060] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[    0.220060] tlb_flushall_shift: 6
[    0.270008] Freeing SMP alternatives memory: 20K (ffffffff81a1c000 - ffffffff81a21000)
[    0.291417] ftrace: allocating 21701 entries in 85 pages
[    0.352435] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[    0.407302] smpboot: CPU0: Intel(R) Xeon(R) CPU           X5687  @ 3.60GHz (fam: 06, model: 2c, stepping: 02)
[    0.424050] installing Xen timer for CPU 0
[    0.432177] Performance Events: unsupported p6 CPU model 44 no PMU driver, software events only.
[    0.445857] x86: Booted up 1 node, 1 CPUs
[    0.448018] smpboot: Total of 1 processors activated (7197.98 BogoMIPS)
[    0.452469] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.456253] devtmpfs: initialized
[    0.468220] pinctrl core: initialized pinctrl subsystem
[    0.472141] NET: Registered protocol family 16
[    0.476238] cpuidle: using governor ladder
[    0.480019] cpuidle: using governor menu
[    0.484089] ACPI: bus type PCI registered
[    0.488013] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.492734] PCI: Using configuration type 1 for base access
[    0.497450] ACPI: Added _OSI(Module Device)
[    0.500013] ACPI: Added _OSI(Processor Device)
[    0.504013] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.508013] ACPI: Added _OSI(Processor Aggregator Device)
[    0.522128] ACPI: Interpreter enabled
[    0.524024] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20140424/hwxface-580)
[    0.535711] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20140424/hwxface-580)
[    0.544033] ACPI: (supports S0 S3 S4 S5)
[    0.548012] ACPI: Using IOAPIC for interrupt routing
[    0.552054] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.568633] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.572021] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.576031] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.580343] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.584888] acpiphp: Slot [3] registered
[    0.588159] acpiphp: Slot [4] registered
[    0.592124] acpiphp: Slot [5] registered
[    0.596134] acpiphp: Slot [6] registered
[    0.600131] acpiphp: Slot [7] registered
[    0.604151] acpiphp: Slot [8] registered
[    0.608129] acpiphp: Slot [9] registered
[    0.612098] acpiphp: Slot [10] registered
[    0.616120] acpiphp: Slot [11] registered
[    0.620109] acpiphp: Slot [12] registered
[    0.624112] acpiphp: Slot [13] registered
[    0.628119] acpiphp: Slot [14] registered
[    0.632154] acpiphp: Slot [15] registered
[    0.636117] acpiphp: Slot [16] registered
[    0.640118] acpiphp: Slot [17] registered
[    0.644127] acpiphp: Slot [18] registered
[    0.648119] acpiphp: Slot [19] registered
[    0.652118] acpiphp: Slot [20] registered
[    0.656119] acpiphp: Slot [21] registered
[    0.660149] acpiphp: Slot [22] registered
[    0.664119] acpiphp: Slot [23] registered
[    0.668113] acpiphp: Slot [24] registered
[    0.672117] acpiphp: Slot [25] registered
[    0.676115] acpiphp: Slot [26] registered
[    0.680117] acpiphp: Slot [27] registered
[    0.684115] acpiphp: Slot [28] registered
[    0.688113] acpiphp: Slot [29] registered
[    0.692113] acpiphp: Slot [30] registered
[    0.696139] acpiphp: Slot [31] registered
[    0.700129] PCI host bridge to bus 0000:00
[    0.704018] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.708015] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.712015] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.716015] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.720015] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfbffffff]
[    0.748366] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.752022] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.756015] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.760016] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.768611] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.772162] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    0.827260] ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11)
[    0.831799] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.835629] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.839445] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11)
[    0.845239] ACPI: Enabled 2 GPEs in block 00 to 0F
[    0.848110] xen:balloon: Initialising balloon driver
[    0.856057] xen_balloon: Initialising balloon driver
[    0.860266] vgaarb: setting as boot device: PCI:0000:00:03.0
[    0.864000] vgaarb: device added: PCI:0000:00:03.0,decodes=io+mem,owns=io+mem,locks=none
[    0.864021] vgaarb: loaded
[    0.868013] vgaarb: bridge control possible 0000:00:03.0
[    0.872044] init_memory_mapping: [mem 0x190000000-0x197ffffff]
[    0.876187] PCI: Using ACPI for IRQ routing
[    0.882529] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.884038] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.890542] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[    0.896028] amd_nb: Cannot enumerate AMD northbridges
[    0.900036] Switched to clocksource xen
[    0.914008] pnp: PnP ACPI init
[    0.920390] ACPI: bus type PNP registered
[    0.928055] system 00:00: [mem 0x00000000-0x0009ffff] could not be reserved
[    0.941437] system 00:01: [io  0x08a0-0x08a3] has been reserved
[    0.954150] system 00:01: [io  0x0cc0-0x0ccf] has been reserved
[    0.966208] system 00:01: [io  0x04d0-0x04d1] has been reserved
[    0.978430] system 00:07: [io  0xae00-0xae0f] has been reserved
[    0.989888] system 00:07: [io  0xb044-0xb047] has been reserved
[    1.002139] pnp: PnP ACPI: found 8 devices
[    1.010482] ACPI: bus type PNP unregistered
[    1.026018] NET: Registered protocol family 2
[    1.034611] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    1.048711] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.062300] TCP: Hash tables configured (established 65536 bind 65536)
[    1.076282] TCP: reno registered
[    1.083473] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    1.096895] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    1.111198] NET: Registered protocol family 1
[    1.120773] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.132448] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    1.143536] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    1.155975] Unpacking initramfs...
[    1.653611] Freeing initrd memory: 30932K (ffff880034386000 - ffff8800361bb000)
[    1.664987] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.671339] software IO TLB [mem 0xebfff000-0xeffff000] (64MB) mapped at [ffff8800ebfff000-ffff8800efffefff]
[    1.680275] microcode: CPU0 sig=0x206c2, pf=0x1, revision=0x14
[    1.685279] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    1.693165] futex hash table entries: 256 (order: 2, 16384 bytes)
[    1.702078] audit: initializing netlink subsys (disabled)
[    1.710578] audit: type=2000 audit(1613960999.504:1): initialized
[    1.717050] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    1.723569] zbud: loaded
[    1.726975] VFS: Disk quotas dquot_6.5.2
[    1.732474] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.742456] msgmni has been set to 11947
[    1.749746] alg: No test for stdrng (krng)
[    1.757405] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    1.771209] io scheduler noop registered (default)
[    1.780259] io scheduler deadline registered
[    1.788138] io scheduler cfq registered
[    1.795348] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    1.805509] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    1.817802] GHES: HEST is not enabled!
[    1.825431] xen:grant_table: Grant tables using version 1 layout
[    1.836685] Grant table initialized
[    1.843914] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    1.897443] 00:06: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.912105] Linux agpgart interface v0.103
[    1.920107] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[    1.940049] serio: i8042 KBD port at 0x60,0x64 irq 1
[    1.948666] serio: i8042 AUX port at 0x60,0x64 irq 12
[    1.958392] mousedev: PS/2 mouse device common for all mice
[    1.966923] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    1.981401] input: Xen Virtual Keyboard as /devices/virtual/input/input2
[    1.995164] input: Xen Virtual Pointer as /devices/virtual/input/input4
[    2.018214] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[    2.029967] rtc_cmos 00:02: alarms up to one day, 114 bytes nvram, hpet irqs
[    2.043409] ledtrig-cpu: registered to indicate activity on CPUs
[    2.055112] AMD IOMMUv2 driver by Joerg Roedel <joerg.roedel@amd.com>
[    2.067465] AMD IOMMUv2 functionality not available on this system
[    2.079793] TCP: cubic registered
[    2.086264] NET: Registered protocol family 10
[    2.094962] mip6: Mobile IPv6
[    2.100560] NET: Registered protocol family 17
[    2.108302] mpls_gso: MPLS GSO support
[    2.115426] registered taskstats version 1
[    2.124182] xenbus_probe_frontend: Device with no driver: device/vbd/51712
[    2.136918] xenbus_probe_frontend: Device with no driver: device/vif/0
[    2.150014] rtc_cmos 00:02: setting system clock to 2021-02-22 02:30:00 UTC (1613961000)
[    2.168960] Freeing unused kernel memory: 1208K (ffffffff818ee000 - ffffffff81a1c000)
[    2.184230] Write protecting the kernel read-only data: 8192k
[    2.196185] Freeing unused kernel memory: 884K (ffff880001523000 - ffff880001600000)
[    2.211466] Freeing unused kernel memory: 216K (ffff8800017ca000 - ffff880001800000)
[    2.227230] Failed to execute /init (error -8)
[    2.236746] Starting init: /sbin/init exists but couldn't execute it (error -8)
[    2.251559] Starting init: /bin/sh exists but couldn't execute it (error -8)
[    2.265834] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
[    2.269815] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.39-1
[    2.269815] Hardware name: Xen HVM domU, BIOS 4.8.5 01/10/2020
[    2.269815]  0000000000000000 ffffffff81514c11 ffffffff81705310 ffff8801880f3f40
[    2.269815]  ffffffff8151195e ffffffff00000008 ffff8801880f3f50 ffff8801880f3ef0
[    2.269815]  ffff8801880f3ef8 0000000000000046 000000000000093d 000000000000093d
[    2.269815] Call Trace:
[    2.269815]  [<ffffffff81514c11>] ? dump_stack+0x5d/0x78
[    2.269815]  [<ffffffff8151195e>] ? panic+0xc8/0x206
[    2.269815]  [<ffffffff81507da0>] ? rest_init+0x80/0x80
[    2.269815]  [<ffffffff81507e82>] ? kernel_init+0xe2/0xf0
[    2.269815]  [<ffffffff8151ad18>] ? ret_from_fork+0x58/0x90
[    2.269815]  [<ffffffff81507da0>] ? rest_init+0x80/0x80
[    2.269815] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Then the VM restarts and the domain ID continues to increase.
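The key lines are the three just before the panic. The kernel reports exec failures as negative errno values, and errno 8 on Linux is ENOEXEC ("Exec format error"). That means the files exist, but the kernel does not recognize them as something it can run, which is consistent with, for example, empty or truncated files. The sketch below is a hypothetical triage helper that pulls these lines out of a saved copy of the console log and names the error codes. Its errno table deliberately covers only a few common codes.

```shell
#!/bin/sh
# errno_name N: map a few errno numbers the kernel commonly prints when
# starting init to their Linux names. Deliberately incomplete.
errno_name() {
  case $1 in
    2)  echo "ENOENT (No such file or directory)" ;;
    8)  echo "ENOEXEC (Exec format error)" ;;
    13) echo "EACCES (Permission denied)" ;;
    *)  echo "errno $1 (see errno(3))" ;;
  esac
}

# summarize_init_failures: read a kernel log on stdin and print one line
# per init candidate the kernel tried and failed to execute.
# "couldn.t" matches the apostrophe without breaking the single quotes.
summarize_init_failures() {
  sed -n -E \
    -e 's/.*Failed to execute ([^ ]+) \(error -([0-9]+)\).*/\1 \2/p' \
    -e 's/.*Starting init: ([^ ]+) exists but couldn.t execute it \(error -([0-9]+)\).*/\1 \2/p' |
  while read -r path code; do
    printf '%s: %s\n' "$path" "$(errno_name "$code")"
  done
}
```

Fed the log above (for example `summarize_init_failures < console.log`, where console.log is a hypothetical file holding the pasted output), it reports ENOEXEC for /init, /sbin/init and /bin/sh.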

How can I debug this? I don't want to use virsh destroy because I'm worried it could corrupt the filesystem. Should I try to back up the OpenAFS volume associated with the VM's storage pool? I can't get guestmount to work either: guestmount -m /dev/sda --ro -a /dev/opc-vg.shell1/shell1-disk /mnt fails with "operation not permitted".
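Once the disk is mounted read-only on the host, a quick check is whether the programs the kernel tried to run are real executables. The sketch below assumes the guest's root filesystem is mounted at some directory on the host. `classify_exec` and `find_empty_files` are hypothetical helpers, and the directory list is only a guess at where an upgrade that ran out of space would leave damage.

```shell
#!/bin/sh
# classify_exec FILE: rough guess at whether FILE could be exec'd.
# Symlinks are reported, not followed: inside a guest mounted on the host,
# an absolute link target would resolve against the host's filesystem.
classify_exec() {
  local f="$1" magic
  if [ -L "$f" ]; then
    echo "symlink -> $(readlink "$f")"
  elif [ ! -e "$f" ]; then
    echo "missing"
  elif [ ! -s "$f" ]; then
    echo "empty"          # execve() of a zero-length file fails with ENOEXEC
  else
    # First four bytes as hex, e.g. "7f454c46" for an ELF binary.
    magic=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
    case $magic in
      7f454c46) echo "ELF" ;;
      2321*)    echo "script" ;;   # starts with "#!"
      *)        echo "unknown format" ;;
    esac
  fi
}

# find_empty_files ROOT: list zero-length regular files under ROOT's
# system directories. Not every hit is a problem (some packages ship
# empty files on purpose), but a cluster of them is suspicious.
find_empty_files() {
  local root="$1" d
  for d in boot bin sbin lib usr/bin usr/sbin usr/lib; do
    if [ -d "$root/$d" ]; then
      find "$root/$d" -xdev -type f -size 0 -print
    fi
  done
}
```

For example, `classify_exec /mnt/shell1/sbin/init` and `find_empty_files /mnt/shell1`, with /mnt/shell1 as a hypothetical mount point. The /init that failed first lives inside the initramfs image under /boot, not on the root filesystem. On Debian, `lsinitramfs` from initramfs-tools can list that image's contents.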

debian
xen
virsh
asked on Server Fault Feb 22, 2021 by qwr • edited Feb 22, 2021 by qwr

1 Answer


While I was upgrading, the VM ran out of disk space. When I restarted it, the kernel couldn't execute init, most likely because the new initramfs had not been generated correctly. Fortunately, I could still stop the loop, either by pausing the VM with virsh suspend or by using virsh destroy to keep it from restarting after each crash (the behavior set in its domain XML). Since the panic happens while the kernel is still trying to run init from the initramfs, the guest's root filesystem has not been mounted yet, so destroying the domain at that point should not risk corrupting it. I could then mount the VM's volume with guestmount and copy files off. It will be much easier to create a new VM and rsync the files over than to try to repair the broken system, so that is what I will do.
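The recovery path above can be scripted roughly as follows. This is a sketch under several assumptions. The disk path is the one from the question. `/mnt/shell1` and the rsync destination `newvm:/srv/restore/shell1/` are hypothetical. The script only prints the commands unless `DRY_RUN=0` is set. `guestmount -i` asks libguestfs to inspect the disk and mount the guest's filesystems itself, which avoids guessing the right `-m` device. Running it as root may also matter for the "operation not permitted" error, since LVM block devices are normally accessible only to root. That last point is an assumption, not a confirmed diagnosis.

```shell
#!/bin/sh
# Hypothetical values: adjust to your setup.
DOMAIN=shell1
DISK=/dev/opc-vg.shell1/shell1-disk    # path as given in the question
MNT=/mnt/shell1                        # hypothetical mount point
DEST=newvm:/srv/restore/shell1/        # hypothetical rsync target

# run: print the command, and execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  fi
}

# Stop at the first failure so rsync never copies from an empty mount point.
recover() {
  run virsh destroy "$DOMAIN" || return 1        # hard stop, no guest shutdown
  run mkdir -p "$MNT" || return 1
  run guestmount -a "$DISK" -i --ro "$MNT" || return 1
  run rsync -aHAX "$MNT/" "$DEST" || return 1    # keep hard links, ACLs, xattrs
  run guestunmount "$MNT"
}
```

Running `recover` first with the default dry run shows exactly what will happen. `DRY_RUN=0 recover` then performs it.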

answered on Server Fault Feb 23, 2021 by qwr

User contributions licensed under CC BY-SA 3.0