AWS EC2 Amazon Linux 2 instances failed to boot after applying OS updates

Yesterday we lost contact with 10 identically configured servers. After some investigation, we concluded that a reboot following security updates had failed.

So far we have not been able to get any of the servers back online, but we were lucky enough to be able to reinstall the instances without data loss.

I will paste the console log below. Can anyone help me determine the root cause, and perhaps advise whether there is a better way to configure the servers to make recovery easier (for example, getting past the "Press Enter to continue." prompt, where the boot seems to hang)?

The full log is too big for SO, so I put it on pastebin and pasted a redacted version below. I have removed the escape sequences that colorize the output and some double newlines, but otherwise it is complete.

[    0.000000] Linux version 4.14.200-155.322.amzn2.x86_64 (mockbuild@ip-10-0-1-230) (gcc version 7.3.1 20180712 (Red Hat 7.3.1-10) (GCC)) #1 SMP Thu Oct 15 20:11:12 UTC 2020
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.14.200-155.322.amzn2.x86_64 root=UUID=a1e1011e-e38f-408e-878b-fed395b47ad6 ro console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 LANG=en_US.UTF-8
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[    0.000000] Hypervisor detected: KVM
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] e820: last_pfn = 0x3e3fa max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] RAMDISK: [mem 0x3433e000-0x36196fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F8F80 000014 (v00 AMAZON)
[    0.000000] ACPI: RSDT 0x000000003E3FE360 00003C (v01 AMAZON AMZNRSDT 00000001 AMZN 00000001)
[    0.000000] ACPI: FACS 0x000000003E3FFF40 000040
[    0.000000] ACPI: SSDT 0x000000003E3FF6C0 00087A (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] e820: [mem 0x40000000-0xdfffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.14.200-155.322.amzn2.x86_64 root=UUID=a1e1011e-e38f-408e-878b-fed395b47ad6 ro console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 LANG=en_US.UTF-8
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 943540K/1019488K available (10252K kernel code, 1958K rwdata, 2780K rodata, 2088K init, 4240K bss, 75948K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] ftrace: allocating 26683 entries in 105 pages
[    0.004000] Hierarchical RCU implementation.
[    0.004000]    RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2.
[    0.004000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.004000] NR_IRQS: 524544, nr_irqs: 440, preallocated irqs: 16
[    0.004000] Console: colour VGA+ 80x25
[    0.004000] console [tty0] enabled
[    0.004000] console [ttyS0] enabled
[    0.004005] tsc: Detected 2500.000 MHz processor
[    0.007582] Calibrating delay loop (skipped) preset value.. 5000.00 BogoMIPS (lpj=10000000)
[    0.008002] pid_max: default: 32768 minimum: 301
[    0.012006] ACPI: Core revision 20170728
[    0.016560] ACPI: 2 ACPI AML tables successfully acquired and loaded
[    0.020015] Security Framework initialized
[    0.024002] SELinux:  Initializing.
[    0.028159] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.032082] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.036012] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.040006] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.044325] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.048003] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.052003] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.056003] Spectre V2 : Mitigation: Full generic retpoline
[    0.060002] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.064002] Speculative Store Bypass: Vulnerable
[    0.067720] TAA: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.068002] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.072086] Freeing SMP alternatives memory: 24K
[    0.076807] smpboot: Max logical packages: 1
[    0.080264] x2apic enabled
[    0.084003] Switched APIC routing to physical x2apic.
[    0.088000] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[    0.088000] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x4)
[    0.088074] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
[    0.092046] Hierarchical SRCU implementation.
[    0.095857] NMI watchdog: Perf event create on CPU 0 failed with -2
[    0.096002] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.100049] smp: Bringing up secondary CPUs ...
[    0.103696] x86: Booting SMP configuration:
[    0.104003] .... node  #0, CPUs:      #1
[    0.004000] kvm-clock: cpu 1, msr 0:3e357041, secondary cpu clock
[    0.106853] KVM setup async PF for cpu 1
[    0.107214] kvm-stealtime: cpu 1, msr 3e1161c0
[    0.112307] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.116006] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[    0.120007] smp: Brought up 1 node, 2 CPUs
[    0.123417] smpboot: Total of 2 processors activated (10000.00 BogoMIPS)
[    0.124320] devtmpfs: initialized
[    0.126970] x86/mm: Memory block size: 128MB
[    0.128137] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.132008] futex hash table entries: 512 (order: 3, 32768 bytes)
[    0.136156] NET: Registered protocol family 16
[    0.139769] cpuidle: using governor ladder
[    0.140013] cpuidle: using governor menu
[    0.143281] ACPI: bus type PCI registered
[    0.144000] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.148144] PCI: Using configuration type 1 for base access
[    0.156770] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.160017] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.164044] ACPI: Added _OSI(Module Device)
[    0.168007] ACPI: Added _OSI(Processor Device)
[    0.172007] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.176004] ACPI: Added _OSI(Processor Aggregator Device)
[    0.180007] ACPI: Interpreter enabled
[    0.184011] ACPI: (supports S0 S4 S5)
[    0.187094] ACPI: Using IOAPIC for interrupt routing
[    0.188018] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.300750] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.308023] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.312007] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.316010] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.320007] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.328324] acpiphp: Slot [3] registered
[    0.420040] acpiphp: Slot [31] registered
[    0.424003] PCI host bridge to bus 0000:00
[    0.536451] pci 0000:00:03.0: vgaarb: setting as boot VGA device
[    0.540000] pci 0000:00:03.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.548009] pci 0000:00:03.0: vgaarb: bridge control possible
[    0.551996] vgaarb: loaded
[    0.556090] EDAC MC: Ver: 3.0.0
[    0.559140] PCI: Using ACPI for IRQ routing
[    0.560280] NetLabel: Initializing
[    0.563268] NetLabel:  domain hash size = 128
[    0.568019] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.571902] NetLabel:  unlabeled traffic allowed by default
[    0.576145] clocksource: Switched to clocksource kvm-clock
[    0.586755] VFS: Disk quotas dquot_6.6.0
[    0.590090] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.594562] pnp: PnP ACPI init
[    0.597855] pnp: PnP ACPI: found 5 devices
[    0.608231] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.614881] NET: Registered protocol family 2
[    0.618324] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    0.622749] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[    0.626965] TCP: Hash tables configured (established 8192 bind 8192)
[    0.631170] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.635163] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.639358] NET: Registered protocol family 1
[    0.642779] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.646797] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.651113] pci 0000:00:03.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.657825] Unpacking initramfs...
[    0.734208] Freeing initrd memory: 31076K
[    0.737636] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x240939f1bb2, max_idle_ns: 440795263295 ns
[    0.745181] Scanning for low memory corruption every 60 seconds
[    0.750602] audit: initializing netlink subsys (disabled)
[    0.754606] audit: type=2000 audit(1603879247.564:1): state=initialized audit_enabled=0 res=1
[    0.754917] Initialise system trusted keyrings
[    0.764927] Key type blacklist registered
[    0.768266] workingset: timestamp_bits=36 max_order=18 bucket_order=0
[    0.773861] zbud: loaded
[    0.905903] Key type asymmetric registered
[    0.909292] Asymmetric key parser 'x509' registered
[    0.912915] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.918972] io scheduler noop registered (default)
[    0.922543] io scheduler cfq registered
[    0.925904] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[    0.964594] crc32c_combine: 8373 self tests passed
[    0.968628] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.000785] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.007649] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    1.014310] i8042: Warning: Keylock active
[    1.018572] serio: i8042 KBD port at 0x60,0x64 irq 1
[    1.022414] serio: i8042 AUX port at 0x60,0x64 irq 12
[    1.026284] rtc_cmos 00:00: RTC can wake from S4
[    1.030475] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    1.034755] rtc_cmos 00:00: alarms up to one day, 114 bytes nvram
[    1.038955] hidraw: raw HID events driver (C) Jiri Kosina
[    1.042936] NET: Registered protocol family 17
[    1.046622] mce: Using 32 MCE banks
[    1.049627] sched_clock: Marking stable (1049607566, 0)->(1755024155, -705416589)
[    1.056014] registered taskstats version 1
[    1.059279] Loading compiled-in X.509 certificates
[    1.064832] Loaded X.509 cert 'Build time autogenerated kernel key: 121ffea65ca15230f4a21fe7e5b65abaabaa433c'
[    1.072013] zswap: loaded using pool lzo/zbud
[    1.075526] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    1.079746] ima: Allocated hash algorithm: sha1
[    1.083589] rtc_cmos 00:00: setting system clock to 2020-10-28 09:59:31 UTC (1603879171)
[    1.091820] Freeing unused kernel memory: 2088K
[    1.116102] Write protecting the kernel read-only data: 16384k
[    1.120697] Freeing unused kernel memory: 2016K
[    1.126528] Freeing unused kernel memory: 1316K
[    1.160972] systemd[1]: Inserted module 'autofs4'
[    1.176133] NET: Registered protocol family 10
[    1.181508] Segment Routing with IPv6
[    1.184828] systemd[1]: Inserted module 'ipv6'
[    1.189116] random: systemd: uninitialized urandom read (16 bytes read)
[    1.193763] random: systemd: uninitialized urandom read (16 bytes read)
[    1.198171] random: systemd: uninitialized urandom read (16 bytes read)
[    1.205354] systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[    1.217384] systemd[1]: Detected virtualization kvm.
[    1.221077] systemd[1]: Detected architecture x86-64.
[    1.224774] systemd[1]: Running in initial RAM disk.
Welcome to Amazon Linux 2 dracut-033-535.amzn2.1.3 (Initramfs)
[    1.230712] systemd[1]: No hostname configured.
[    1.234213] systemd[1]: Set hostname to <localhost>.
[    1.237934] systemd[1]: Initializing machine ID from KVM UUID.
[  OK  ] Reached target Swap.
[    1.265844] systemd[1]: Reached target Swap.
[    1.269312] systemd[1]: Starting Swap.
[  OK  ] Created slice Root Slice.
[    1.274036] systemd[1]: Created slice Root Slice.
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Timers.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Local File Systems.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Created slice System Slice.
         Starting Setup Virtual Console...
         Starting Journal Service...
         Starting Create list of required st... nodes for the current kernel...
         Starting Apply Kernel Variables...
[  OK  ] Reached target Slices.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Reached target Sockets.
         Starting dracut cmdline hook...
[  OK  ] Started Setup Virtual Console.
[  OK  ] Started Create list of required sta...ce nodes for the current kernel.
[  OK  ] Started Apply Kernel Variables.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Started Journal Service.
[  OK  ] Started dracut cmdline hook.
         Starting dracut pre-udev hook...
[    1.390579] device-mapper: uevent: version 1.0.3
[    1.394255] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[  OK  ] Started dracut pre-udev hook.
         Starting udev Kernel Device Manager...
[  OK  ] Started udev Kernel Device Manager.
         Starting dracut pre-trigger hook...
[  OK  ] Started dracut pre-trigger hook.
         Starting udev Coldplug all Devices...
[  OK  ] Started udev Coldplug all Devices.
         Starting Show Plymouth Boot Screen...
[  OK  ] Reached target System Initialization.
         Starting dracut initqueue hook...
[    1.534629] nvme nvme0: pci function 0000:00:04.0
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
[    1.543815] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[    1.546543] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    1.556607] nvme nvme1: pci function 0000:00:1f.0
[    1.557854] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
[    1.576394] AVX2 version of gcm_enc/dec engaged.
[    1.580503] AES CTR mode by8 optimization enabled
[    1.601321] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[    1.776495]  nvme0n1: p1 p128
[    1.908576] random: fast init done
[  OK  ] Found device /dev/disk/by-uuid/a1e1011e-e38f-408e-878b-fed395b47ad6.
         Starting File System Check on /dev/...e-e38f-408e-878b-fed395b47ad6...
[  OK  ] Started File System Check on /dev/d...11e-e38f-408e-878b-fed395b47ad6.
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting dracut pre-mount hook...
[  OK  ] Started dracut pre-mount hook.
         Mounting /sysroot...
[    2.235770] SGI XFS with ACLs, security attributes, no debug enabled
[    2.242333] XFS (nvme0n1p1): Mounting V5 Filesystem
[    4.142597] XFS (nvme0n1p1): Ending clean mount
[  OK  ] Mounted /sysroot.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
         Starting dracut pre-pivot and cleanup hook...
[  OK  ] Started dracut pre-pivot and cleanup hook.
         Starting Cleaning Up and Shutting Down Daemons...
[  OK  ] Stopped Cleaning Up and Shutting Down Daemons.
[  OK  ] Stopped target Timers.
[  OK  ] Stopped dracut pre-pivot and cleanup hook.
         Stopping dracut pre-pivot and cleanup hook...
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
[  OK  ] Stopped target Initrd Default Target.
         Starting Plymouth switch root service...
[  OK  ] Stopped dracut pre-mount hook.
         Stopping dracut pre-mount hook...
[  OK  ] Stopped dracut initqueue hook.
         Stopping dracut initqueue hook...
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Sockets.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped target Swap.
[  OK  ] Stopped target Local File Systems.
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Apply Kernel Variables...
[  OK  ] Stopped target Local Encrypted Volumes.
[  OK  ] Stopped udev Coldplug all Devices.
         Stopping udev Coldplug all Devices...
[  OK  ] Stopped dracut pre-trigger hook.
         Stopping dracut pre-trigger hook...
         Stopping udev Kernel Device Manager...
[  OK  ] Stopped target Slices.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped udev Kernel Device Manager.
[  OK  ] Stopped Create Static Device Nodes in /dev.
         Stopping Create Static Device Nodes in /dev...
[  OK  ] Stopped Create list of required sta...ce nodes for the current kernel.
         Stopping Create list of required st... nodes for the current kernel...
[  OK  ] Stopped dracut pre-udev hook.
         Stopping dracut pre-udev hook...
[  OK  ] Stopped dracut cmdline hook.
         Stopping dracut cmdline hook...
[  OK  ] Closed udev Kernel Socket.
[  OK  ] Closed udev Control Socket.
         Starting Cleanup udevd DB...
[  OK  ] Started Cleanup udevd DB.
[  OK  ] Reached target Switch Root.
[    4.553875] systemd-journald[667]: Received SIGTERM from PID 1 (systemd).
[  OK  ] Started Plymouth switch root service.
         Starting Switch Root...
[    4.885212] systemd: 30 output lines suppressed due to ratelimiting
[    5.925390] SELinux:  Disabled at runtime.
[    5.980115] audit: type=1404 audit(1603879176.396:2): selinux=0 auid=4294967295 ses=4294967295
[    6.083250] ip_tables: (C) 2000-2006 Netfilter Core Team
[    6.106470] systemd[1]: Inserted module 'ip_tables'

Welcome to Amazon Linux 2

[  OK  ] Stopped Switch Root.
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Reached target Swap.
[  OK  ] Listening on Delayed Shutdown Socket.
         Mounting Huge Pages File System...
[  OK  ] Stopped target Switch Root.
[  OK  ] Stopped target Initrd Root File System.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
[  OK  ] Created slice User and Session Slice.
         Starting Create list of required st... nodes for the current kernel...
[  OK  ] Listening on LVM2 poll daemon socket.
[  OK  ] Stopped target Initrd File Systems.
[  OK  ] Listening on udev Kernel Socket.
         Mounting Debug File System...
[  OK  ] Reached target Slices.
[  OK  ] Listening on LVM2 metadata daemon socket.
         Mounting POSIX Message Queue File System...
[  OK  ] Created slice system-selinux\x2dpol...grate\x2dlocal\x2dchanges.slice.
         Starting Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  ] Created slice system-serial\x2dgetty.slice.
         Starting Read and set NIS domainname from /etc/sysconfig/network...
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Set up automount Arbitrary Executab...ats File System Automount Point.
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Journal Service.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Started Create list of required sta...ce nodes for the current kernel.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Started Read and set NIS domainname from /etc/sysconfig/network.
         Starting udev Coldplug all Devices...
         Starting Configure read-only root support...
         Starting Relabel kernel modules early in the boot, if needed...
         Starting Create Static Device Nodes in /dev...
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Relabel kernel modules early in the boot, if needed.
         Starting Load Kernel Modules...
[    7.047237] systemd-journald[1398]: Received request to flush runtime journal from PID 1
[    7.069936] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.2.10g
[    7.084119] ena: ena device version: 0.10
[    7.089001] ena: ena controller version: 0.0.1 implementation version 1
[  OK  ] Started Configure read-only root support.
         Starting Load/Save Random Seed...
[  OK  ] Started Load/Save Random Seed.
[    7.156042] ena 0000:00:05.0: LLQ is not supported Fallback to host mode policy.
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[    7.181318] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 0a:cf:65:4e:dd:ff
[  OK  ] Started Load Kernel Modules.
         Starting Apply Kernel Variables...
[  OK  ] Started LVM2 metadata daemon.
         Starting LVM2 metadata daemon...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Found device /dev/ttyS0.
[    7.776329] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[    7.783413] ACPI: Power Button [PWRF]
[    7.786723] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input4
[    7.793032] ACPI: Sleep Button [SLPF]
         Starting Relabel kernel modules early in the boot, if needed...
[  OK  ] Created slice system-ec2net\x2difup.slice.
[  OK  ] Started Relabel kernel modules early in the boot, if needed.
[    7.888784] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input5
[    7.904661] mousedev: PS/2 mouse device common for all mice
[  OK  ] Started udev Wait for Complete Device Initialization.
         Starting Activation of DM RAID sets...
[  OK  ] Started Activation of DM RAID sets.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Started Monitoring of LVM2 mirrors,...ng dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
[   59.305661] random: crng init done
[   59.308921] random: 7 urandom warning(s) missed due to ratelimiting
[ TIME ] Timed out waiting for device dev-sdf.device.
[DEPEND] Dependency failed for /home/storage.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.
[DEPEND] Dependency failed for Relabel all filesystems, if necessary.
[DEPEND] Dependency failed for Migrate local... structure to the new structure.
         Starting Preprocess NFS configuration...
[  OK  ] Reached target Timers.
[  OK  ] Reached target Network (Pre).
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Cloud-init target.
         Starting Initial hibernation setup job...
         Starting Initial cloud-init job (metadata service crawler)...
[  OK  ] Reached target Network.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Sockets.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Emergency Shell.
         Starting Emergency Shell...
[  OK  ] Reached target Emergency Mode.
         Starting Tell Plymouth To Write Out Runtime Data...
[  OK  ] Started Preprocess NFS configuration.
[  OK  ] Started Create Volatile Files and Directories.
         Mounting RPC Pipe File System...
         Starting Security Auditing Service...
         Starting RPC bind service...
[   97.160193] RPC: Registered named UNIX socket transport module.
[   97.160194] RPC: Registered udp transport module.
[   97.160194] RPC: Registered tcp transport module.
[   97.160195] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  OK  ] Mounted RPC Pipe File System.
[  OK  ] Reached target rpc_pipefs.target.
[  OK  ] Reached target NFS client services.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started RPC bind service.
[  OK  ] Started Security Auditing Service.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.
[   99.871085] hibinit-agent[1855]: Traceback (most recent call last):
[   99.871339] hibinit-agent[1855]: File "/usr/bin/hibinit-agent", line 496, in <module>
[   99.871592] hibinit-agent[1855]: main()
[   99.872080] hibinit-agent[1855]: File "/usr/bin/hibinit-agent", line 435, in main
[   99.872516] hibinit-agent[1855]: if not hibernation_enabled(config.state_dir):
[   99.873017] hibinit-agent[1855]: File "/usr/bin/hibinit-agent", line 390, in hibernation_enabled
[   99.873487] hibinit-agent[1855]: imds_token = get_imds_token()
[   99.873793] hibinit-agent[1855]: File "/usr/bin/hibinit-agent", line 365, in get_imds_token
[   99.875332] hibinit-agent[1855]: response = requests.put(token_url, headers=request_header)
[   99.877065] hibinit-agent[1855]: File "/usr/lib/python2.7/site-packages/requests/api.py", line 121, in put
[   99.877230] hibinit-agent[1855]: return request('put', url, data=data, **kwargs)
[   99.877959] hibinit-agent[1855]: File "/usr/lib/python2.7/site-packages/requests/api.py", line 50, in request
[   99.878225] hibinit-agent[1855]: response = session.request(method=method, url=url, **kwargs)
[   99.878614] hibinit-agent[1855]: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 486, in request
[   99.879747] hibinit-agent[1855]: resp = self.send(prep, **send_kwargs)
[   99.880157] hibinit-agent[1855]: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 598, in send
[   99.884411] hibinit-agent[1855]: r = adapter.send(request, **kwargs)
[   99.884728] hibinit-agent[1855]: File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 419, in send
[   99.892094] hibinit-agent[1855]: raise ConnectTimeout(e, request=request)
[   99.892377] hibinit-agent[1855]: requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/api/token (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7efc029fa390>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
[FAILED] Failed to start Initial hibernation setup job.
See 'systemctl status hibinit-agent.service' for details.
[  101.215791] cloud-init[1856]: Cloud-init v. 19.3-3.amzn2 running 'init' at Wed, 28 Oct 2020 10:01:11 +0000. Up 101.18 seconds.
[  101.264707] cloud-init[1856]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[  101.264940] cloud-init[1856]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[  101.272469] cloud-init[1856]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
[  101.274166] cloud-init[1856]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[  101.274497] cloud-init[1856]: ci-info: |  eth0  | False |     .     |     .     |   .   | 0a:cf:65:4e:dd:ff |
[  101.284890] cloud-init[1856]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
[  101.286727] cloud-init[1856]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
[  101.286986] cloud-init[1856]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[  101.291933] cloud-init[1856]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[  101.292215] cloud-init[1856]: ci-info: +-------+-------------+---------+-----------+-------+
[  101.294122] cloud-init[1856]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[  101.294383] cloud-init[1856]: ci-info: +-------+-------------+---------+-----------+-------+
[  101.294543] cloud-init[1856]: ci-info: +-------+-------------+---------+-----------+-------+
Welcome to emerg
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Press Enter to continue.

Tags: amazon-web-services, amazon-ec2, crash-reports, amazon-linux-2
asked on Stack Overflow Oct 29, 2020 by Jens Møller • edited Oct 30, 2020 by Jens Møller

2 Answers

I think I've narrowed this down to the ec2-utils package. We had the same issue: devices were not mounting properly, which we initially thought was related to the ENA or NVMe driver. Once we ran a yum update, it was resolved.

If you downgrade the ec2-utils package to ec2-utils-1.2-2.amzn2, the issue returns. This seems to affect only Nitro-based instances. To fix it, you can temporarily boot the instance as a t2 or other older instance type and update the package.
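
As a rough way to check whether an instance is still on the build mentioned above, something like the following Python sketch could be used. It assumes `rpm` is available on the instance; the helper names `is_known_bad` and `installed_ec2_utils` are made up for illustration, and the only version it knows about is the one named in this answer.

```python
import subprocess

# The build this answer reports as reproducing the failure.
KNOWN_BAD = "ec2-utils-1.2-2.amzn2"


def is_known_bad(name_version_release):
    """Return True if the package string is exactly the reported bad build."""
    return name_version_release.strip() == KNOWN_BAD


def installed_ec2_utils():
    """Ask rpm for the installed ec2-utils as NAME-VERSION-RELEASE.

    Raises CalledProcessError if the package is not installed and
    FileNotFoundError if rpm itself is missing.
    """
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{NAME}-%{VERSION}-%{RELEASE}", "ec2-utils"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the instance itself, `is_known_bad(installed_ec2_utils())` returning `True` would suggest the package still needs updating before moving back to a Nitro-based instance type.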

answered on Stack Overflow Oct 29, 2020 by faceinthecrowd

OK, shortly after posting we figured it out. It seems the device name behind one of our mount points changed (I expect due to a Linux kernel update), and we had not used the nofail option in /etc/fstab as described in the AWS Knowledge Center. This caused the server to hang at boot.

Going forward we will also mount by UUID so that we are independent of the device naming in /dev/.
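
To illustrate the two points above, here is a minimal Python sketch that flags fstab entries likely to block boot. The sample entries are assumptions: the root UUID and the /dev/sdf and /home/storage names come from the console log, but the filesystem types, mount options, the all-zero UUID, and the /data mount are hypothetical. The function name `lint_fstab` is also made up for this example.

```python
def lint_fstab(text):
    """Return (mount_point, warning) tuples for entries that may block boot."""
    warnings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        spec, mount_point, _fstype, options = fields[:4]
        if mount_point == "/":
            continue  # root is required for boot either way, so skip it here
        if "nofail" not in options.split(","):
            warnings.append((mount_point, "missing nofail"))
        if spec.startswith("/dev/") and not spec.startswith("/dev/disk/by-"):
            warnings.append((mount_point, "kernel device name, not UUID/LABEL"))
    return warnings


# Hypothetical fstab content; only the root UUID, /dev/sdf and /home/storage
# appear in the console log.
SAMPLE = """\
UUID=a1e1011e-e38f-408e-878b-fed395b47ad6 / xfs defaults,noatime 1 1
/dev/sdf /home/storage xfs defaults 0 2
UUID=00000000-0000-0000-0000-000000000000 /data xfs defaults,nofail 0 2
"""

for mount_point, warning in lint_fstab(SAMPLE):
    print(f"{mount_point}: {warning}")
```

In this sample only the /home/storage entry is flagged, once for each problem. This is a simplification: swap entries and other special cases are not handled.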

answered on Stack Overflow Oct 29, 2020 by Jens Møller • edited Oct 30, 2020 by Jens Møller

User contributions licensed under CC BY-SA 3.0