Why does 'aptitude safe-upgrade' cause Ubuntu's boot to hang on "Waiting for root file system"?

I launched an Ubuntu EBS instance on Amazon EC2 using Ubuntu's own latest AMI for 10.04 Lucid, ami-ad36fbc4.

After getting the instance up, I ran the command sudo aptitude safe-upgrade, which seems to have upgraded the kernel from vmlinuz-2.6.32-318-ec2 to vmlinuz-2.6.32-340-ec2.

Now the instance won't boot. It gives the following error: Waiting for root file system ...

If I detach the EBS volume, edit the /boot/grub/menu.lst file, and remove the entries referencing vmlinuz-2.6.32-340-ec2, the instance will boot again.
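
The menu.lst edit described here can be sketched in Python. This is only an illustration under stated assumptions: the function name drop_kernel_entries is hypothetical, stanzas are assumed to be separated by blank lines as in the listing further down, and the file is assumed to have been read into a string from wherever the detached volume is mounted. The sample text is a shortened copy of the menu entries from this question.

```python
import re


def drop_kernel_entries(menu_text, kernel_version):
    """Return menu.lst text without the stanzas that boot kernel_version.

    Assumption: stanzas are separated by one or more blank lines.
    """
    stanzas = re.split(r"\n\s*\n", menu_text.strip("\n"))
    needle = "vmlinuz-" + kernel_version
    # Keep every stanza that does not reference the unwanted kernel image.
    kept = [s for s in stanzas if needle not in s]
    return "\n\n".join(kept) + "\n"


# Shortened sample taken from the menu.lst shown in this question.
SAMPLE = """\
title       Ubuntu 10.04.3 LTS, kernel 2.6.32-340-ec2
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-340-ec2 root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0
initrd      /boot/initrd.img-2.6.32-340-ec2

title       Ubuntu 10.04.3 LTS, kernel 2.6.32-318-ec2
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-318-ec2 root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0
initrd      /boot/initrd.img-2.6.32-318-ec2
"""

if __name__ == "__main__":
    print(drop_kernel_entries(SAMPLE, "2.6.32-340-ec2"), end="")
```

Note that this only changes what the boot menu offers. The newer kernel package stays installed, so the entries may reappear if the menu is regenerated later.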

So the questions are:

  1. Why is this happening?
  2. Isn't safe-upgrade supposed to be conservative enough not to break things?
  3. Or should I just not be using safe-upgrade on an EC2 instance? And if so why not?

PS: A related issue I read about while researching this was "System boot hangs on Waiting for root file system - Procedure to recover from /dev/hda that became /dev/sda" (see section 4.8), but as you can see from the menu.lst, the entries refer to the root file system by LABEL=cloudimg-rootfs and not by /sda/a or /hda/a.

For reference, the grub menu file is as follows:

title       Ubuntu 10.04.3 LTS, kernel 2.6.32-340-ec2
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-340-ec2 root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0 
initrd      /boot/initrd.img-2.6.32-340-ec2

title       Ubuntu 10.04.3 LTS, kernel 2.6.32-340-ec2 (recovery mode)
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-340-ec2 root=LABEL=cloudimg-rootfs ro  single
initrd      /boot/initrd.img-2.6.32-340-ec2

title       Ubuntu 10.04.3 LTS, kernel 2.6.32-318-ec2
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-318-ec2 root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0 
initrd      /boot/initrd.img-2.6.32-318-ec2

title       Ubuntu 10.04.3 LTS, kernel 2.6.32-318-ec2 (recovery mode)
root        (hd0)
kernel      /boot/vmlinuz-2.6.32-318-ec2 root=LABEL=cloudimg-rootfs ro  single
initrd      /boot/initrd.img-2.6.32-318-ec2

title       Ubuntu 10.04.3 LTS, memtest86+
root        (hd0)
kernel      /boot/memtest86+.bin

And the boot console looks like this (when it hangs):

i-3121e5b7
2011-11-27T19:20:03+0000
Xen Minimal OS!
  start_info: 0xac4000(VA)
    nr_pages: 0x26700
  shared_inf: 0xbb4b2000(MA)
     pt_base: 0xac7000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x990000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda1 ro 4
  stack:      0x94f860-0x96f860
MM: Init
      _text: 0x0(VA)
     _etext: 0x5ff6d(VA)
   _erodata: 0x78000(VA)
     _edata: 0x80b00(VA)
stack start: 0x94f860(VA)
       _end: 0x98fe68(VA)
  start_pfn: ad3
    max_pfn: 26700
Mapping memory range 0xc00000 - 0x26700000
setting 0x0-0x78000 readonly
skipped 0x1000
MM: Initialise page allocator for c01000(c01000)-26700000(26700000)
MM: done
Demand map pfns at 26701000-2026701000.
Heap resides at 2026702000-4026702000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x26701000.
Initialising scheduler
Thread "Idle": pointer: 0x2026702010, stack: 0x26640000
Initialising xenbus
Thread "xenstore": pointer: 0x20267027c0, stack: 0x26650000
Dummy main: start_info=0x96f960
Thread "main": pointer: 0x2026702f70, stack: 0x26660000
"main" "root=/dev/sda1" "ro" "4" 
vbd 2049 is hd0
******************* BLKFRONT for device/vbd/2049 **********


backend at /local/domain/0/backend/vbd/526/2049
Failed to read /local/domain/0/backend/vbd/526/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/526/2049/feature-flush-cache.
16777216 sectors of 512 bytes
**************************
  Booting 'Ubuntu 10.04.3 LTS, kernel 2.6.32-340-ec2'

root  (hd0)
 Filesystem type is ext2fs, using whole disk
kernel  /boot/vmlinuz-2.6.32-340-ec2 root=LABEL=cloudimg-rootfs ro xencons=hvc0
 console=hvc0 
initrd  /boot/initrd.img-2.6.32-340-ec2

xc_dom_probe_bzimage_kernel: kernel is not a bzImage
close blk: backend at /local/domain/0/backend/vbd/526/2049
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-340-ec2 (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #40-Ubuntu SMP Wed Nov 16 14:36:38 UTC 2011 (Ubuntu 2.6.32-340.40-ec2 2.6.32.46+drm33.20)
[    0.000000] Command line: root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0 
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] Xen-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 0000000026f00000 (usable)
[    0.000000] last_pfn = 0x26f00 max_arch_pfn = 0x80000000
[    0.000000] init_memory_mapping: 0000000000000000-0000000026f00000
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] RAMDISK: 01844000 - 03293000
[    0.000000] (3 early reservations) ==> bootmem [0000000000 - 0026700000]
[    0.000000]   #0 [0001844000 - 00033e9000]     Xen provided ==> [0001844000 - 00033e9000]
[    0.000000]   #1 [0001000000 - 00018237b8]    TEXT DATA BSS ==> [0001000000 - 00018237b8]
[    0.000000]   #2 [00033e9000 - 0003523000]          PGTABLE ==> [00033e9000 - 0003523000]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00026700
[    0.000000]     0: 0x00026f00 -> 0x00026f00
[    0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 18 pages/cpu @ffff880003298000 s44248 r8192 d21288 u73728
[    0.000000] pcpu-alloc: s44248 r8192 d21288 u73728 alloc=18*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 155259
[    0.000000] Kernel command line: root=LABEL=cloudimg-rootfs ro xencons=hvc0 console=hvc0 
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] allocated 6379520 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] Software IO TLB disabled
[    0.000000] Memory: 574464k/637952k available (4836k kernel code, 8192k absent, 54588k reserved, 2084k data, 228k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:96
[    0.000000] Xen reported: 2666.760 MHz processor.
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [hvc0] enabled
[    0.230003] Calibrating delay using timer specific routine.. 5347.09 BogoMIPS (lpj=26735464)
[    0.230055] Security Framework initialized
[    0.230073] AppArmor: AppArmor initialized
[    0.230089] Mount-cache hash table entries: 256
[    0.230209] Initializing cgroup subsys ns
[    0.230215] Initializing cgroup subsys cpuacct
[    0.230218] Initializing cgroup subsys memory
[    0.230228] Initializing cgroup subsys devices
[    0.230230] Initializing cgroup subsys freezer
[    0.230259] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.230262] CPU: L2 cache: 6144K
[    0.230271] SMP alternatives: switching to UP code
[    0.255645] Freeing SMP alternatives: 39k freed
[    0.255834] Brought up 1 CPUs
[    0.255922] devtmpfs: initialized
[    0.256333] NET: Registered protocol family 16
[    0.256945] Brought up 1 CPUs
[    0.257349] PCI: Fatal: No config space access function found
[    0.257353] PCI: setting up Xen PCI frontend stub
[    0.257605] bio: create slab <bio-0> at 0
[    0.257681] vgaarb: loaded
[    0.257889] suspend: event channel 9
[    0.258172] xen_mem: Initialising balloon driver.
[    0.260364] PCI: System does not support PCI
[    0.260368] PCI: System does not support PCI
[    0.260432] NET: Registered protocol family 8
[    0.260435] NET: Registered protocol family 20
[    0.260451] NetLabel: Initializing
[    0.260455] NetLabel:  domain hash size = 128
[    0.260456] NetLabel:  protocols = UNLABELED CIPSOv4
[    0.260490] NetLabel:  unlabeled traffic allowed by default
[    0.260505] Switching to clocksource xen
[    0.261840] AppArmor: AppArmor Filesystem Enabled
[    0.262007] NET: Registered protocol family 2
[    0.262083] IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.262363] TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
[    0.263136] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.263553] TCP: Hash tables configured (established 131072 bind 65536)
[    0.263559] TCP reno registered
[    0.263629] NET: Registered protocol family 1
[    0.263708] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.263814] audit: initializing netlink socket (disabled)
[    0.263838] type=2000 audit(1322421419.386:1): initialized
[    0.269569] Trying to unpack rootfs image as initramfs...
[    0.279699] VFS: Disk quotas dquot_6.5.2
[    0.279731] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.279885] DLM (built Nov 16 2011 14:40:41) installed
[    0.279994] JFS: nTxBlock = 4920, nTxLock = 39360
[    0.289416] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[    0.289643] SGI XFS Quota Management subsystem
[    0.299611] Slow work thread pool: Starting up
[    0.299651] Slow work thread pool: Ready
[    0.299659] GFS2 (built Nov 16 2011 14:41:38) installed
[    0.299675] msgmni has been set to 1230
[    0.299847] alg: No test for stdrng (krng)
[    0.299858] io scheduler noop registered
[    0.299860] io scheduler anticipatory registered
[    0.299862] io scheduler deadline registered (default)
[    0.299871] io scheduler cfq registered
[    0.314987] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.315818] brd: module loaded
[    0.316148] loop: module loaded
[    0.316216] Xen virtual console successfully installed as hvc0
[    0.316254] Event-channel device installed.
[    0.324444] Freeing initrd memory: 26940k freed
[    0.338978] netfront: Initialising virtual ethernet driver.
[    0.340057] PPP generic driver version 2.4.2
[    0.340628] Equalizer2002: Simon Janes (simon@ncm.com) and David S. Miller (davem@redhat.com)
[    0.340767] tun: Universal TUN/TAP device driver, 1.6
[    0.340769] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    0.341644] i8042.c: No controller found.
[    0.341704] mice: PS/2 mouse device common for all mice
[    0.341758] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[    0.341810] Driver for 1-wire Dallas network protocol.
[    0.341865] device-mapper: uevent: version 1.0.3
[    0.341932] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
[    0.342186] NET: Registered protocol family 17
[    0.342285] registered taskstats version 1
[    0.355601] xen-vbd: registered block device major 8
[    0.440415] XENBUS: Device with no driver: device/console/0
[    0.440429] /build/buildd/linux-ec2-2.6.32/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    0.440534] Freeing unused kernel memory: 228k freed
[    0.440675] Write protecting the kernel read-only data: 6492k
Loading, please wait...
[    0.460565] udev: starting version 151
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Waiting for root file system ... 

Tags: ubuntu, amazon-ec2, apt, grub, amazon-ami

asked on Server Fault Nov 27, 2011 by cwd • edited Nov 27, 2011 by cwd

1 Answer

I'm sorry this response is so delayed. A few comments first:

  • In the future, if you find issues with Ubuntu running on Amazon's EC2, the best way to get them resolved is to open a bug in Launchpad (http://launchpad.net/ubuntu). You can run 'ubuntu-bug' inside an EC2 instance and it will collect some information about the instance and tag the bug appropriately. Also, feel free to subscribe 'smoser' or 'utlemming'.
  • The AMI you list is no longer current (simply due to time passing, and Ubuntu refreshing images on EC2). If you're interested in finding the most current official AMIs, please see
       https://askubuntu.com/questions/53582/how-do-i-know-what-ubuntu-ami-to-launch-on-ec2

  • The kernel you were running is no longer current for 10.04 (again, simply due to maintenance on Ubuntu).

So, all that said, running aptitude safe-upgrade should be safe on EC2. I verified that doing so works when using the AMI you listed above on both a t1.micro and an m1.large. At this point in time, that results in kernel '2.6.32-341.42' rather than the '2.6.32-340.40' you got.

I tried to reproduce your issue explicitly by downloading and installing the same version of the kernel from the Launchpad archive. Both my t1.micro and m1.large instances rebooted into 2.6.32-340 after a simple sudo dpkg -i linux-image-2.6.32-340-ec2_2.6.32-340.40_amd64.deb && sudo reboot.
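
If you want to confirm which kernel an instance actually came up on after such a reboot, a small check along these lines may help. It is a sketch, not part of the test I ran: the helper name running_kernel_matches is hypothetical, and it assumes the release string has the same shape as the file names above (for example 2.6.32-340-ec2).

```python
import platform


def running_kernel_matches(expected, release=None):
    """Return True if the kernel release string starts with expected.

    release defaults to the running kernel as reported by
    platform.release(); pass a string explicitly to check a value
    copied from another machine.
    """
    current = platform.release() if release is None else release
    return current.startswith(expected)


if __name__ == "__main__":
    print(platform.release())
    print(running_kernel_matches("2.6.32-340"))
```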

Again, aptitude safe-upgrade and apt-get dist-upgrade should be perfectly safe on EC2. If they're not, please open bugs.

answered on Server Fault Jan 17, 2012 by smoser • edited Apr 13, 2017 by Community

User contributions licensed under CC BY-SA 3.0