"zpool create" hangs indefinitely

I have a Solaris 11.1 machine with some disks attached via an expander (LSISAS2X36) through an LSI 1068 controller. The setup used to work quite decently, but since I added another batch of disks, I have been seeing some strange effects:

  • format hangs after selecting a disk (any disk) unless I set NOINUSE_CHECK=1
  • I cannot create new pools: a simple zpool create test c10t20d0 hangs, seemingly for the same reason as format does. The NOINUSE_CHECK variable has no apparent effect here, although old news archives suggest that it helped with previous releases of Solaris.

I already tried running devfsadm -Cv to clean up dev entries for non-present devices, but to no avail. Suspecting that invalid partition information on one of the newly added disks might cause the "in use" check to hang, I also ran the fdisk menu for each of the added disks to create a 100% Solaris partition, but this did not help either.

Running truss zpool create test c10t20d0 shows a long series of readlink calls on entries under /dev/rdsk/ and then stops at these lines:

readlink("/dev/zvol/rdsk/rpool/dump", "../../../..//devices/pseudo/zfs@0:1,raw", 1023) = 39
lstat("/dev", 0xF8D35310)                       = 0
lstat("/dev/zvol", 0xF8D35310)                  = 0
lstat("/dev/zvol/rdsk", 0xF8D35310)             = 0
lstat("/dev/zvol/rdsk/rpool", 0xF8D35310)       = 0
lstat("/dev/zvol/rdsk/rpool/swap", 0xF8D35310)  = 0
readlink("/dev/zvol/rdsk/rpool/swap", "../../../..//devices/pseudo/zfs@0:2,raw", 1023) = 39
open("/devices/pseudo/devinfo@0:devinfo", O_RDONLY) = 7
ioctl(7, DINFOIDENT, 0x00000000)                = 57311
ioctl(7, 0x10DF00, 0xF8D36F10)                  = 380014
ioctl(7, DINFOUSRLD, 0x08D62000)                = 380928
close(7)                                        = 0
close(6)                                        = 0
munmap(0xF5FE1000, 4096)                        = 0
munmap(0xF5FD2000, 20480)                       = 0
munmap(0xF5FC7000, 24576)                       = 0
munmap(0xF6014000, 110592)                      = 0
munmap(0xF6030000, 40)                          = 0
close(5)                                        = 0
stat64("/opt/VRTSvxvm/lib/libsysevent.so.1", 0xF8D36910) Err#2 ENOENT
stat64("/lib/libsysevent.so.1", 0xF8D36910)     = 0
resolvepath("/lib/libsysevent.so.1", "/lib/libsysevent.so.1", 1023) = 21
open("/lib/libsysevent.so.1", O_RDONLY)         = 5
mmapobj(5, MMOBJ_INTERPRET, 0xF6040B78, 0xF8D3697C, 0x00000000) = 0
close(5)                                        = 0
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xF5FE0000
memcntl(0xF6020000, 11280, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
getuid()                                        = 0 [0]
statvfs("/system/volatile", 0xF8D369B0)         = 0
stat("/system/volatile/sysevent_channels", 0xF8D36A50) = 0
mkdir("/system/volatile/sysevent_channels/syseventd_channel", 0755) Err#17 EEXIST
stat("/system/volatile/sysevent_channels/syseventd_channel", 0xF8D368F0) = 0
getuid()                                        = 0 [0]
modctl(MODEVENTS, 0x00000006, 0x08D560EB, 0x00000000, 0xF8D36880) = 0
modctl(MODEVENTS, 0x00000006, 0x08D560EB, 0x00000000, 0xF8D36A40) = 0
unlink("/system/volatile/sysevent_channels/syseventd_channel/59") Err#2 ENOENT
open("/system/volatile/sysevent_channels/syseventd_channel/59", O_RDWR|O_CREAT, 0600) = 5
door_create(0xF6024174, 0x08D56088, DOOR_REFUSE_DESC|DOOR_NO_CANCEL) = 6
getpid()                                        = 22082 [22081]
priocntlsys(1, 0xF8D365B0, 3, 0xF8D366A0, 0)    = 22082
priocntlsys(1, 0xF8D36540, 1, 0xF8D36600, 0)    = 4
priocntlsys(1, 0xF8D36500, 0, 0xF6575FB8, 0)    = 4
mmap(0x00000000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xF5FBF000
mmap(0x00000000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xF5FA0000
sigaction(SIGCANCEL, 0xF8D366C0, 0x00000000)    = 0
sysconfig(_CONFIG_STACK_PROT)                   = 3
mmap(0x00000000, 1040384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON, -1, 0) = 0xF5EA1000
mmap(0x00010000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xF5E90000
getcontext(0xF8D36510)
uucopy(0xF8D364D0, 0xF5F9EFEC, 20)              = 0
lwp_create(0xF8D36760, LWP_DETACHED|LWP_SUSPENDED, 0xF8D3675C) = 2
/1:     lwp_continue(2)                                 = 0
/2:     lwp_create()    (returning as new lwp ...)      = 0
/1:     yield()                                         = 0
/2:     setustack(0xF5E902A0)
/2:     schedctl()                                      = 0xF623B040
/1:     umount2("/system/volatile/sysevent_channels/syseventd_channel/59", 0x00000000) Err#22 EINVAL
/1:     ioctl(6, I_CANPUT, 0x00000000)                  Err#89 ENOSYS
/1:     door_info(6, 0xF8D36640)                        = 0
/1:     mount(0, "/system/volatile/sysevent_channels/syseventd_channel/59", MS_DATA|MS_NOMNTTAB, "namefs", 0xF8D3663C, 4, 0x00000000, 0) = 0
/1:     close(5)                                        = 0
/1:     open("/system/volatile/sysevent_channels/syseventd_channel/reg_door", O_RDONLY) = 5
/2:     door_return(0x00000000, 0, 0x00000000, 0xF5F9EE00, 1007360) (sleeping...)
/1:     door_call(5, 0xF8D369F0)        (sleeping...)
^C/1:       Received signal #2, SIGINT, in door_call() [default]

The output of truss format c10t20d0 looks much the same towards the end.
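To make the end of such a trace easier to read, the calls that truss marks with "(sleeping...)" can be pulled out per LWP. The following is only a minimal sketch in Python; the function name sleeping_calls and the regular expression are illustrative assumptions based on the line layout shown in the trace above, not part of truss or any Solaris tool.

```python
import re

# Hypothetical helper, not part of truss: matches lines such as
#   /1:     door_call(5, 0xF8D369F0)        (sleeping...)
# The "/N:" LWP prefix is optional because the trace above only shows it
# once a second LWP exists; unprefixed lines are attributed to LWP 1.
SLEEPING = re.compile(
    r"^(?:/(?P<lwp>\d+):\s+)?(?P<call>\w+)\((?P<args>.*?)\)\s+\(sleeping\.\.\.\)\s*$"
)

def sleeping_calls(truss_lines):
    """Return (lwp, syscall, args) for every call truss marked as sleeping."""
    found = []
    for line in truss_lines:
        m = SLEEPING.match(line.strip())
        if m:
            lwp = int(m.group("lwp")) if m.group("lwp") else 1
            found.append((lwp, m.group("call"), m.group("args")))
    return found

# The last three lines of the trace above, before the interrupt.
sample = [
    '/1:     open("/system/volatile/sysevent_channels/syseventd_channel/reg_door", O_RDONLY) = 5',
    '/2:     door_return(0x00000000, 0, 0x00000000, 0xF5F9EE00, 1007360) (sleeping...)',
    '/1:     door_call(5, 0xF8D369F0)        (sleeping...)',
]

for lwp, call, args in sleeping_calls(sample):
    print(f"LWP {lwp} is blocked in {call}({args})")
```

Applied to the trace above, this singles out LWP 1 blocked in door_call on file descriptor 5, which was opened on syseventd_channel/reg_door immediately before. That suggests (though it does not prove) that the command is waiting for a reply from the sysevent service rather than for the disk itself. LWP 2 sleeping in door_return is most likely just the process's own door server thread waiting for invocations.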

Is there anything else I could do to narrow down the possible causes, or anything I could simply try to see whether it helps?

solaris
zfs
asked on Server Fault Feb 1, 2014 by the-wabbit • edited Feb 2, 2014 by the-wabbit

1 Answer

It looks like the system did not handle a pulled disk very well. Although most things seemed to work correctly, the format and zpool create commands hung even after the missing disk had been re-inserted.

Rebooting the system resolved the problem; a fast reboot was sufficient.

answered on Server Fault Feb 24, 2014 by the-wabbit

User contributions licensed under CC BY-SA 3.0