What's the cause of these crashing imapd processes?

Question

What's the cause of these crashing imapd processes?

We have a Mac OS X 10.5 Leopard Server mail server that recently started getting issues with IMAP mailboxes having "invalid format[s]" this weekend. It turned out that there were some bad block counts on the volume housing IMAP data and the issue has not resurfaced after repairing the volume and the afflicted mailboxes. However, a new issue that is persisting is frequently crashing imaps processes and ever increasing db4 "lockers" errors like the following:

Apr 13 17:01:12 host lmtpunix[31509]: DBERROR db4: 1134 lockers

The errors for the crashing imaps processes from /var/log/system.log are as follows:

Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.XXX.XXX] user3 CRAM-MD5+TLS User logged in
Apr 12 13:43:15 host ReportCrash[14362]: Formulating crash report for process imapd[11792]
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host ReportCrash[14362]: Saved crashreport to /Library/Logs/CrashReporter/imapd_2011-04-12-134315_host.crash using uid: 0 gid: 0, euid: 0 egid: 0

And the following from /var/log/mailaccess.log:

Apr 12 13:43:10 host imaps[11792]: accepted connection
Apr 12 13:43:10 host imaps[11792]: mydelete: starting txn 2147495107
Apr 12 13:43:10 host imaps[11792]: mydelete: committing txn 2147495107
Apr 12 13:43:10 host imaps[11792]: mystore: starting txn 2147495108
Apr 12 13:43:10 host imaps[11792]: mystore: committing txn 2147495108
Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: accepted connection
Apr 12 13:43:12 host imaps[11792]: mydelete: starting txn 2147495112
Apr 12 13:43:12 host imaps[11792]: mydelete: committing txn 2147495112
Apr 12 13:43:12 host imaps[11792]: mystore: starting txn 2147495113
Apr 12 13:43:12 host imaps[11792]: mystore: committing txn 2147495113
Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: AOD: user options: no lookup required for: user3
Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.149.161] user3 CRAM-MD5+TLS User logged in
Apr 12 13:43:13 host imaps[11792]: quota set to "unlimited" for mailbox user.user3
Apr 12 13:43:13 host imaps[11792]: open: user user3 opened Other Users/listmaster
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally

The crash reports are all like the following:

Process:         imapd [39069]
Path:            /usr/bin/cyrus/bin/imapd
Identifier:      imapd
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  master [38605]

Date/Time:       2011-04-13 18:25:24.068 -0400
OS Version:      Mac OS X Server 10.5.7 (9J61)
Report Version:  6
Anonymous UUID:  223C4DD1-2AE2-4381-8A28-DEB9082281A8

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000077a0ca64
Crashed Thread:  0

Thread 0 Crashed:
0   imapd                               0x0003090c process_records + 588
1   imapd                               0x00031362 mailbox_expunge + 2146
2   imapd                               0x00006fde cmd_close + 179
3   imapd                               0x00018cf8 cmdloop + 2940
4   imapd                               0x0001c1b7 service_main + 1498
5   imapd                               0x00002e73 main + 3502
6   imapd                               0x00002006 start + 54

Thread 0 crashed with X86 Thread State (32-bit):
  eax: 0x61766970  ebx: 0x000306cb  ecx: 0x00000008  edx: 0x77a0ca64
  edi: 0x00bfffa4  esi: 0x162a5fa4  ebp: 0xbfffad48  esp: 0xbfffac90
   ss: 0x0000001f  efl: 0x00010202  eip: 0x0003090c   cs: 0x00000017
   ds: 0x0000001f   es: 0x0000001f   fs: 0x00000000   gs: 0x00000037
  cr2: 0x77a0ca64

Yes, they all crash in process_records in mailbox_expunge.

I'm not really seeing any other errors in the logs, at least that seem to be related to the crashed processes in any way or are innocuous like SQUAT failed to open index file and IOERROR: fstating sieve script /usr/sieve/u/user/defaultbc: No such file or directory.

I must admit, I have not rebuilt the Other Users/listmaster mailbox nor the user3 mailbox yet. It's not always the same user.

We do have some users that have found that sent email is not getting saved to their 'Sent Messages' mailbox and hasn't been since the date of the original issue. Rebuilding their mailbox (currently using sudo mailbfr -m username as it does some extra permissions fixed in addition to the sudo /usr/bin/cyrus/bin/reconstruct -r user/username that I'd normally run) seems to allow newly sent email to be saved to it, but I'm having trouble finding a correlation between that issue and this one (or any other errors in the logs).

Any suggestions would be greatly appreciated. Is is really crashing trying to delete messages? Should I just rebuild all users' mailboxes individually? I really do not want to rebuild the Cyrus database in its entirety and lose all flagged/read status for all messages.

mac-osx-server

imap

cyrus

asked on Server Fault Apr 13, 2011 by

morgant

2 Answers

I believe that corrupted blocks went to incorrect db indexes which are causing crashes while storing new data. Except of rebuilding database there is not much you can do. You can backup users .seen files and try to use them, but test this idea on test user. Honestly I think that harrdrive with bad blocks should be removed from the server as soon as possible anyway

answered on Server Fault Apr 15, 2011 by

BobC

I solved this issue long ago.

I don't remember the exact commands, but I found a way to reasonably correlate a specific crash to a specific user, at which point I could then run mailbfr -m to rebuild that user's mailbox. Eventually I was able to rebuild all of the problem mailboxes and rid the server of the issue.

answered on Server Fault Jul 13, 2011 by

morgant

User contributions licensed under CC BY-SA 3.0