Domino 10 Java XPage crashing the server after some time with PANIC: semaphore invalid or not allocated

I have a Java XPages Domino application running on my server that serves as an API for handling the Rooms & Resources database remotely (its main role is obtaining the reservations for a set of rooms and updating them periodically).

Everything was fine in testing, but once I put the app on my production server, the server crashed after some time:

Domino version:     Release 10.0.1FP3 August 09, 2019
OS Version:     Windows/2016 10.0 [64-bit]
Error Message = PANIC: semaphore invalid or not allocated
    SharedDPoolSize = 33554432
    FaultRecovery = 0x00010012
    Cleanup Script Timeout= 600
    Crash Limits = 3 crashes in 5 minutes
    StaticHang = Virtual Thread [   nHTTP:  0674:  0011] (Native thread [   nHTTP:  0674:  145c]) (0x674/0x11/0x170000145C)
    ConfigFileSem =  (  SEM:#0:0x1520000010D) n=0, wcnt=-1, Users=-1,  Owner=[        :  0000]
    FDSem         =  ( RWSEM:#53:0x410f) rdcnt=-1, refcnt=0 Writer=[        :  0000], n=53, wcnt=-1, Users=0,  Owner=[        :  0000]

<@@ ------ Notes Data -> OS Data -> Semaphores -> SEM Info (Time 10:34:34) ------ @@>

     SpinLockIterations   = 1500
     FirstFreeSem         = 819
     SemTableSize         = 827
############################################################
### thread 46/89: [   nHTTP:  0674:  145c] FATAL THREAD (Panic)
### FP=0xF3AC3F61E8, PC=0x7FFFC6DD5AC4, SP=0xF3AC3F61E8
### stkbase=0xF3AC400000, total stksize=1048576, used stksize=40472
### EAX=0x00000004, EBX=0x00000000, ECX=0x00001c6c, EDX=0x00000000
### ESI=0x000927c0, EDI=0x00001c6c, CS=0x00000033, SS=0x0000002b
### DS=0x00000000, ES=0x00000000, FS=0x00000000, GS=0x00000000 Flags=0x1700000246
############################################################
 [ 1] 0x7FFFC6DD5AC4 ntdll.ZwWaitForSingleObject+20 (10,0,0,F3AC3F6300)
 [ 2] 0x7FFFC3464ABF KERNELBASE.WaitForSingleObjectEx+143 (10,F3AC3F69B0,7FFF00000000,1c6c)
@[ 3] 0x7FFFB326DAD0 nnotes.OSRunExternalScript+1808 (5,0,424,0)
@[ 4] 0x7FFFB3269E9C nnotes.FRTerminateWindowsResources+1532 (5,23B45D80D50,0,1)
@[ 5] 0x7FFFB326BA23 nnotes.OSFaultCleanupExt+1395 (0,7f60,0,F3AC3F7C70)
@[ 6] 0x7FFFB326B4A7 nnotes.OSFaultCleanup+23 (7f60,7FFFB3DE7E30,0,200000000)
@[ 7] 0x7FFFB32D6D76 nnotes.OSNTUnhandledExceptionFilter+390 (F3AC3F7B50,7FFFB485A818,F3AC3F7C70,FFFFEB865BDB003)
@[ 8] 0x7FFFB326E70A nnotes.Panic+1066 (5dc,125851500347E41,7FF786A7B4A0,23B1D91F9A8)
@[ 9] 0x7FFFB329FDD6 nnotes.OSLockSemInt+70 (23B1D91F9A4,145c,7FF786A84578,7FF786A84578)
@[10] 0x7FFFB32A04ED nnotes.OSLockWriteSem+77 (23B1D92AA18,7FF786A84578,23B14EA41B0,7FF786A84578)
@[11] 0x7FFFAC74DDC1 nlsxbe.ANDatabase::ANDRemoveCalendar+33 (23B1D92AA18,7FF786A84578,0,23B18FFCBA8)
@[12] 0x7FFFAC881CBB nlsxbe.ANCalendar::`scalar deleting destructor'+91 (7FF786A84578,23B1BB6FC78,0,1)
@[13] 0x7FFFAC7FFAF7 nlsxbe.Java_lotus_domino_local_NotesBase_RecycleObject2+471 (23B159C7A00,23B1BB6FC78,23B1BB6FC70,0)
@[14] 0x7FFFAC7FF91A nlsxbe.Java_lotus_domino_local_NotesBase_RecycleObject+42 (23B159C7A00,23B1BB6FC78,23B1BB6FC70,23B159C7A00)

Most of the operations rely on searching for a room by its internet address in the $Rooms view of names.nsf, then going to its corresponding RnR database and getting all reservation documents for that specific room. Sometimes (although very rarely) I also open the user's calendar and create/update reservations.

At first I thought it was caused by a memory leak, so I went through all the code and called recycle() on everything I could find (and I did find some places with obvious handle leaks), but it didn't help at all.

What bothers me is that the crashes happened at almost the same time of day (4 days apart, each several minutes after 10 AM).

What could be the cause of this crash? I'm not good at reading dump data, but I can see that the first call in the fatal thread's stack is a RecycleObject, followed by some calendar-related calls.
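For reading the fatal thread's stack, a small helper can strip the addresses and arguments and print only the module and function names in call order. This is a minimal sketch, not an official NSD tool: the class name `FatalStackReader` and the regular expression are assumptions based only on the frame lines shown above. The dump lists the innermost frame first, so the sketch prints the list reversed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FatalStackReader {
    // Matches frame lines such as
    //   @[ 9] 0x7FFFB329FDD6 nnotes.OSLockSemInt+70 (23B1D91F9A4,...)
    // and captures the frame number and the "module.function" part.
    private static final Pattern FRAME = Pattern.compile(
        "^@?\\s*\\[\\s*(\\d+)\\]\\s+0x[0-9A-Fa-f]+\\s+(.+?)\\+\\d+\\s+\\(");

    /** Returns the "module.function" names in the order they appear in the dump. */
    public static List<String> frames(String dump) {
        List<String> out = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Frames 8 to 14 copied from the crash log above.
        String dump = String.join("\n",
            "@[ 8] 0x7FFFB326E70A nnotes.Panic+1066 (5dc,125851500347E41,7FF786A7B4A0,23B1D91F9A8)",
            "@[ 9] 0x7FFFB329FDD6 nnotes.OSLockSemInt+70 (23B1D91F9A4,145c,7FF786A84578,7FF786A84578)",
            "@[10] 0x7FFFB32A04ED nnotes.OSLockWriteSem+77 (23B1D92AA18,7FF786A84578,23B14EA41B0,7FF786A84578)",
            "@[11] 0x7FFFAC74DDC1 nlsxbe.ANDatabase::ANDRemoveCalendar+33 (23B1D92AA18,7FF786A84578,0,23B18FFCBA8)",
            "@[12] 0x7FFFAC881CBB nlsxbe.ANCalendar::`scalar deleting destructor'+91 (7FF786A84578,23B1BB6FC78,0,1)",
            "@[13] 0x7FFFAC7FFAF7 nlsxbe.Java_lotus_domino_local_NotesBase_RecycleObject2+471 (23B159C7A00,23B1BB6FC78,23B1BB6FC70,0)",
            "@[14] 0x7FFFAC7FF91A nlsxbe.Java_lotus_domino_local_NotesBase_RecycleObject+42 (23B159C7A00,23B1BB6FC78,23B1BB6FC70,23B159C7A00)");

        List<String> calls = frames(dump);
        Collections.reverse(calls); // outermost caller first
        for (String call : calls) {
            System.out.println(call);
        }
    }
}
```

Read in that order, the excerpt goes from the Java `RecycleObject` entry point, through the `ANCalendar` destructor and `ANDatabase::ANDRemoveCalendar`, into the semaphore lock where the panic is raised.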

I have no idea where to look in my code. Why would a recycle even crash the server? Does ANCalendar suggest that I should look not at the code that accesses the database directly, but at the code that opens the user's calendar?

Update: Studying the crash logs, I managed to find the place where the crash occurred. It's my appointment creation code, which uses NotesCalendar.createEntry() on the user's calendar. The code looks like this:

        Session session = reDatabase.getParent();

        Name nnOrganizer = session.createName(session.getEffectiveUserName());

        String organizerEmail = "";
        DirectoryNavigator nav = session.getDirectory().lookupNames("$Users", nnOrganizer.getCommon(), "InternetAddress");
        if(nav.findFirstMatch() && !nav.getFirstItemValue().isEmpty()) {
            organizerEmail = (String)nav.getFirstItemValue().get(0);
        }

        Recycler.recycle(nav);

        Name nnResource = session.createName(roomName);

        DbDirectory dir = session.getDbDirectory(session.getServerName());
        Database mdb = dir.openMailDatabase();
        NotesCalendar cal = session.getCalendar(mdb);

        String dStart = DateUtil.formatICalendar(dtStart);
        String dEnd = DateUtil.formatICalendar(dtEnd);

        String iCalEntry = "BEGIN:VCALENDAR\n" +
        // Rest of iCalendar string goes here
        iCalEntry += "END:VEVENT\n" +
            "END:VCALENDAR\n";

        cal.setAutoSendNotices(true);

        String apptUNID ="";

        try {
            NotesCalendarEntry entry = cal.createEntry(iCalEntry);
            Document doc = entry.getAsDocument();
            apptUNID = doc.getItemValueString("ApptUNID");
            Recycler.recycle(doc, entry);
        } catch (NotesException ex) {
            System.out.println("Couldn't create appointment!: " + ex.toString());
            throw ex;
        } finally {
            Recycler.recycle(mdb, cal, nnOrganizer, nnResource, dir, reDatabase, session);
        }
        return apptUNID; // return the UNID of the created entry (if any)

Considering that the fatal call stack starts with a RecycleObject call, is there anything wrong with my recycling here? Can I recycle the calendar entry directly after creating it? It's still rather confusing to me, since this code works well on my test server. This is the last code executed when creating an appointment; an HTTP response with the apptUNID is sent directly after calling the above function.
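To make the ordering question concrete: the `finally` block above passes `mdb` to `Recycler.recycle` before `cal`, and the trace shows the `ANCalendar` destructor calling into `ANDatabase::ANDRemoveCalendar`. The sketch below is a stand-alone model, not the Domino API. `Handle` and `recycleAll` are hypothetical names, and the model assumes two things the post does not confirm: that the helper recycles its arguments in the order given, and that releasing a child needs its parent to still be alive. Under those assumptions it shows why recycling a child before its parent is the safer order.

```java
class RecycleOrderSketch {
    /** Hypothetical stand-in for a backend handle; not a lotus.domino class. */
    static class Handle {
        final String name;
        final Handle parent;
        boolean released = false;

        Handle(String name, Handle parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Releases this handle. Modelled assumption: a child must deregister from a live parent. */
        void recycle() {
            if (released) {
                return; // releasing twice is a no-op in this model
            }
            if (parent != null && parent.released) {
                throw new IllegalStateException(
                    name + " cannot deregister from " + parent.name + ": parent already released");
            }
            released = true;
        }
    }

    /** Hypothetical helper that recycles its arguments strictly in the order given. */
    static void recycleAll(Handle... handles) {
        for (Handle h : handles) {
            h.recycle();
        }
    }

    public static void main(String[] args) {
        // Parent first, as in recycle(mdb, cal, ...): the child's cleanup finds a dead parent.
        Handle mdb = new Handle("mdb", null);
        Handle cal = new Handle("cal", mdb);
        try {
            recycleAll(mdb, cal);
        } catch (IllegalStateException e) {
            System.out.println("parent first: " + e.getMessage());
        }

        // Child first: both handles are released cleanly.
        Handle mdb2 = new Handle("mdb", null);
        Handle cal2 = new Handle("cal", mdb2);
        recycleAll(cal2, mdb2);
        System.out.println("child first: released=" + (cal2.released && mdb2.released));
    }
}
```

In the model the failure is a tidy Java exception; in native code the equivalent would be a use of already-freed state, which may or may not be what happened here. The same `finally` block also recycles `session` and `reDatabase`, which this function received rather than created, so whether the caller still uses them afterwards is worth checking separately.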

xpages
lotus-notes
lotus-domino
asked on Stack Overflow Feb 21, 2020 by hazelnutek • edited Feb 24, 2020 by hazelnutek


