Analysing a truncated Linux core dump on Windows

I have written a chess engine with a friend that plays in the Top Chess Engine Championship (TCEC). We just placed first in the Qualification League, even though our engine crashed in one game, which was counted as a loss. I know the basics of programming in C++, but I am stuck analysing the resulting core dump.

I cannot debug on Linux because I do not have a Linux machine; the engine was running on a 176-core Linux machine hosted by TCEC.

I would like to get the memory contents of the Board object that was passed to the getWDL(Board* board) function.
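Once the raw bytes of the Board object are available (for example from gdb's x command, or from the right offset in the core file), each 64-bit bitboard can be turned back into a picture of the board. A minimal sketch, assuming the common numbering a1 = bit 0 ... h8 = bit 63 (the engine's actual square mapping is not shown here) and little-endian storage as on x86-64; the helper names are hypothetical:

```cpp
#include <cstdint>
#include <string>

// Decode 8 little-endian bytes (x86-64 byte order) into a 64-bit bitboard.
std::uint64_t bitboardFromBytes(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];  // highest byte first
    return v;
}

// Render a bitboard as 8 lines, rank 8 first, 'X' for set squares.
// Assumes square index = rank * 8 + file with a1 = 0 (an assumption).
std::string renderBitboard(std::uint64_t bb) {
    std::string out;
    for (int rank = 7; rank >= 0; --rank) {
        for (int file = 0; file < 8; ++file)
            out += ((bb >> (rank * 8 + file)) & 1) ? 'X' : '.';
        out += '\n';
    }
    return out;
}
```

Printing every bitboard of the object this way makes inconsistencies (two pieces on one square, a missing king) easy to spot by eye.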

We received the following information from the TCEC admins:

Core was generated by `./Koivisto_4.44-x64-linux-native'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040c039 in probe_table((anonymous namespace)::Pos const*, int, int*, int) ()
[Current thread is 1 (LWP 3364456)]
(gdb) bt
#0  0x000000000040c039 in probe_table((anonymous namespace)::Pos const*, int, int*, int) ()
#1  0x000000000040cb4f in probe_ab((anonymous namespace)::Pos const*, int, int, int*) ()
#2  0x000000000040cc56 in probe_wdl((anonymous namespace)::Pos*, int*) [clone .lto_priv.167] ()
#3  0x0000000000411029 in getWDL(Board*) [clone .part.11] ()
#4  0x0000000000419a71 in pvSearch(Board*, short, short, unsigned char, unsigned char, ThreadData*, unsigned int, unsigned char*) ()
#5  0x000000000041a0f2 in pvSearch(Board*, short, short, unsigned char, unsigned char, ThreadData*, unsigned int, unsigned char*) ()
#6  0x000000000041a52d in pvSearch(Board*, short, short, unsigned char, unsigned char, ThreadData*, unsigned int, unsigned char*) ()
#7  0x000000000041a52d in pvSearch(Board*, short, short, unsigned char, unsigned char, ThreadData*, unsigned int, unsigned char*) ()
#8  0x000000000041a0f2 in pvSearch(Board*, short, short, unsigned char, unsigned char, ThreadData*, unsigned int, unsigned char*) ()


(gdb) info registers
rax            0xfffffffffffffffa  -6
rbx            0x7f66a8ff4a10      140078898694672
rcx            0xb87               2951
rdx            0xe3b               3643
rsi            0xffffffff          4294967295
rdi            0x6                 6
rbp            0x7f6730006480      0x7f6730006480
rsp            0x7f66a8ff4790      0x7f66a8ff4790
r8             0x0                 0
r9             0x0                 0
r10            0x7fa7a81ee580      140358056863104
r11            0x7f6730006530      140081163691312
r12            0x7                 7
r13            0x17428d50          390237520
r14            0xfc00000000000000  -288230376151711744
r15            0x7f67a80d504a      140083177803850
rip            0x40c039            0x40c039 <probe_table((anonymous namespace)::Pos const*, int, int*, int)+921>
eflags         0x10297             [ CF PF AF SF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

 │0x40c022 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+898>        lea    0x1(%rdi),%r12d   
 │0x40c026 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+902>        mov    %rdi,0x28(%rsp)  
 │0x40c02b <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+907>        add    0x10(%rbp),%r10   
 │0x40c02f <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+911>        neg    %rax   
 │0x40c032 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+914>        movslq %r12d,%r12   
 │0x40c035 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+917>        mov    0x38(%rbp),%r14   
>│0x40c039 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+921>        movbe  (%r10),%rsi   
 │0x40c03e <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+926>        lea    0x0(%rbp,%rax,8),%rbx   
 │0x40c043 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+931>        add    $0x8,%r10   
 │0x40c047 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+935>        nopw   0x0(%rax,%rax,1)  
 │0x40c050 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+944>        cmp    %rsi,%r14   
 │0x40c053 <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+947>        jbe    0x40c0ce <_Z11probe_tablePKN12_GLOBAL__N_13PosEiPii+1070>
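The faulting instruction can be read in C++ terms. movbe (%r10),%rsi loads 8 bytes from the address held in r10 and byte-swaps them, i.e. it is a big-endian 64-bit load. In table-decoding code like this it plausibly corresponds to reading packed big-endian data from a tablebase file, but without debug info that mapping is an assumption. The SIGSEGV means the address in r10 (0x7fa7a81ee580 in the register dump) was not readable, whatever the bytes there would have been. The helper name load_be64 is hypothetical; it only illustrates what the instruction computes:

```cpp
#include <cstdint>

// Rough C++ equivalent of `movbe (%r10),%rsi`: read 8 bytes starting at p
// and interpret them as a big-endian 64-bit integer. Dereferencing p is the
// step that faults when p points at unmapped memory.
std::uint64_t load_be64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | p[i];  // most significant byte comes first in memory
    }
    return v;
}
```

Knowing which mapping, if any, lies near the value of r10 would show whether the read ran past the end of a tablebase file or used a garbage pointer.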

Note that probe_table, probe_ab and probe_wdl were not written by us: they come from the tablebase probing library (syzygy/tbprobe.c in the build command below), which basically every chess program uses to read the Syzygy endgame tablebase files for positions with up to 7 pieces.

No other program using this library has been observed to crash inside it, which is why I conclude that the input we passed to probe_wdl was invalid.

The getWDL(Board* board) function looks like this:

Score getWDL(Board* board) {
    UCI_ASSERT(board);

    // the tables cannot be probed if there are more pieces on the board than the largest table covers
    if (bitCount(*board->getOccupiedBB()) > (signed) TB_LARGEST)
        return MAX_MATE_SCORE;

    // probe the tablebase files using the information from the board
    unsigned res = tb_probe_wdl(
        board->getTeamOccupiedBB()[WHITE],
        board->getTeamOccupiedBB()[BLACK],
        board->getPieceBB()[WHITE_KING]     | board->getPieceBB()[BLACK_KING],
        board->getPieceBB()[WHITE_QUEEN]    | board->getPieceBB()[BLACK_QUEEN],
        board->getPieceBB()[WHITE_ROOK]     | board->getPieceBB()[BLACK_ROOK],
        board->getPieceBB()[WHITE_BISHOP]   | board->getPieceBB()[BLACK_BISHOP],
        board->getPieceBB()[WHITE_KNIGHT]   | board->getPieceBB()[BLACK_KNIGHT],
        board->getPieceBB()[WHITE_PAWN]     | board->getPieceBB()[BLACK_PAWN],
        board->getCurrent50MoveRuleCount(),
        board->getCastlingRights(0) |
            board->getCastlingRights(1) |
            board->getCastlingRights(2) |
            board->getCastlingRights(3),
        board->getEnPassantSquare() != 64 ? board->getEnPassantSquare() : 0,
        board->getActivePlayer() == WHITE);

    // ... (remainder of the function not shown)
}
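If the input to tb_probe_wdl is the suspect, one way to catch a corrupted Board in future runs is to sanity-check the values before the call, for example in a debug build or with an assertion like the existing UCI_ASSERT. The sketch below is hypothetical (plausibleTbInput and popcount64 are not part of the engine or the probing library) and assumes the a1 = 0 ... h8 = 63 square numbering and, as the call above suggests, that an en-passant value of 0 means "no en-passant square":

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Portable population count (number of set bits).
int popcount64(Bitboard b) {
    int n = 0;
    while (b) { b &= b - 1; ++n; }
    return n;
}

// Returns true if the arguments describe a plausible position.
bool plausibleTbInput(Bitboard white, Bitboard black,
                      Bitboard kings, Bitboard queens, Bitboard rooks,
                      Bitboard bishops, Bitboard knights, Bitboard pawns,
                      unsigned ep) {
    const Bitboard occ = white | black;
    if (white & black) return false;              // square owned by both sides
    const Bitboard types[6] = {kings, queens, rooks, bishops, knights, pawns};
    Bitboard seen = 0;
    for (Bitboard t : types) {
        if (t & seen) return false;               // two piece types on one square
        seen |= t;
    }
    if (seen != occ) return false;                // piece sets and colour sets disagree
    if (popcount64(kings & white) != 1 || popcount64(kings & black) != 1)
        return false;                             // exactly one king per side
    const Bitboard rank1 = 0xFFULL, rank8 = 0xFFULL << 56;
    if (pawns & (rank1 | rank8)) return false;    // no pawns on rank 1 or 8
    // A non-zero en-passant square must lie on rank 3 (16..23) or rank 6 (40..47).
    if (ep != 0 && !((ep >= 16 && ep < 24) || (ep >= 40 && ep < 48))) return false;
    return true;
}
```

A failed check would point at the Board being inconsistent before the probe, rather than at the library.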

Besides the information above, we received an incomplete core dump of around 2 GB. Note that the program used roughly 100 GB of memory in total, most of it for the search's hash table, which is not relevant for debugging.
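Reading memory from a truncated core goes through the ELF program headers: each PT_LOAD header states which range of the crashed process's address space (p_vaddr, p_memsz) is stored at which range of the file (p_offset, p_filesz). When the dump of a roughly 100 GB process is cut off at 2 GB, headers near the end describe data that is simply not in the file, and mappings the kernel chose not to dump have p_filesz smaller than p_memsz, often 0. The sketch below, with hypothetical names LoadSegment and fileOffsetFor, classifies an address accordingly; Ok means the bytes can be read from the file at the returned offset:

```cpp
#include <cstdint>
#include <vector>

// One PT_LOAD program header, reduced to the fields needed here.
struct LoadSegment {
    std::uint64_t vaddr;   // p_vaddr: start address in the crashed process
    std::uint64_t offset;  // p_offset: where the segment's bytes start in the file
    std::uint64_t filesz;  // p_filesz: bytes of the segment stored in the file
    std::uint64_t memsz;   // p_memsz: size of the mapping in the process
};

enum class Lookup { Ok, NotMapped, NotDumped, Truncated };

// Translate [addr, addr + len) into a file offset, accounting for a file
// that is shorter than the program headers claim.
Lookup fileOffsetFor(const std::vector<LoadSegment>& segs, std::uint64_t addr,
                     std::uint64_t len, std::uint64_t actualFileSize,
                     std::uint64_t& outOffset) {
    for (const LoadSegment& s : segs) {
        if (addr < s.vaddr || addr - s.vaddr >= s.memsz) continue;  // not this mapping
        const std::uint64_t delta = addr - s.vaddr;
        if (delta + len > s.filesz) return Lookup::NotDumped;     // kernel skipped it
        outOffset = s.offset + delta;
        if (outOffset + len > actualFileSize) return Lookup::Truncated;  // cut off
        return Lookup::Ok;
    }
    return Lookup::NotMapped;  // address was not mapped in the process at all
}
```

For the Board object, addr would be the pointer passed to getWDL (if it can be recovered from a register or stack slot) and len the size of Board.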

Since I have never done anything like this, I would be very grateful if someone could explain how to read and parse the core dump to extract the contents of Board* board, so that I can check whether and how the Board object was altered.
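A first step that does not depend on gdb at all is to list the memory segments the core file contains. An ELF core starts with a 64-byte ELF header whose e_phoff, e_phentsize and e_phnum fields locate the program headers; the PT_LOAD entries among them describe the dumped memory. The sketch below defines the needed offsets by hand so that it also compiles on Windows, where <elf.h> is not available. The function names are hypothetical, and it parses an in-memory buffer for simplicity; a real tool would read only the header region of a 2 GB file:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// ELF64 header field offsets and constants from the ELF specification.
constexpr std::size_t E_TYPE = 0x10, E_PHOFF = 0x20, E_PHENTSIZE = 0x36, E_PHNUM = 0x38;
constexpr std::uint16_t ET_CORE = 4;
constexpr std::uint32_t PT_LOAD = 1;

struct LoadSegment { std::uint64_t vaddr, offset, filesz, memsz; };

// Little-endian read with bounds check (x86-64 cores are little-endian).
template <typename T>
T readLE(const std::vector<unsigned char>& buf, std::size_t pos) {
    if (pos + sizeof(T) > buf.size()) throw std::out_of_range("read past end of core file");
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v |= static_cast<T>(static_cast<T>(buf[pos + i]) << (8 * i));
    return v;
}

// Collect all PT_LOAD segments. Not handled: e_phnum == 0xffff (PN_XNUM),
// where the real count is stored in section header 0.
std::vector<LoadSegment> loadSegments(const std::vector<unsigned char>& core) {
    if (core.size() < 64 || std::memcmp(core.data(), "\x7f" "ELF", 4) != 0)
        throw std::runtime_error("not an ELF file");
    if (core[4] != 2 || core[5] != 1)  // EI_CLASS == ELFCLASS64, EI_DATA == little-endian
        throw std::runtime_error("expected a 64-bit little-endian ELF file");
    if (readLE<std::uint16_t>(core, E_TYPE) != ET_CORE)
        throw std::runtime_error("not a core file");
    const std::uint64_t phoff = readLE<std::uint64_t>(core, E_PHOFF);
    const std::uint16_t phentsize = readLE<std::uint16_t>(core, E_PHENTSIZE);
    const std::uint16_t phnum = readLE<std::uint16_t>(core, E_PHNUM);
    std::vector<LoadSegment> out;
    for (std::size_t i = 0; i < phnum; ++i) {
        const std::size_t ph = phoff + i * phentsize;
        if (readLE<std::uint32_t>(core, ph) != PT_LOAD) continue;  // p_type
        out.push_back({readLE<std::uint64_t>(core, ph + 0x10),     // p_vaddr
                       readLE<std::uint64_t>(core, ph + 0x08),     // p_offset
                       readLE<std::uint64_t>(core, ph + 0x20),     // p_filesz
                       readLE<std::uint64_t>(core, ph + 0x28)});   // p_memsz
    }
    return out;
}
```

The buffer could be filled with std::ifstream opened in binary mode; the listed segments show which address ranges are actually present in the truncated file.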

Greetings, Finn

Edit 1

I have gdb on my machine, together with the incomplete core dump and the original Linux executable, which was compiled roughly like this:

g++ -O3 -std=c++17 -Wall -Wextra -Wshadow -DNDEBUG -flto -march=native *.cpp syzygy/tbprobe.c -DMINOR_VERSION=50 -DMAJOR_VERSION=4 -pthread -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -DUSE_POPCNT -msse3 -mpopcnt -o ../bin/Koivisto_4.44-x64-linux-native.exe
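The build command contains no -g, so the executable has no debug information: gdb knows the function names (they come from the symbol table) but not the layout of Board, so it cannot print *board as a structure. One way around this is to compile a small program with the same compiler, flags and Board header and print sizeof and offsetof for the members of interest. BoardSketch below is a hypothetical stand-in, since Board's real members are not shown here; substitute the real class (offsetof is only guaranteed for standard-layout types):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for Board: the real member list is not shown in the
// question, so these fields are illustrative only.
struct BoardSketch {
    std::uint64_t pieceBB[12];        // one bitboard per piece type and colour
    std::uint64_t teamOccupiedBB[2];  // white / black occupancy
    std::uint64_t occupiedBB;         // all pieces
    std::uint8_t  activePlayer;
};

// Print the offsets needed to decode raw Board bytes taken from the core.
void printLayout() {
    std::printf("sizeof         = %zu\n", sizeof(BoardSketch));
    std::printf("pieceBB        @ %zu\n", offsetof(BoardSketch, pieceBB));
    std::printf("teamOccupiedBB @ %zu\n", offsetof(BoardSketch, teamOccupiedBB));
    std::printf("occupiedBB     @ %zu\n", offsetof(BoardSketch, occupiedBB));
    std::printf("activePlayer   @ %zu\n", offsetof(BoardSketch, activePlayer));
}
```

With the offsets known, gdb's x command (for example x/16gx followed by the object's address) dumps memory as 8-byte hexadecimal words that can be matched against them.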
Tags: c++, debugging, gdb, coredump
asked on Stack Overflow May 19, 2021 by Finn Eggers • edited May 19, 2021 by Finn Eggers



User contributions licensed under CC BY-SA 3.0