EDIT I had a typo in my command to launch lldb (see comment below) and I'm updating the post to get to a different larger issue
I'm trying to debug my MPI application in lldb and upon an error (e.g., segv or abort). Here's how I'm invoking my mpi run:
/usr/local/bin/mpiexec -np 3 -disable-auto-cleanup xterm -e "lldb -s lldb.commands -- app_binary <args> ; sleep 100
Immediately when I start running, I get this error trace. I think the most relevant line is PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=8 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=8 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(565):
MPID_Init(175).......: channel initialization failed
MPID_Init(463).......: PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=8 buf=:cmd=abort exitcode=1094415
:
system msg for write_line failure : Bad file descriptor
Process 19063 exited with status = 15 (0x0000000f)
Unfortunately, some mailing lists show that this is a general bug with MPICH on OSX (see https://github.com/pmodels/mpich/issues/2063 -- currently still unresolved). Does anyone have a workaround?
Since you're using lldb and you're probably also using clang
, you could use something called the address sanitizer to compile your code with runtime checks for memory errors.
Just add the following to your compile command: -g -fsanitize=address -fno-omit-frame-pointer -fsanitize-recover=address
. It would look like
mpicc object.o -o exec -g -fsanitize=address -fno-omit-frame-pointer -fsanitize-recover=address
When using the address sanitizer your code will print a small stack trace to when you made a move to index out of bounds or address memory you don't own.
If you combine the address sanitizer with lldb then it should stop the execution at the line where a memory problem occurred. Although, I haven't had much success with running lldb and MPI at the same time. Either way the address sanitizer should help you.
User contributions licensed under CC BY-SA 3.0