EDIT: I had a typo in my command to launch lldb (see comment below), and I'm updating the post to address a different, larger issue.
I'm trying to debug my MPI application in lldb so that I can inspect it when an error occurs (e.g., a segfault or abort). Here's how I'm invoking my MPI run:
/usr/local/bin/mpiexec -np 3 -disable-auto-cleanup xterm -e "lldb -s lldb.commands -- app_binary <args> ; sleep 100"
Immediately when I start running, I get this error trace. I think the most relevant line is
PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=8 buf=:cmd=init pmi_version=1 pmi_subversion=1 : system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=8 buf=:cmd=get_appnum : system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(565):
MPID_Init(175).......: channel initialization failed
MPID_Init(463).......: PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=8 buf=:cmd=abort exitcode=1094415 : system msg for write_line failure : Bad file descriptor
Process 19063 exited with status = 15 (0x0000000f)
Unfortunately, some mailing lists show that this is a general bug with MPICH on OSX (see https://github.com/pmodels/mpich/issues/2063 -- currently still unresolved). Does anyone have a workaround?
Since you're using lldb, and you're probably also using clang, you could compile your code with the address sanitizer (AddressSanitizer), which adds runtime checks for memory errors.
Just add the following to your compile command:
-g -fsanitize=address -fno-omit-frame-pointer -fsanitize-recover=address
The full command would look like:
mpicc object.o -o exec -g -fsanitize=address -fno-omit-frame-pointer -fsanitize-recover=address
With the address sanitizer enabled, your program prints a short stack trace at the point where it indexes out of bounds or accesses memory it doesn't own.
If you combine the address sanitizer with lldb, it should stop execution at the line where the memory problem occurred. That said, I haven't had much success running lldb and MPI together. Either way, the address sanitizer should help you.