When I pass an array from Fortran to C, the array's address is incorrect in C. I've checked this by printing the address of the array in Fortran before the CALL, then stepping into the C function and printing the address of the argument.
Fortran address: 0x9acd44c0
C pointer: 0xffffffff9acd44c0
The upper dword of the C pointer has been set to 0xffffffff. I'm trying to understand why this is happening, and why it happens only on the HPC cluster and not on a development machine.
I'm using a rather large scientific program written in Fortran/C++/CUDA. On some particular machine, I get a segfault when calling a C function from Fortran. I've found that a pointer is being passed to the C function with some bytes set incorrectly.
Every Fortran file in the program includes a common header file which sets up some options and declares the common blocks.
IMPLICIT REAL*8 (A-H,O-Z)
COMMON/NBODY/ X(3,NMAX), BODY(NMAX)
COMMON/GPU/ GPUPHI(NMAX)
The Fortran call site looks like this:
CALL GPUPOT(NN,BODY(IFIRST),X(1,IFIRST),GPUPHI)
And the C function, which is compiled by nvcc, is declared like so:
extern "C" void gpupot_(int *n,
                        double m[],
                        double x[][3],
                        double pot[]);
I found from debugging that the value of the pointer to pot is incorrect; so any attempt to access that array will segfault.
When I ran the program with gdb, I put a break point just before the call to gpupot and printed the value of the GPUPHI variable:
(gdb) p &GPUPHI
$1 = (PTR TO -> ( real(kind=8) (1050000))) 0x9acd44c0 <gpu_>
I then let the debugger step into the gpupot_ C function, and inspected the value of the pot argument:
(gdb) p pot
$2 = (double *) 0xffffffff9acd44c0
All of the other arguments have the correct pointer values.
The compiler options that are set for gfortran are:
-fPIC -O3 -ffast-math -Wall -fopenmp -mcmodel=medium -march=native -mavx -m64
And nvcc is using the following:
-ccbin=g++ -Xptxas -v -ftz=true -lineinfo -D_FORCE_INLINES \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_35,code=compute_35 -Xcompiler \
"-O3 -fPIC -Wall -fopenmp -std=c++11 -fPIE -m64 -mavx \
-march=native" -std=c++14 -lineinfo
For debugging, the -O3 is replaced with -g -O0 -fcheck=all -fstack-protector -fno-omit-frame-pointer, but the behaviour (crash) remains the same.
This is prefaced by my top comments [and yours].
It looks like you're getting an [unwanted] sign extension of the address.
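The Fortran code is being built with -mcmodel=medium, but the C code (compiled through nvcc) is not.
With that option, larger symbols/arrays will be linked above 2GB, so the low 32 bits of their address have the sign bit set. If such an address is ever treated as a signed 32-bit value and widened to 64 bits, the upper dword becomes 0xffffffff.
So, add the option to both or leave it off both to fix the problem. For nvcc, host compiler options go inside the -Xcompiler string.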