Pointer is being masked when calling a C function from Fortran

TL;DR

When I pass an array from Fortran to C, the array's address is incorrect in C. I've checked this by printing the address of the array in Fortran before the CALL, then stepping into the C function and printing the address of the argument.

  • The Fortran pointer: 0x9acd44c0
  • The C pointer: 0xffffffff9acd44c0

The upper dword of the C pointer has been set to 0xffffffff. I'm trying to understand why this is happening, and why it happens only on the HPC cluster and not on a development machine.

Context

I'm using a rather large scientific program written in Fortran/C++/CUDA. On one particular machine, I get a segfault when calling a C function from Fortran. I've found that a pointer is being passed to the C function with its upper bytes set incorrectly.

Code Snippets

Every Fortran file in the program includes a common header file which sets up some options and declares the common blocks.

IMPLICIT REAL*8  (A-H,O-Z)
COMMON/NBODY/  X(3,NMAX), BODY(NMAX)
COMMON/GPU/    GPUPHI(NMAX)

The Fortran call site looks like this:

CALL GPUPOT(NN,BODY(IFIRST),X(1,IFIRST),GPUPHI)

And the C function, which is compiled by nvcc, is declared like so:

extern "C" void gpupot_(int *n,
                       double m[],
                       double x[][3],
                       double pot[]);

GDB Output

I found from debugging that the value of the pointer to pot is incorrect, so any attempt to access that array will segfault.

When I ran the program under gdb, I put a breakpoint just before the call to gpupot and printed the address of the GPUPHI variable:

(gdb) p &GPUPHI   
$1 = (PTR TO -> ( real(kind=8) (1050000))) 0x9acd44c0 <gpu_>

I then let the debugger step into the gpupot_ C function, and inspected the value of the pot argument:

(gdb) p pot
$2 = (double *) 0xffffffff9acd44c0

All of the other arguments have the correct pointer values.

Compiler options

The compiler options that are set for gfortran are:

 -fPIC -O3 -ffast-math -Wall -fopenmp -mcmodel=medium -march=native -mavx -m64  

And nvcc is using the following:

-ccbin=g++ -Xptxas -v -ftz=true -lineinfo -D_FORCE_INLINES \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_35,code=compute_35 -Xcompiler \
"-O3 -fPIC -Wall -fopenmp -std=c++11 -fPIE -m64 -mavx \
-march=native" -std=c++14 -lineinfo 

For debugging, the -O3 is replaced with -g -O0 -fcheck=all -fstack-protector -fno-omit-frame-pointer, but the behaviour (crash) remains the same.

c
cuda
fortran
asked on Stack Overflow Jan 26, 2019 by Anthony • edited Jan 26, 2019 by Anthony

1 Answer

This is prefaced by my top comments [and yours].

It looks like you're getting an [unwanted] sign extension of the address.

The Fortran code is being built with -mcmodel=medium but the C code is not.

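With that option, larger symbols/arrays will be linked above 2GB [where bit 31, the sign bit of a 32-bit value, is set]. Your 0x9acd44c0 is such an address, and sign-extending its low 32 bits to 64 bits gives exactly 0xffffffff9acd44c0.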
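
As a minimal sketch of the arithmetic only: the helper name sign_extend_low32 below is made up for illustration and is not part of your code base or any library. It mimics what a sign-extending 32-to-64-bit move does to an address, and shows why an address above 2GB comes out with the upper dword set to 0xffffffff while one below 2GB is left alone.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): take the low 32 bits of an
   address and widen them to 64 bits the way a sign-extending move would.
   If bit 31 is set, the upper 32 bits are filled with ones. */
uint64_t sign_extend_low32(uint64_t addr)
{
    uint64_t low = addr & 0xffffffffULL;      /* keep only the low dword */
    if (low & 0x80000000ULL)                  /* bit 31 set: address is at or above 2GB */
        return low | 0xffffffff00000000ULL;   /* upper dword becomes all ones */
    return low;                               /* below 2GB: value is unchanged */
}
```

Calling sign_extend_low32(0x9acd44c0) yields 0xffffffff9acd44c0, the value gdb shows for pot, whereas an address such as 0x7acd44c0 passes through unchanged.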

So, add the option to both or leave it off both to fix the problem.
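
To check whether the pointer arrives intact after rebuilding, a small debugging aid along these lines could be called at the top of gpupot_. This is only a sketch: looks_sign_extended is a hypothetical helper, and it assumes a 64-bit build in which a valid user-space address never has all of its upper 32 bits set.

```c
#include <stdint.h>

/* Hypothetical debugging aid (assumes 64-bit pointers): returns 1 when the
   pointer has the pattern seen in the question, i.e. the upper 32 bits are
   all ones and bit 31 of the lower half is set. Returns 0 otherwise. */
int looks_sign_extended(const void *p)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;
    return (bits >> 32) == 0xffffffffULL && (bits & 0x80000000ULL) != 0;
}
```

For example, calling looks_sign_extended(pot) on entry to gpupot_ and printing a message when it returns 1 would flag the bad argument before the first access to the array segfaults.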

answered on Stack Overflow Jan 26, 2019 by Craig Estey

User contributions licensed under CC BY-SA 3.0