CUDA - Malloc inside kernel ( compute_50,sm_50 )

Question

I ran into a problem while running a program under the CUDA Memory Checker. In other Stack Overflow threads, the usual cause of trouble with malloc inside a kernel was that "compute_50,sm_50" was not set properly. Here the code compiles, so that is not the problem.

The problem is now solved, but I don't understand why the new code fixed it. My question is: why does it work now?

Old code:

__device__ unsigned int draw_active_levels(curandState * localState,const int num_levels_max){
    unsigned int return_value = 0;
    float draw;
    draw = curand_uniform(localState);
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1;


    double * arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double));
    arrLevelWeights[num_levels_max]=0.0; //<--------Error on this line
    double level_weights = 1.0 / num_levels_max;
    for(int i=0; i<num_levels_max; i++){
        arrLevelWeights[i] = level_weights;
    }
    //...
    //do some operations using arrLevelWeights
    //..

    free(arrLevelWeights);
    return return_value;
}

Error with old code:

Memory Checker detected 2 access violations.
error = access violation on store (global memory)
gridid = 198
blockIdx = {1,0,0}
threadIdx = {29,0,0}
address = 0x00000020
accessSize = 8

New code: I just added a few lines to check if malloc returned a null pointer.

__device__ unsigned int draw_active_levels(curandState * localState,const int num_levels_max){
    unsigned int return_value = 0;
    float draw;
    draw = curand_uniform(localState);
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1;


    double * arrLevelWeights;
    arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double));
    if(arrLevelWeights == NULL){
        printf("Error while dynamically allocating memory on device.\n"); //<--- this line is never reached (I put a breakpoint on it)
    }
    arrLevelWeights[num_levels_max]=0.0; //<-------Error disappeared!
    double level_weights = 1.0 / num_levels_max;
    for(int i=0; i<num_levels_max; i++){
        arrLevelWeights[i] = level_weights;
    }
    //...
    //do some operations using arrLevelWeights
    //..

    free(arrLevelWeights);
    return return_value;
}

cuda

asked on Stack Overflow Jul 19, 2014 by

RemiDav

1 Answer

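If malloc returns NULL inside a kernel, it simply means that you've run out of device heap space, which is 8 MB by default. Writing through that NULL pointer is what the Memory Checker reports: the faulting address 0x00000020 is a small offset from address zero, which is what arrLevelWeights[num_levels_max] would produce with a NULL base (an offset of 32 bytes would correspond to num_levels_max = 4, although that value is not shown in the question). I'm not sure how adding a line that is never executed fixes the problem, though.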
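To make the heap limit concrete, here is a minimal sketch of how a NULL return from in-kernel malloc could be detected and counted, and how the heap limit could be raised from the host. The names alloc_kernel, count_failed_allocs and raise_heap_limit are hypothetical and not from the question; only cudaDeviceSetLimit and cudaLimitMallocHeapSize are actual CUDA runtime names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread allocates a small buffer on the device heap.
__global__ void alloc_kernel(int num_doubles, int *fail_count) {
    double *buf = (double *)malloc(num_doubles * sizeof(double));
    if (buf == NULL) {
        atomicAdd(fail_count, 1);  // record the failure
        return;                    // and never dereference the NULL pointer
    }
    buf[num_doubles - 1] = 0.0;
    free(buf);                     // a missing free here would slowly drain the heap
}

// Hypothetical host helper: raise the device heap limit.
// This has to happen before the first kernel that uses malloc/free is launched.
bool raise_heap_limit(size_t bytes) {
    return cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) == cudaSuccess;
}

// Hypothetical host helper: returns how many threads saw malloc fail,
// or -1 if a CUDA runtime call failed.
int count_failed_allocs(int num_doubles, int num_threads) {
    int *d_fail = NULL;
    if (cudaMalloc(&d_fail, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_fail, 0, sizeof(int));

    alloc_kernel<<<1, num_threads>>>(num_doubles, d_fail);

    int h_fail = -1;
    cudaError_t err =
        cudaMemcpy(&h_fail, d_fail, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_fail);
    return err == cudaSuccess ? h_fail : -1;
}
```

A host program would call raise_heap_limit(64 * 1024 * 1024) once at startup if 8 MB is not enough, then check that count_failed_allocs reports zero. Raising the limit only hides a leak for a while, so it does not replace finding the missing free.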

As you said in a comment, you ran out of heap space because of a missing free elsewhere in your code. I therefore suggest using RAII (with your own smart pointer class) for memory allocations, to avoid this kind of problem in the future.
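One possible shape for such a class is sketched below. DeviceHeapArray, sum_uniform_weights, weights_kernel and run_weights_kernel are hypothetical names chosen for illustration, and the weight-filling loop only mirrors the one in the question. The sketch assumes a compiler with C++11 support for the deleted copy operations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical RAII wrapper around in-kernel malloc/free.
template <typename T>
class DeviceHeapArray {
public:
    __device__ explicit DeviceHeapArray(size_t count)
        : ptr_(static_cast<T *>(malloc(count * sizeof(T)))) {}
    // Runs on every exit path; in-kernel free ignores a NULL pointer.
    __device__ ~DeviceHeapArray() { free(ptr_); }

    // Non-copyable, so the buffer cannot be freed twice.
    DeviceHeapArray(const DeviceHeapArray &) = delete;
    DeviceHeapArray &operator=(const DeviceHeapArray &) = delete;

    __device__ bool valid() const { return ptr_ != NULL; }
    __device__ T &operator[](size_t i) { return ptr_[i]; }

private:
    T *ptr_;
};

// Fills the weights as in the question and returns their sum,
// or -1.0 if the allocation failed.
__device__ double sum_uniform_weights(int num_levels_max) {
    DeviceHeapArray<double> weights(num_levels_max + 1);
    if (!weights.valid()) return -1.0;  // early return, nothing leaks

    weights[num_levels_max] = 0.0;
    double level_weight = 1.0 / num_levels_max;
    double sum = 0.0;
    for (int i = 0; i < num_levels_max; i++) {
        weights[i] = level_weight;
        sum += weights[i];
    }
    return sum;  // destructor frees the buffer here
}

__global__ void weights_kernel(int num_levels_max, double *out) {
    out[threadIdx.x] = sum_uniform_weights(num_levels_max);
}

// Hypothetical host helper: runs 32 threads and returns thread 0's result.
double run_weights_kernel(int num_levels_max) {
    double *d_out = NULL;
    if (cudaMalloc(&d_out, 32 * sizeof(double)) != cudaSuccess) return -2.0;
    weights_kernel<<<1, 32>>>(num_levels_max, d_out);
    double h_out = -2.0;
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Because the destructor calls free on every path out of the function, including the early return, a forgotten free can no longer exhaust the heap. The caller still has to check valid() before indexing.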

answered on Stack Overflow Jul 19, 2014 by

user703016

User contributions licensed under CC BY-SA 3.0