I had a problem while running a program with the CUDA Memory Checker. In other threads on Stack Overflow, the main problem with using malloc inside a kernel was that "compute_50,sm_50" was not set properly. Here the code compiles, so that is not the issue.
The problem is now solved, but I don't understand why the new code fixed it. My question is: why does it work now?
Old code:
__device__ unsigned int draw_active_levels(curandState * localState, const int num_levels_max){
    unsigned int return_value = 0;
    float draw;
    draw = curand_uniform(localState);
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1;
    double * arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double));
    arrLevelWeights[num_levels_max]=0.0; //<--------Error on this line
    double level_weights = 1.0 / num_levels_max;
    for(int i=0; i<num_levels_max; i++){
        arrLevelWeights[i] = level_weights;
    }
    //...
    //do some operations using arrLevelWeights
    //..
    free(arrLevelWeights);
    return return_value;
}
Error with old code:
Memory Checker detected 2 access violations.
error = access violation on store (global memory)
gridid = 198
blockIdx = {1,0,0}
threadIdx = {29,0,0}
address = 0x00000020
accessSize = 8
New code: I just added a few lines to check whether malloc returned a null pointer.
__device__ unsigned int draw_active_levels(curandState * localState, const int num_levels_max){
    unsigned int return_value = 0;
    float draw;
    draw = curand_uniform(localState);
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1;
    double * arrLevelWeights;
    arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double));
    if(arrLevelWeights == NULL){
        printf("Error while dynamically allocating memory on device.\n"); //<--- this line is never reached (I put a breakpoint on it)
    }
    arrLevelWeights[num_levels_max]=0.0; //<-------Error disappeared!
    double level_weights = 1.0 / num_levels_max;
    for(int i=0; i<num_levels_max; i++){
        arrLevelWeights[i] = level_weights;
    }
    //...
    //do some operations using arrLevelWeights
    //..
    free(arrLevelWeights);
    return return_value;
}
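If malloc returns NULL, it simply means that you've run out of device heap space, which has a default size of 8 MB. The reported address 0x00000020 is consistent with this: a null base pointer plus a 32-byte offset, i.e. index 4 of a double array. I'm not sure how adding a line that is never executed fixes the problem, though.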
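For illustration, here is a minimal sketch of how the heap limit can be raised from the host and how a failed device-side malloc can be detected instead of dereferenced. The names alloc_kernel and run_alloc_test, the launch configuration, and the 64 MB figure are assumptions made up for this example; they are not taken from the question's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread allocates (n + 1) doubles on the device
// heap and counts the allocations that failed instead of writing through NULL.
__global__ void alloc_kernel(int n, int *fail_count) {
    double *buf = static_cast<double *>(malloc((n + 1) * sizeof(double)));
    if (buf == NULL) {
        atomicAdd(fail_count, 1);  // heap exhausted: do not touch buf
        return;
    }
    buf[n] = 0.0;
    free(buf);  // without this, repeated launches eventually drain the heap
}

// Hypothetical host helper: sets the device heap size, runs the kernel and
// returns the number of failed allocations, or -1 on a CUDA error.
int run_alloc_test(size_t heap_bytes, int n) {
    // The limit has to be set before the first kernel that uses device
    // malloc/free is launched.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heap_bytes) != cudaSuccess)
        return -1;

    int *d_fail = NULL;
    if (cudaMalloc(&d_fail, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_fail, 0, sizeof(int));

    alloc_kernel<<<2, 32>>>(n, d_fail);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(d_fail);
        return -1;
    }

    int h_fail = -1;
    cudaError_t err =
        cudaMemcpy(&h_fail, d_fail, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_fail);
    return err == cudaSuccess ? h_fail : -1;
}
```

Calling run_alloc_test(64u * 1024u * 1024u, 4) once at program start would request a 64 MB heap. Note that a larger heap only postpones the failure if memory is leaked somewhere, so the NULL check is still needed.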
As you said in a comment, you ran out of heap space because of a missing free somewhere else in your code, which is why I suggest you use RAII (with your own smart pointer class) for memory allocations to avoid this kind of problem in the future.
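As a rough sketch of what such a class could look like, the wrapper below owns a device-heap allocation and frees it in its destructor, so every return path releases the memory. DeviceHeapArray, sum_uniform_weights, weights_kernel and run_weights are hypothetical names invented for this example, and the summation merely stands in for the elided "operations using arrLevelWeights".

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical minimal owning wrapper around device-side malloc/free.
template <typename T>
class DeviceHeapArray {
public:
    __device__ explicit DeviceHeapArray(size_t count)
        : ptr_(static_cast<T *>(malloc(count * sizeof(T)))) {}
    __device__ ~DeviceHeapArray() { free(ptr_); }  // free(NULL) is ignored

    // Non-copyable, so the same pointer is never freed twice.
    DeviceHeapArray(const DeviceHeapArray &) = delete;
    DeviceHeapArray &operator=(const DeviceHeapArray &) = delete;

    __device__ bool valid() const { return ptr_ != NULL; }
    __device__ T &operator[](size_t i) { return ptr_[i]; }

private:
    T *ptr_;
};

// Fills the weights as in the question and returns their sum,
// or -1.0 if the allocation failed.
__device__ double sum_uniform_weights(int num_levels_max) {
    DeviceHeapArray<double> weights(num_levels_max + 1);
    if (!weights.valid()) return -1.0;  // bail out instead of writing to NULL

    weights[num_levels_max] = 0.0;
    double level_weight = 1.0 / num_levels_max;
    double sum = 0.0;
    for (int i = 0; i < num_levels_max; i++) {
        weights[i] = level_weight;
        sum += weights[i];
    }
    return sum;  // the destructor frees the array here
}

__global__ void weights_kernel(int n, double *out) {
    out[threadIdx.x] = sum_uniform_weights(n);
}

// Hypothetical host helper: runs one block of 32 threads and returns the
// result of thread 0, or -2.0 on a CUDA error.
double run_weights(int n) {
    double *d_out = NULL;
    if (cudaMalloc(&d_out, 32 * sizeof(double)) != cudaSuccess) return -2.0;
    weights_kernel<<<1, 32>>>(n, d_out);
    double h_out = -2.0;
    cudaError_t err =
        cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -2.0;
}
```

The copy operations are deleted on purpose: a copyable wrapper would free the same pointer twice. A fuller version might add move semantics, but that is beyond what this case needs.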