I have the following kernel (there's more to it, but the bug only pertains to this part, since everything else is commented out):
const int CHNL_SIZE = 1024;
const int NUM_CHNL = 3;
const int NUM_IMG = 60000;
__global__ void cuda_prepData(GPUImg_T *imgs) {
    for (int i = 0; i < NUM_IMG; i++) {
        cudaMalloc((void **)&imgs[i].pxls, CHNL_SIZE * NUM_CHNL * sizeof(double));
        assert(imgs[i].pxls != NULL);
        imgs[i].pxls[0] = 0.5f;
    }
}
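The kernel above ignores the cudaError_t that the in-kernel cudaMalloc returns. A minimal debugging sketch that records the first failing index instead of asserting could look like the following. The GPUImg_T definition, the kernel name cuda_prepDataChecked, and the firstFailure output parameter are hypothetical, since the real struct and the rest of the kernel are not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int CHNL_SIZE = 1024;
const int NUM_CHNL = 3;
const int NUM_IMG = 60000;

// Hypothetical stand-in for GPUImg_T; the real struct is not shown in the post.
struct GPUImg_T {
    double *pxls;
};

// Same loop as cuda_prepData, but it checks the result of every allocation
// and stops at the first failure instead of dereferencing a null pointer.
__global__ void cuda_prepDataChecked(GPUImg_T *imgs, int *firstFailure) {
    for (int i = 0; i < NUM_IMG; i++) {
        cudaError_t err = cudaMalloc((void **)&imgs[i].pxls,
                                     CHNL_SIZE * NUM_CHNL * sizeof(double));
        if (err != cudaSuccess || imgs[i].pxls == NULL) {
            printf("allocation %d failed, error code %d\n", i, (int)err);
            *firstFailure = i;
            return;
        }
        imgs[i].pxls[0] = 0.5;
    }
}

int main() {
    GPUImg_T *imgs = NULL;
    int *dFirstFailure = NULL;
    int hFirstFailure = -1;  // -1 means "no allocation failed"

    cudaMalloc((void **)&imgs, NUM_IMG * sizeof(GPUImg_T));
    cudaMalloc((void **)&dFirstFailure, sizeof(int));
    cudaMemcpy(dFirstFailure, &hFirstFailure, sizeof(int), cudaMemcpyHostToDevice);

    cuda_prepDataChecked<<<1, 1>>>(imgs, dFirstFailure);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(&hFirstFailure, dFirstFailure, sizeof(int), cudaMemcpyDeviceToHost);
    printf("first failing index: %d\n", hFirstFailure);

    // The per-image buffers are deliberately not freed; this is only a diagnostic.
    cudaFree(dFirstFailure);
    cudaFree(imgs);
    return 0;
}
```

Calling cudaMalloc from device code needs relocatable device code and the device runtime library, so this sketch assumes a build along the lines of nvcc -rdc=true sketch.cu -lcudadevrt (the file name is a placeholder).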
It's called from the host like this:
cudaMalloc((void **)&data->imgs, NUM_IMG * sizeof(GPUImg_T));
cuda_prepData<<<1, 1>>>(data->imgs);
cudaDeviceSynchronize();
Naturally, data contains a 1D array of GPUImg_T structs, each of which contains a 1D array of pxls. I am trying to cudaMalloc and assign each pxls array in my kernel, which uses just one thread right now for debugging purposes. For some reason, the program crashes exactly when I try to dereference imgs[314].pxls (I've tried multiple times, and it's always 314). I put the assert statement in to test things, and it turns out that cudaMalloc is setting imgs[314].pxls to NULL. My first assumption was that I was running out of video memory, so I used nvidia-smi (I'm on Linux) to check my GPU usage, and it showed that the program was only using ~135/8111 MiB. When I remove the assert, cuda-memcheck reports the following:
========= Invalid __global__ write of size 8
========= at 0x000002f0 in cuda_prepData(unsigned char*, unsigned char*, unsigned char*, GPUImg*)
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x00000000 is out of bounds
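One thing that could be checked here, offered as an assumption rather than a confirmed diagnosis: cudaMalloc called from device code is documented as drawing from the device malloc heap (cudaLimitMallocHeapSize, 8 MB by default), not from the whole of device memory, so nvidia-smi would not show it filling up. Each pxls array is 1024 * 3 * 8 = 24,576 bytes, so roughly 341 of them fit in 8 MB before any allocator overhead, which is in the same ballpark as index 314. The host-only sketch below prints these numbers and raises the limit; the 25% headroom is a guess, not a documented figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int CHNL_SIZE = 1024;
const int NUM_CHNL = 3;
const int NUM_IMG = 60000;

int main() {
    // Size of one pxls array and of all NUM_IMG arrays together.
    const size_t bytesPerImg = (size_t)CHNL_SIZE * NUM_CHNL * sizeof(double);
    const size_t bytesTotal = bytesPerImg * NUM_IMG;

    // Current size of the heap that device-side malloc/cudaMalloc draw from.
    size_t heapBytes = 0;
    cudaError_t err = cudaDeviceGetLimit(&heapBytes, cudaLimitMallocHeapSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("per image: %zu bytes, total: %zu bytes, device heap: %zu bytes\n",
           bytesPerImg, bytesTotal, heapBytes);
    printf("allocations that fit, ignoring overhead: %zu\n",
           heapBytes / bytesPerImg);

    // Assumption: 25% headroom for allocator bookkeeping.
    const size_t wanted = bytesTotal + bytesTotal / 4;
    err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, wanted);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // A kernel such as cuda_prepData<<<1, 1>>>(...) would be launched after this point.
    return 0;
}
```

The heap size has to be set before the first kernel that allocates on the device is launched. Allocating a single NUM_IMG * CHNL_SIZE * NUM_CHNL buffer from the host and handing out offsets into it would avoid the device heap altogether.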
Any idea what's happening and how I can fix it? Thank you!