I want to use global variable for several kernel method, but when I use the flowing code to init __device__
variable, I got a [access violation on store (global memory)]
error when I init the second var.
__device__ short* blockTmp[4];
//init blockTmp
template<int BS>
__global__ void InitQuarterBuf_kernel(
)
{
int iBufSize = 2000000;
for (int i = 0; i < 4; i++){
blockTmp[[i] = new short[iBufSize];
blockTmp[[i][iBufSize-1]=1;
printf("blockTmp[[%d][%d] is %d.\n",i,iBufSize-1,blockTmp[[i][iBufSize-1]);
}
}
The error message:
Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 94
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x003d08fe
accessSize = 2
CUDA grid launch failed: CUcontext: 1014297073536 CUmodule: 1013915337344 Function: _Z21InitBuf_kernelILi8EEvii
CUDA context created : 67e557f3e0
CUDA module loaded: 67cdc7ed80
CUDA module loaded: 67cdc7e180
================================================================================
CUDA Memory Checker detected 1 threads caused an access violation:
Launch Parameters
CUcontext = 67e557f3e0
CUstream = 67cdc7f580
CUmodule = 67cdc7e180
CUfunction = 67eb64b2f0
FunctionName = _Z21InitBuf_kernelILi8EEvii
GridId = 94
gridDim = {1,1,1}
blockDim = {1,1,1}
sharedSize = 256
Parameters (raw):
0x00000780 0x00000440
GPU State:
Address Size Type Mem Block Thread blockIdx threadIdx PC Source
----------------------------------------------------------------------------------------------------------------------------------
003d08fe 2 adr st g 0 0 {0,0,0} {0,0,0} _Z21InitBuf_kernelILi8EEvii+0004c8
Summary of access violations:
xxxx_launcher.cu(481): error MemoryChecker: #misaligned=0 #invalidAddress=1
================================================================================
Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 94
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x003d08fe
accessSize = 2
CUDA grid launch failed: CUcontext: 446229378016 CUmodule: 445834060160 Function: _Z21InitBuf_kernelILi8EEvii
Is there some limit for __device__ variable
? How can I init the __device__ variable
?
And if I change the buffer size to 1000, it is OK.
Your posted kernel doesn't really make sense, as your __device__
variable is named blockTmp
but you are initializing m_filteredBlockTmp
variables in your kernel, which don't appear to be defined anywhere.
Anyway, supposing these are intended to be the same, the issue is probably not related to your usage of __device__
variables (pointers, in this case) but your use of in-kernel new
which definitely has allocation limits.
These limits and behavior are the same as what is described in the programming guide for in-kernel malloc
. In particular, the default limit is 8MB and if you need more (in the "device heap") you must explicitly raise the limit with a CUDA runtime API call.
A useful error check in these situations is to check whether the pointer returned by new
or malloc
is NULL, which would indicate an allocation failure. If you fail to do that check, but then attempt to use the pointer anyway, you are going to run into trouble as described in your post.
User contributions licensed under CC BY-SA 3.0