Does __device__ variable have a size limit


I want to use a global variable in several kernels, but when I use the following code to initialize a __device__ variable, I get an [access violation on store (global memory)] error when the second buffer is initialized.

__device__ short* blockTmp[4];
//init blockTmp
template<int BS>
__global__ void InitQuarterBuf_kernel(
    )
{

    int iBufSize = 2000000;
    for (int i = 0; i < 4; i++){
        blockTmp[i] = new short[iBufSize];
        blockTmp[i][iBufSize-1]=1;
        printf("blockTmp[%d][%d] is %d.\n",i,iBufSize-1,blockTmp[i][iBufSize-1]);
    }
}

The error message:

Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 94
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x003d08fe
accessSize = 2

CUDA grid launch failed: CUcontext: 1014297073536 CUmodule: 1013915337344 Function: _Z21InitBuf_kernelILi8EEvii
CUDA context created : 67e557f3e0
CUDA module loaded:   67cdc7ed80 

CUDA module loaded:   67cdc7e180 
================================================================================
CUDA Memory Checker detected 1 threads caused an access violation:
Launch Parameters
    CUcontext    = 67e557f3e0
    CUstream     = 67cdc7f580
    CUmodule     = 67cdc7e180
    CUfunction   = 67eb64b2f0
    FunctionName = _Z21InitBuf_kernelILi8EEvii
    GridId       = 94
    gridDim      = {1,1,1}
    blockDim     = {1,1,1}
    sharedSize   = 256
    Parameters (raw):
         0x00000780 0x00000440
GPU State:
   Address  Size      Type  Mem       Block  Thread         blockIdx  threadIdx                                         PC  Source
----------------------------------------------------------------------------------------------------------------------------------
  003d08fe     2    adr st    g           0       0          {0,0,0}    {0,0,0}  _Z21InitBuf_kernelILi8EEvii+0004c8  


Summary of access violations:
xxxx_launcher.cu(481): error MemoryChecker: #misaligned=0  #invalidAddress=1
================================================================================

Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 94
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x003d08fe
accessSize = 2

CUDA grid launch failed: CUcontext: 446229378016 CUmodule: 445834060160 Function: _Z21InitBuf_kernelILi8EEvii

Is there a size limit for __device__ variables? How should I initialize a __device__ variable?

If I change the buffer size to 1000, it works.

cuda
global
device
asked on Stack Overflow Sep 10, 2015 by shinpa • edited Sep 11, 2015 by shinpa

1 Answer


Your posted kernel doesn't really make sense, as your __device__ variable is named blockTmp but you are initializing m_filteredBlockTmp variables in your kernel, which don't appear to be defined anywhere.

Anyway, supposing these are intended to be the same, the issue is probably not related to your usage of __device__ variables (pointers, in this case) but to your use of in-kernel new, which definitely has allocation limits.

These limits and behavior are the same as those described in the programming guide for in-kernel malloc. In particular, the default size of the "device heap" is 8 MB, and if you need more you must explicitly raise the limit with a CUDA runtime API call (cudaDeviceSetLimit with cudaLimitMallocHeapSize) before launching any kernel that uses in-kernel malloc or new.
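With the sizes from the question, four buffers of 2,000,000 shorts need 4 × 2,000,000 × 2 = 16,000,000 bytes, roughly twice the 8 MB default heap. By contrast, 4 × 1000 × 2 = 8,000 bytes fits easily, which is consistent with a buffer size of 1000 working. The following is a minimal sketch of raising the limit, not a definitive implementation. The 64 MB heap size, the helpers setDeviceHeap and tryAllocate, the FreeQuarterBuf_kernel kernel and the extra kernel parameters are all hypothetical, and the question's template parameter BS is dropped for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ short* blockTmp[4];

// Allocates the four buffers from the device heap. bufSize and numAllocated
// are hypothetical parameters; every pointer is checked before it is used.
__global__ void InitQuarterBuf_kernel(int bufSize, int* numAllocated)
{
    int ok = 0;
    for (int i = 0; i < 4; i++) {
        blockTmp[i] = new short[bufSize];
        if (blockTmp[i] != nullptr) {
            blockTmp[i][bufSize - 1] = 1;
            ok++;
        }
    }
    *numAllocated = ok;
}

// Hypothetical cleanup kernel: memory from in-kernel new cannot be released
// with cudaFree, so it is freed in-kernel with delete[].
__global__ void FreeQuarterBuf_kernel()
{
    for (int i = 0; i < 4; i++) {
        if (blockTmp[i] != nullptr) delete[] blockTmp[i];
        blockTmp[i] = nullptr;
    }
}

// Sets the device heap size. This must run before the first launch of any
// kernel that uses in-kernel malloc/new, otherwise the call fails.
bool setDeviceHeap(size_t heapBytes)
{
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes) != cudaSuccess)
        return false;
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize);
    printf("device heap limit: %zu bytes\n", actual);
    return true;
}

// Runs the allocation kernel once and frees the buffers again. Returns the
// number of buffers that were allocated, or -1 if a CUDA call failed.
int tryAllocate(int bufSize)
{
    int* dCount = nullptr;
    int hCount = -1;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) return -1;
    InitQuarterBuf_kernel<<<1, 1>>>(bufSize, dCount);
    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCount);
    FreeQuarterBuf_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return hCount;
}

int main()
{
    // 64 MB is an assumed value with headroom above the 16,000,000 bytes
    // that four buffers of 2,000,000 shorts require.
    if (!setDeviceHeap(64u * 1024u * 1024u)) {
        printf("cudaDeviceSetLimit failed\n");
        return 1;
    }
    printf("%d of 4 buffers allocated\n", tryAllocate(2000000));
    return 0;
}
```

The limit is set once, up front, because the heap size cannot be changed after a kernel using in-kernel allocation has been launched in the context.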

A useful error check in these situations is to check whether the pointer returned by new or malloc is NULL, which would indicate an allocation failure. If you fail to do that check, but then attempt to use the pointer anyway, you are going to run into trouble as described in your post.
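The log in the question fits this diagnosis. The faulting address 0x003d08fe is 3,999,998, which equals (2,000,000 − 1) × sizeof(short). That is the byte offset of blockTmp[i][iBufSize-1] from address zero, and the reported accessSize of 2 matches a short. This is consistent with a store through a NULL pointer returned by new. The following is a minimal sketch of the check, assuming the default heap size is left unchanged. The parameters bufSize and firstFailure, the helper firstFailedAllocation and the FreeQuarterBuf_kernel kernel are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ short* blockTmp[4];

// Checked version of the question's loop. bufSize and firstFailure are
// hypothetical parameters: firstFailure receives the index of the first
// failed allocation, or -1 if all four allocations succeeded.
__global__ void InitQuarterBuf_kernel(int bufSize, int* firstFailure)
{
    for (int i = 0; i < 4; i++) blockTmp[i] = nullptr;
    *firstFailure = -1;
    for (int i = 0; i < 4; i++) {
        blockTmp[i] = new short[bufSize];
        if (blockTmp[i] == nullptr) {
            // Device heap exhausted: report and stop instead of storing
            // through a NULL pointer.
            printf("new short[%d] failed for blockTmp[%d]\n", bufSize, i);
            *firstFailure = i;
            return;
        }
        blockTmp[i][bufSize - 1] = 1;
    }
}

// Hypothetical cleanup kernel; in-kernel allocations must be freed in-kernel.
__global__ void FreeQuarterBuf_kernel()
{
    for (int i = 0; i < 4; i++) {
        if (blockTmp[i] != nullptr) delete[] blockTmp[i];
        blockTmp[i] = nullptr;
    }
}

// Returns the index of the first failed allocation, -1 if none failed,
// or -2 if a CUDA runtime call failed.
int firstFailedAllocation(int bufSize)
{
    int* dResult = nullptr;
    int hResult = -2;
    if (cudaMalloc(&dResult, sizeof(int)) != cudaSuccess) return -2;
    InitQuarterBuf_kernel<<<1, 1>>>(bufSize, dResult);
    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    FreeQuarterBuf_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return hResult;
}

int main()
{
    // Default device heap: no cudaDeviceSetLimit call is made here.
    printf("bufSize 1000:    first failure %d\n", firstFailedAllocation(1000));
    printf("bufSize 2000000: first failure %d\n", firstFailedAllocation(2000000));
    return 0;
}
```

Because the kernel returns before touching a NULL pointer, a failed allocation shows up as a reported index instead of an access violation, and the context stays usable for later launches.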

answered on Stack Overflow Sep 10, 2015 by Robert Crovella

User contributions licensed under CC BY-SA 3.0