Unhandled exception with cudaMemcpy2D


I am new to C++ (as well as CUDA and OpenCV), so I apologize for any mistakes on my side. I have existing code that uses CUDA. Until recently it took a decoded .png as input, but now I use a camera to generate live images, and these images are the new input for the code. Here it is:

using namespace cv;

INT height = 2160;
INT width = 3840;
Mat image(height, width, CV_8UC3);
size_t pitch;
uint8_t* image_gpu;

// capture image
VideoCapture camera(0);
camera.set(CAP_PROP_FRAME_WIDTH, width);
camera.set(CAP_PROP_FRAME_HEIGHT, height);

// here I checked that image is definitely still a CV_8UC3 Mat with the initial height and width; and it is

cudaMallocPitch(&image_gpu, &pitch, width * 4, height);

// here I use cv::Mat::data to get the pointer to the data of the image:
cudaMemcpy2D(image_gpu, pitch, image.data, width*4, width*4, height, cudaMemcpyHostToDevice);

The code compiles but I get an "Exception Thrown" at the last line (cudaMemcpy2D) with the following error code: Exception thrown at 0x00007FFE838D6660 (nvcuda.dll) in realtime.exe: 0xC0000005: Access violation reading location 0x000001113AE10000.

Google did not give me an answer and I do not know how to proceed from here.

Thanks for any hints!

asked on Stack Overflow Jan 20, 2020 by GreenCoder

1 Answer


A fairly generic way to copy an OpenCV Mat to device memory allocated with cudaMallocPitch is to use the step member of the Mat object, which gives the number of bytes between the starts of consecutive rows on the host. When allocating device memory, it also helps to have a clear picture of how the device memory is laid out and how the Mat will be copied into it. Here is a simple example demonstrating the procedure for a video frame captured using VideoCapture.


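#include <cstdint>
#include <iostream>
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>

using std::cout;
using std::endl;

//Returns the size in bytes of a single channel element for the given Mat type, or 0 if the type is not handled.
size_t getPixelBytes(int type)
{
    switch(type)
    {
        case CV_8UC1:
        case CV_8UC3:
            return sizeof(uint8_t);
        case CV_16UC1:
        case CV_16UC3:
            return sizeof(uint16_t);
        case CV_32FC1:
        case CV_32FC3:
            return sizeof(float);
        case CV_64FC1:
        case CV_64FC3:
            return sizeof(double);
        default:
            return 0;
    }
}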

int main()
{
    cv::VideoCapture cap(0);
    cv::Mat frame;

    //Grab one frame from the camera and stop if that fails.
    if(!cap.read(frame))
    {
        cout<<"Cannot read video"<<endl;
        return -1;
    }

    uint8_t* gpu_image;
    size_t gpu_pitch;

    //Get the number of bytes occupied by a single channel element. VideoCapture usually returns CV_8UC3 frames, for which this is 1, but compute it just in case.
    size_t pixelBytes = getPixelBytes(frame.type());

    //Number of actual data bytes occupied by a row.
    size_t frameRowBytes = frame.cols * frame.channels() * pixelBytes;

    //Allocate pitch linear memory on device
    cudaMallocPitch(&gpu_image, &gpu_pitch, frameRowBytes, frame.rows);

    //Copy memory from frame to device memory. The source pitch is frame.step, not a hard-coded row width.
    cudaMemcpy2D(gpu_image, gpu_pitch, frame.ptr(), frame.step, frameRowBytes, frame.rows, cudaMemcpyHostToDevice);

    //Rest of the code ...
    return 0;
}
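
To make the role of the two pitch arguments more concrete, here is a small host-only sketch of the row-by-row copy that a pitched 2D copy performs. It does not call CUDA or OpenCV: copyPitched and demoPitchedCopy are hypothetical names made up for this illustration, the 5x4 three-channel "image" is synthetic, and the 32-byte destination pitch is an arbitrary stand-in for whatever value cudaMallocPitch would actually return.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in (not a CUDA function): copies `height` rows
// of `widthBytes` bytes each. Consecutive rows start `spitch` bytes apart in
// the source and `dpitch` bytes apart in the destination.
void copyPitched(std::uint8_t* dst, std::size_t dpitch,
                 const std::uint8_t* src, std::size_t spitch,
                 std::size_t widthBytes, std::size_t height)
{
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dpitch, src + row * spitch, widthBytes);
}

// Usage sketch with made-up sizes: a 5x4 image with 3 one-byte channels.
std::vector<std::uint8_t> demoPitchedCopy()
{
    const std::size_t rows = 4, cols = 5, channels = 3;
    const std::size_t rowBytes = cols * channels; // payload per row: 15 bytes
    const std::size_t srcStep  = rowBytes;        // tightly packed source rows
    const std::size_t dstPitch = 32;              // assumed padded pitch

    // Fill the source with 1, 2, 3, ... so payload differs from zero padding.
    std::vector<std::uint8_t> src(srcStep * rows);
    for (std::size_t i = 0; i < src.size(); ++i)
        src[i] = static_cast<std::uint8_t>(i + 1);

    std::vector<std::uint8_t> dst(dstPitch * rows, 0);
    copyPitched(dst.data(), dstPitch, src.data(), srcStep, rowBytes, rows);
    return dst; // row r starts at dst[r * dstPitch]; the rest of each row is padding
}
```

The point of the sketch is that the row width in bytes and the two pitches are separate quantities: only the row width is copied per row, while each pitch says how far to advance to reach the next row in its own buffer.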

Disclaimer: Code is written in the browser. Not tested yet. Please add CUDA error checking as required.
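
As a rough check on why the copy in the question probably reads past the end of the host buffer, the sizes can be worked out at compile time. This is only a sketch: it assumes the 3840x2160 CV_8UC3 Mat from the question is stored continuously (3 bytes per pixel, no row padding), and rowBytes and the k-prefixed constants are names invented for this illustration.

```cpp
#include <cstddef>

// Bytes in one tightly packed image row.
constexpr std::size_t rowBytes(std::size_t width, std::size_t channels,
                               std::size_t bytesPerChannel)
{
    return width * channels * bytesPerChannel;
}

constexpr std::size_t kWidth  = 3840;
constexpr std::size_t kHeight = 2160;

// Host buffer owned by a continuous CV_8UC3 Mat: 3 channels, 1 byte each.
constexpr std::size_t kHostBytes = rowBytes(kWidth, 3, 1) * kHeight;

// Bytes the copy is asked to read when width * 4 is passed as both the
// source pitch and the row width.
constexpr std::size_t kRequestedBytes = kWidth * 4 * kHeight;

static_assert(kHostBytes == 24883200, "3840 * 3 * 2160");
static_assert(kRequestedBytes == 33177600, "3840 * 4 * 2160");
static_assert(kRequestedBytes > kHostBytes,
              "the copy would read past the end of the host buffer");
```

Under those assumptions the copy asks for 8,294,400 more bytes than the Mat holds, which is consistent with an access violation while reading host memory. Deriving the row width from the Mat itself and passing its step as the source pitch avoids the mismatch.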

answered on Stack Overflow Jan 20, 2020 by sgarizvi

User contributions licensed under CC BY-SA 3.0