TensorFlow C++ API: How to pass input parameters to a CUDA kernel


I'm quite new to CUDA/C++ programming and I'm stuck on passing input parameters to a CUDA kernel from the TensorFlow C++ API.

First, I register the following Op:

.Attr("T: {float, int64}")
.Input("in: T")
.Input("angles: T")
.Output("out: T");

Afterwards, I want to pass the second input (angles) through to the CPU/GPU kernel. The following implementation works fine on the CPU, but when I run it on my GPU the Python process crashes with this message:

Process finished with exit code -1073741819 (0xC0000005)

This is how I'm trying to access the value of the input. Note that the input for "angles" is always a single value (float or int):

void Compute(OpKernelContext* context) override {
    const Tensor& input_angles = context->input(1);
    auto angles_flat = input_angles.flat<float>();
    const float N = angles_flat(0);

I call the CPU/GPU kernels as follows:

Functor<Device, T>()(

As I said, running this Op on the CPU works just how I want it to, but when I run it on the GPU I always get the above-mentioned error. Does anyone know how to fix this? My only guess is that angles_flat(0) is trying to access a wrong address on the GPU. Any help would be highly appreciated!
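
The guess above is plausible: 0xC0000005 is the Windows access-violation code, and for a kernel registered on the GPU, TensorFlow normally places input tensors in device memory, so dereferencing angles_flat(0) inside Compute (which runs on the host) would read a device address. One commonly used remedy, offered here as an assumption rather than a confirmed fix for this Op, is to add .HostMemory("angles") to the GPU kernel registration so that this input stays in host memory, and then pass the scalar to the functor by value. Reading the scalar as T instead of float would also keep the int64 variant consistent with the registered attribute. The self-contained C++ sketch below only models that idea with stand-in types: FakeTensor, MemorySpace, ReadScalarOnHost, and AddAngleFunctor are all hypothetical names and not part of TensorFlow, and the addition inside the functor is placeholder math.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are TensorFlow types.
enum class MemorySpace { kHost, kDevice };

// Minimal model of a tensor: a buffer plus a tag saying where it lives.
template <typename T>
struct FakeTensor {
  MemorySpace space;
  std::vector<T> values;
};

// Reads a one-element tensor on the host. Host code cannot dereference a real
// device pointer, so this model rejects device-resident buffers with an
// exception instead of crashing.
template <typename T>
T ReadScalarOnHost(const FakeTensor<T>& tensor) {
  if (tensor.space != MemorySpace::kHost) {
    throw std::runtime_error("scalar lives in device memory");
  }
  if (tensor.values.size() != 1) {
    throw std::invalid_argument("expected exactly one element");
  }
  return tensor.values[0];
}

struct CPUDevice {};  // placeholder device tag

// Functor in the style of Functor<Device, T>. The scalar is passed by value,
// so the implementation never has to read it through a pointer.
template <typename Device, typename T>
struct AddAngleFunctor;

template <typename T>
struct AddAngleFunctor<CPUDevice, T> {
  void operator()(const T* in, T angle, T* out, std::size_t size) const {
    for (std::size_t i = 0; i < size; ++i) {
      out[i] = in[i] + angle;  // placeholder math, not the original Op
    }
  }
};
```

In this model, ReadScalarOnHost succeeds for a FakeTensor tagged kHost and throws for one tagged kDevice, which mirrors why the same Compute body can work for the CPU kernel and fail for the GPU kernel. A GPU specialization of the functor would receive the same by-value angle and forward it as a kernel argument.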

asked on Stack Overflow Nov 11, 2020 by L. Brasi • edited Nov 12, 2020 by talonmies

0 Answers

Nobody has answered this question yet.

User contributions licensed under CC BY-SA 3.0