I'm quite new to CUDA/C++ programming and I'm stuck on passing input parameters to a CUDA kernel from the TensorFlow C++ API.
First, I register the following Op:
REGISTER_OP("Op")
    .Attr("T: {float, int64}")
    .Input("in: T")
    .Input("angles: T")
    .Output("out: T");
Next, I want to pass the second input (angles) through to the CPU/GPU kernel. The implementation below works fine on the CPU, but when I run it on my GPU the Python process crashes with this message:
Process finished with exit code -1073741819 (0xC0000005)
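For reference, the two numbers in that message are the same value: the signed exit code reinterpreted as an unsigned 32-bit integer gives the hexadecimal form, and on Windows 0xC0000005 is the status code for an access violation. A minimal sketch of the conversion (the helper name `exit_code_to_hex` is made up for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: reinterpret a signed 32-bit exit code as unsigned
// and format it as an 8-digit hexadecimal string.
std::string exit_code_to_hex(std::int32_t code) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    const std::uint32_t bits = static_cast<std::uint32_t>(code);
    std::snprintf(buf, sizeof(buf), "0x%08X", static_cast<unsigned int>(bits));
    return std::string(buf);
}
```

Calling `exit_code_to_hex(-1073741819)` gives `0xC0000005`, which suggests the process died on an invalid memory access rather than on a Python-level exception.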
This is how I'm trying to access the value of the input. Note that the input for "angles" is always a single value (float or int):
void Compute(OpKernelContext* context) override {
    ...
    const Tensor &input_angles = context->input(1);
    auto angles_flat = input_angles.flat<float>();
    const float N = angles_flat(0);
    ...
}
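One possible explanation, offered as an assumption rather than a confirmed diagnosis: for a GPU kernel, TensorFlow places input tensors in device memory by default, so reading angles_flat(0) inside Compute (which runs on the host) would dereference a device pointer. Declaring the input as host memory in the kernel registration is the usual way to keep a small scalar readable from Compute. The fragment below sketches such a registration. `ExampleOp` is a hypothetical name for the kernel class (the real class name is not shown above), `GPUDevice` is assumed to be the usual alias for `Eigen::GpuDevice`, and only the float variant is written out. It is not standalone: it needs the TensorFlow headers and the kernel class definition.

```cpp
#include "tensorflow/core/framework/op_kernel.h"

// Hypothetical: ExampleOp<Device, T> stands in for the OpKernel subclass
// that contains the Compute() method shown above.
// Assumed alias elsewhere in the file: typedef Eigen::GpuDevice GPUDevice;
REGISTER_KERNEL_BUILDER(
    Name("Op")
        .Device(tensorflow::DEVICE_GPU)
        .TypeConstraint<float>("T")
        .HostMemory("angles"),  // keep the scalar input in host memory
    ExampleOp<GPUDevice, float>);
```

Separately, flat<float>() only matches a tensor whose dtype is float, so with T = int64 the read would need flat<T>() instead.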
I call the CPU/GPU kernels as follows:
...
Functor<Device, T>()(
    context->eigen_device<Device>(),
    static_cast<int>(input_tensor.NumElements()),
    input_tensor.flat<T>().data(),
    output_tensor->flat<T>().data(),
    N);
...
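To make the calling convention concrete, here is a minimal sketch of the functor pattern that call implies, written so it compiles without TensorFlow. Everything in it is an assumption: `CPUDevice` is an empty stand-in for the real Eigen device type, the loop body (scaling by `n`) is a placeholder for the actual computation, and `run_on_cpu` is a hypothetical helper. The point is only that the scalar travels as a plain by-value argument, so it has to be readable on the host before the functor (or a CUDA kernel launch in the GPU specialization) is invoked.

```cpp
#include <vector>

// Stand-in for the real Eigen CPU device type (hypothetical, for illustration).
struct CPUDevice {};

// Primary template: each device supplies its own specialization.
template <typename Device, typename T>
struct Functor {
    void operator()(const Device& d, int size, const T* in, T* out, float n);
};

// CPU specialization. Scaling by n is a placeholder, not the original computation.
template <typename T>
struct Functor<CPUDevice, T> {
    void operator()(const CPUDevice&, int size, const T* in, T* out, float n) {
        for (int i = 0; i < size; ++i) {
            out[i] = static_cast<T>(in[i] * n);
        }
    }
};

// Hypothetical helper that mirrors the call site: raw pointers plus a by-value scalar.
template <typename T>
std::vector<T> run_on_cpu(const std::vector<T>& input, float n) {
    std::vector<T> output(input.size());
    Functor<CPUDevice, T>()(CPUDevice(), static_cast<int>(input.size()),
                            input.data(), output.data(), n);
    return output;
}
```

A GPU specialization would take the same arguments and forward `n` to the kernel launch, where `in` and `out` must be device pointers while `n` is copied by value.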
As I said, running this Op on the CPU works just how I want it to, but when I run it on the GPU I always get the Python error above. Does anyone know how to fix this? My only guess is that angles_flat(0) is trying to access a wrong address on the GPU. Any help would be highly appreciated!