I am seeing odd behavior in CUDA code that runs a critical section, and the behavior depends on the compile flags passed to nvcc.
Specifically, if the code is compiled with the -G (--device-debug) flag, the critical section doesn't execute for all the threads; in fact, I believe there would be a deadlock if it weren't for the counter in the while loop.
The strange thing is that without the -G flag, the critical section does execute for all threads, without deadlock or any other issue.
The code is:
//can_write points to an int in device memory (device RAM); 1 means unlocked
__device__ void critical_section(int *can_write)
{
    int t = 0, v = 0;
    while (t < 4096)
    {
        //try to take the lock: swap 1 -> 0, v receives the old value
        v = atomicCAS(can_write, 1, 0);
        if (v != 0)
        {
            //critical section, without synchronization points
            atomicExch(can_write, 1); //release the lock
            return;
        }
        ++t;
    }
}
My platform is:
Edit
The counter is only there to keep the program compiled with the -G flag from entering the alleged deadlock. Without the counter I had to kill the program, because it ran for 20 s without terminating. The kernel calling the function is:
__global__ void kernel(int *cwrite)
{
    critical_section(cwrite);
}
I believe there is a deadlock because the kernel gets stuck even when it is launched with <<<1,256>>>. Without the -G flag, the program takes just a few milliseconds.
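For anyone who wants to reproduce this, here is a minimal host-side sketch of how the bounded version can be launched and checked. It is not my exact program: the `entered` counter, the `count_entered` helper and the two-int `d_state` buffer are made up for this illustration, and the lock pointer is passed to the atomics directly. It counts how many threads actually got into the critical section before the 4096-iteration limit ran out, so building it with and without -G should show the difference described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bounded spin lock from the question; *can_write == 1 means "unlocked".
// `entered` is a hypothetical counter added only for this sketch.
__device__ void critical_section(int *can_write, int *entered)
{
    int t = 0, v = 0;
    while (t < 4096)
    {
        v = atomicCAS(can_write, 1, 0);   // try to take the lock (1 -> 0)
        if (v != 0)
        {
            atomicAdd(entered, 1);        // stand-in for the critical section
            atomicExch(can_write, 1);     // release the lock
            return;
        }
        ++t;
    }
}

__global__ void kernel(int *cwrite, int *entered)
{
    critical_section(cwrite, entered);
}

// Launches the kernel and returns how many threads entered the critical
// section, or -1 if a CUDA call reported an error.
int count_entered(int blocks, int threads)
{
    int *d_state = nullptr;               // d_state[0] = lock, d_state[1] = counter
    int h_state[2] = {1, 0};              // lock starts unlocked, counter at zero
    if (cudaMalloc(&d_state, sizeof(h_state)) != cudaSuccess) return -1;
    cudaMemcpy(d_state, h_state, sizeof(h_state), cudaMemcpyHostToDevice);
    kernel<<<blocks, threads>>>(d_state, d_state + 1);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_state, d_state, sizeof(h_state), cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return err == cudaSuccess ? h_state[1] : -1;
}

int main()
{
    int n = count_entered(1, 256);
    printf("%d of 256 threads entered the critical section\n", n);
    return 0;
}
```

Compile it once with `nvcc -G` and once without, then compare the reported counts.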
Solution to avoid the odd behavior:
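__device__ void critical_section(int *can_write)
{
    int v = 0, executed = 0, ready = 0;
    while (ready == 0)
    {
        //all threads of the warp keep looping until every one has executed
        ready = __all_sync(0xFFFFFFFF, executed);
        if (executed == 0)
        {
            //try to take the lock: swap 1 -> 0, v receives the old value
            v = atomicCAS(can_write, 1, 0);
            if (v != 0)
            {
                //critical section, without synchronization points
                atomicExch(can_write, 1); //release the lock
                executed = 1;
            }
        }
    }
}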
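A minimal sketch of how the warp-cooperative version can be checked for mutual exclusion. The `counter` variable, the `run_locked_increment` helper and the `d_state` buffer are hypothetical names added for this illustration; the lock pointer is passed to the atomics directly. Each thread does a deliberately non-atomic increment inside the critical section, so if the lock works the final count should equal the number of threads. It assumes CUDA 9 or later (for `__all_sync`) and a block size that is a multiple of 32, because the full mask names all 32 lanes of a warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Lock from the solution; *can_write == 1 means "unlocked".
// `counter` is a hypothetical shared value added only for this sketch.
__device__ void critical_section(int *can_write, volatile int *counter)
{
    int v = 0, executed = 0, ready = 0;
    while (ready == 0)
    {
        // Every lane of the warp reaches this call on every iteration.
        ready = __all_sync(0xFFFFFFFF, executed);
        if (executed == 0)
        {
            v = atomicCAS(can_write, 1, 0);  // try to take the lock (1 -> 0)
            if (v != 0)
            {
                *counter = *counter + 1;     // non-atomic read-modify-write
                __threadfence();             // make the write visible before release
                atomicExch(can_write, 1);    // release the lock
                executed = 1;
            }
        }
    }
}

__global__ void kernel(int *cwrite, int *counter)
{
    critical_section(cwrite, counter);
}

// Returns the final counter value, or -1 if a CUDA call reported an error.
int run_locked_increment(int blocks, int threads)
{
    int *d_state = nullptr;                  // d_state[0] = lock, d_state[1] = counter
    int h_state[2] = {1, 0};                 // lock starts unlocked, counter at zero
    if (cudaMalloc(&d_state, sizeof(h_state)) != cudaSuccess) return -1;
    cudaMemcpy(d_state, h_state, sizeof(h_state), cudaMemcpyHostToDevice);
    kernel<<<blocks, threads>>>(d_state, d_state + 1);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_state, d_state, sizeof(h_state), cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return err == cudaSuccess ? h_state[1] : -1;
}

int main()
{
    printf("counter = %d (expected 256 if the lock holds)\n",
           run_locked_increment(1, 256));
    return 0;
}
```

A count below the thread total would mean two threads were inside the critical section at the same time and an update was lost.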