I am seeing odd behavior in some CUDA code that runs a critical section, depending on the compile flags passed to nvcc.
Specifically, if the code is compiled with the -G (--device-debug) flag, the critical section does not execute for all threads; in fact, I believe there would be a deadlock if it weren't for the counter in the while loop.
The strange thing is that without the -G flag, the critical section does execute for all threads, without deadlock or any other issue.
The code is:
//can_write is in device memory (device RAM); 1 means free, 0 means taken
__device__ void critical_section(int *can_write)
{
    int t=0,v=0;
    while(t<4096)
    {
        //atomicCAS returns the old value, so v==1 means this thread took the lock
        v=atomicCAS(can_write,1,0);
        if(v!=0)
        {
            //critical section, without synchronization points
            atomicExch(can_write,1); //release the lock
            return;
        }
        ++t;
    }
}
My platform is:
Edit
The counter in the code is only there to keep the program compiled with -G from entering the alleged deadlock. Without the counter I had to kill the program, because it ran for 20 s without terminating. The kernel calling the function is:
__global__ void kernel(int *cwrite)
{
    critical_section(cwrite);
}
I believe there is a deadlock because the program gets stuck even when the kernel is launched with <<<1,256>>>. Without the -G flag, the program runs for just a few milliseconds.
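To make the symptom measurable rather than inferred from run time, a host-side sketch along the following lines could count how many threads actually get into the critical section. The `entered` counter, the `count_entries` helper, and the two-int device buffer are assumptions added for illustration and are not part of the original code; the lock logic is the bounded version from above.

```cuda
#include <cuda_runtime.h>

// Bounded lock from the question. `entered` is a hypothetical diagnostic
// counter that records how many threads reached the critical section.
__device__ void critical_section(int *can_write, int *entered)
{
    for (int t = 0; t < 4096; ++t)
    {
        // atomicCAS returns the old value: 1 means the lock was free and is now ours.
        if (atomicCAS(can_write, 1, 0) != 0)
        {
            atomicAdd(entered, 1);     // stand-in for the critical section body
            atomicExch(can_write, 1);  // release the lock
            return;
        }
    }
    // Falling out of the loop means this thread gave up after 4096 attempts.
}

__global__ void kernel(int *cwrite, int *entered)
{
    critical_section(cwrite, entered);
}

// Launches the kernel and returns the number of threads that entered the
// critical section, or -1 if any CUDA call failed.
int count_entries(int blocks, int threads)
{
    int *d_state = nullptr;   // d_state[0] is the lock, d_state[1] the counter
    int h_state[2] = {1, 0};  // lock starts free (1), counter starts at 0
    if (cudaMalloc(&d_state, sizeof(h_state)) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpy(d_state, h_state, sizeof(h_state),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        kernel<<<blocks, threads>>>(d_state, d_state + 1);
        err = cudaGetLastError();                          // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_state, d_state, sizeof(h_state),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return err == cudaSuccess ? h_state[1] : -1;
}
```

Calling `count_entries(1, 256)` from `main` and printing the result, once for a build with -G and once without, should show whether some threads gave up after 4096 attempts. The exact number depends on the GPU and the compiler, so no particular value is implied here.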
Solution to avoid the odd behavior:
__device__ void critical_section(int *can_write)
{
    int v=0,executed=0,ready=0;
    while(ready==0)
    {
        //all 32 lanes of the warp vote, so the loop ends only when every lane is done
        ready=__all_sync(0xFFFFFFFF,executed);
        if(executed==0)
        {
            v=atomicCAS(can_write,1,0);
            if(v!=0)
            {
                //critical section, without synchronization points
                atomicExch(can_write,1);
                executed=1;
            }
        }
    }
}
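One way to check that this version really lets every thread through is to count entries from the host. The sketch below is only an illustration under a few assumptions: the `entered` counter and the `count_entries_synced` helper are hypothetical names, and the block size is required to be a multiple of 32 because the full mask `0xFFFFFFFF` expects all 32 lanes of each warp to reach `__all_sync` (which needs CUDA 9 or later).

```cuda
#include <cuda_runtime.h>

// Warp-synchronized lock: lanes that are done keep voting until the whole
// warp has finished. `entered` is a hypothetical diagnostic counter.
__device__ void critical_section(int *can_write, int *entered)
{
    int executed = 0;
    while (!__all_sync(0xFFFFFFFF, executed))
    {
        // Only lanes that have not run yet try to take the lock (1 = free).
        if (executed == 0 && atomicCAS(can_write, 1, 0) != 0)
        {
            atomicAdd(entered, 1);     // stand-in for the critical section body
            atomicExch(can_write, 1);  // release the lock
            executed = 1;
        }
    }
}

__global__ void kernel(int *cwrite, int *entered)
{
    critical_section(cwrite, entered);
}

// Returns how many threads entered the critical section, or -1 if the block
// size is not a multiple of 32 or a CUDA call failed.
int count_entries_synced(int blocks, int threads)
{
    if (threads <= 0 || threads % 32 != 0) return -1; // full-mask assumption

    int *d_state = nullptr;   // d_state[0] is the lock, d_state[1] the counter
    int h_state[2] = {1, 0};  // lock starts free (1), counter starts at 0
    if (cudaMalloc(&d_state, sizeof(h_state)) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpy(d_state, h_state, sizeof(h_state),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        kernel<<<blocks, threads>>>(d_state, d_state + 1);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_state, d_state, sizeof(h_state),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return err == cudaSuccess ? h_state[1] : -1;
}
```

With a launch such as `count_entries_synced(1, 256)`, the returned count should equal the number of launched threads if every thread got its turn, with or without -G.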