Warp shuffle - conflicting documentation description

0

How does __shfl_sync() (and similar) functions behave when exceeding maximum threadIdx? Or attempting to read threads that have exited? In the documentation I can read two conflicting sentences that leave me puzzled. The following statements are taken from the CUDA C Programming Guide

Statement 1:

If the source lane ID is out of range or the source thread has exited, the calling thread's own var is returned.

Statement 2:

Threads may only read data from another thread which is actively participating in the __shfl_sync() command. If the target thread is inactive, the retrieved value is undefined.

But the thread that has existed, must have diverged and is not actively participating in the __shfl_sync().

For example, consider:

__global__ void kernel() {
    int val = threadIdx.x;
    if (threadIdx.x>5)
        return;
    __syncthreads();
    val = __shfl_down_sync(0x7fffffff, val, +5);
    printf("thread[%d].val = %d\n", threadIdx.x, val);
}
.....
kernel<<<1,32>>>()

Is the resulting val unchanged or undefined?

When I actually tested, I got the output:

thread[0].val = 5
thread[1].val = 0
thread[2].val = 0
thread[3].val = 0
thread[4].val = 0
thread[5].val = 0

which would suggest that the value is in fact undefined. But then what it means that a thread has exited?

I get the same result when I remove the conditional return, but invoke the kernel with just 6 threads. So ID that are too high still give the undefined value?

So is the Statement 1 completely wrong? Or I misunderstand it somehow?

cuda
asked on Stack Overflow Dec 18, 2018 by CygnusX1

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0