How to write rotate code in C to compile into the `ror` x86 instruction?

1

I have some code that rotates my data. I know GAS syntax has a single assembly instruction that can rotate an entire byte. However, when I try to follow any of the advice on Best practices for circular shift (rotate) operations in C++, my C code compiles into at least 5 instructions, which use up three registers-- even when compiling with -O3. Maybe those are best practices in C++, and not in C?

In either case, how can I force C to use the ROR x86 instruction to rotate my data?

The precise line of code which is not getting compiled to the rotate instruction is:

value = (((y & mask) << 1 ) | (y >> (size-1))) //rotate y right 1
       ^ (((z & mask) << n ) | (z >> (size-n))) // rotate z left by n
// size can be 64 or 32, depending on whether we are rotating a long or an int, and 
// mask would be 0xff or 0xffffffff, accordingly

I do not mind using __asm__ __volatile__ to do this rotate, if that's what I must do. But I don't know how to do so correctly.

c
assembly
x86
gas
asked on Stack Overflow Jul 14, 2017 by Alex • edited Jul 14, 2017 by dbush

4 Answers

7

Your macro compiles to a single ror instruction for me... specifically, I compiled this test file:

#define ROR(x,y) ((unsigned)(x) >> (y) | (unsigned)(x) << 32 - (y))
unsigned ror(unsigned x, unsigned y)
{
    return ROR(x, y);
}

as C, using gcc 6, with -O2 -S, and this is the assembly I got:

    .file   "test.c"
    .text
    .p2align 4,,15
    .globl  ror
    .type   ror, @function
ror:
.LFB0:
    .cfi_startproc
    movl    %edi, %eax
    movl    %esi, %ecx
    rorl    %cl, %eax
    ret
    .cfi_endproc
.LFE0:
    .size   ror, .-ror
    .ident  "GCC: (Debian 6.4.0-1) 6.4.0 20170704"
    .section    .note.GNU-stack,"",@progbits

Please try to do the same, and report the assembly you get. If your test program is substantially different from mine, please tell us how it differs. If you are using a different compiler or a different version of GCC please tell us exactly which one.

Incidentally, I get the same assembly output when I compile the code in the accepted answer for "Best practices for circular shift (rotate) operations in C++", as C.

answered on Stack Overflow Jul 14, 2017 by zwol
3

How old is your compiler? As I noted in the linked question, the UB-safe variable-count rotate idiom (with extra & masking of the count) confuses old compilers, like gcc before 4.9. Since you're not masking the shift count, it should be recognized with even older gcc.

Your big expression is maybe confusing the compiler. Write an inline function for rotate, and call it, like

value = rotr32(y & mask, 1) ^ rotr32(z & mask, n);

Much more readable, and may help stop the compiler from trying to do things in the wrong order and breaking the idiom before recognizing it as a rotate.


Maybe those are best practices in C++, and not in C?

My answer on the linked question clearly says that it's the best practice for C as well as C++. They are different languages, but they overlap completely for this, according to my testing.

Here's a version of the Godbolt link using -xc to compile as C, not C++. I had a couple C++isms in the link in the original question, for experimenting with integer types for the rotate count.

Like the original linked from the best-practices answer, it has a version that uses x86 intrinsics if available. clang doesn't seem to provide any in x86intrin.h, but other compilers have _rotl / _rotr for 32-bit rotates, with other sizes available.

Actually, I talked about rotate intrinsics at length in the answer on the best-practices question, not just in the godbolt link. Did you even read the answer there, apart from the code block? (If you did, your question doesn't reflect it.)


Using intrinsics, or the idiom in your own inline function, is much better than using inline asm. Asm defeats constant-propagation, among other things. Also, compilers can use BMI2 rorx dst, src, imm8 to copy-and-rotate with one instruction, if you compile with -march=haswell or -mbmi2. It's a lot harder to write an inline-asm rotate that can use rorx for immediate-count rotates but ror r32, cl for variable-count rotates. You could try with _builtin_constant_p(), but clang evaluates that before inlining, so it's basically useless for meta-programming style choice of which code to use. It works with gcc though. But it's still much better not to use inline asm unless you've exhausted all other avenues (like asking on SO) to avoid it. https://gcc.gnu.org/wiki/DontUseInlineAsm

Fun fact: the rotate functions in gcc's x86intrin.h are just pure C using the rotate idiom that gcc recognizes. Except for 16-bit rotates, where they use __builtin_ia32_rolhi.

answered on Stack Overflow Jul 15, 2017 by Peter Cordes
0

You might need to be a bit more specific with what integral type / width you're rotating, and whether you have a fixed or variable rotation. ror{b,w,l,q} (8, 16, 32, 64-bit) has forms for (1), imm8, or the %cl register. As an example:

static inline uint32_t rotate_right (uint32_t u, size_t r)
{
    __asm__ ("rorl %%cl, %0" : "+r" (u) : "c" (r));
    return u;
}

I haven't tested this, it's just off the top of my head. And I'm sure multiple constraint syntax could be used to optimize cases where a constant (r) value is used, so %e/rcx is left alone.


If you're using a recent version of gcc or clang (or even icc). The intrinsics header <x86intrin.h>, may provide __ror{b|w|d|q} intrinsics. I haven't tried them.

answered on Stack Overflow Jul 14, 2017 by Brett Hale • edited Jul 14, 2017 by Brett Hale
-2

Best Way:

#define rotr32(x, n) (( x>>n  ) | (x<<(64-n)))
#define rotr64(x, n) (( x>>n  ) | (x<<(32-n)))

More generic:

#define rotr(x, n) (( x>>n  ) | (x<<((sizeof(x)<<3)-n)))

And it compiles (in GCC) with exactly the same code as the asm versions below.

For 64 bit:

__asm__ __volatile__("rorq %b1, %0" : "=g" (u64) : "Jc" (cShift), "0" (u64));

or

static inline uint64_t CC_ROR64(uint64_t word, int i)
{
    __asm__("rorq %%cl,%0"
        :"=r" (word)
        :"0" (word),"c" (i));
    return word;
}
answered on Stack Overflow Mar 29, 2019 by Zibri • edited Mar 30, 2019 by Zibri

User contributions licensed under CC BY-SA 3.0