I tried three approaches to toggle a variable between two values in C, code is at:
void with_xor(int *value)
{
*value ^= VAL_ONE ^ VAL_TWO;
}
void with_conditional(int *value)
{
*value = (*value == VAL_ONE)? VAL_TWO : VAL_ONE;
}
void with_if(int *value)
{
if(VAL_ONE == *value)
{
*value = VAL_TWO;
}
else
{
*value = VAL_ONE;
}
}
I was expecting the "xor" approach to be much faster than the other two approaches, but seems like that is not the case. Why is that?
Here are the test results:
xor 0.052300
conditional 0.035738
if 0.034924
Here is a disassembly (compiled with no flags):
_with_xor:
0000000100000bd0 pushq %rbp
0000000100000bd1 movq %rsp,%rbp
0000000100000bd4 movq %rdi,0xf8(%rbp)
0000000100000bd8 movq 0xf8(%rbp),%rax
0000000100000bdc movl (%rax),%eax
0000000100000bde xorl $0x0f,%eax
0000000100000be1 movq 0xf8(%rbp),%rcx
0000000100000be5 movl %eax,(%rcx)
0000000100000be7 popq %rbp
0000000100000be8 ret
0000000100000be9 nopl 0x00000000(%rax)
_with_conditional:
0000000100000bf0 pushq %rbp
0000000100000bf1 movq %rsp,%rbp
0000000100000bf4 movq %rdi,0xf8(%rbp)
0000000100000bf8 movq 0xf8(%rbp),%rax
0000000100000bfc movl (%rax),%eax
0000000100000bfe cmpl $0x0a,%eax
0000000100000c01 jne 0x100000c0c
0000000100000c03 movl $0x00000005,0xf4(%rbp)
0000000100000c0a jmp 0x100000c13
0000000100000c0c movl $0x0000000a,0xf4(%rbp)
0000000100000c13 movq 0xf8(%rbp),%rax
0000000100000c17 movl 0xf4(%rbp),%ecx
0000000100000c1a movl %ecx,(%rax)
0000000100000c1c popq %rbp
0000000100000c1d ret
0000000100000c1e nop
_with_if:
0000000100000c20 pushq %rbp
0000000100000c21 movq %rsp,%rbp
0000000100000c24 movq %rdi,0xf8(%rbp)
0000000100000c28 movq 0xf8(%rbp),%rax
0000000100000c2c movl (%rax),%eax
0000000100000c2e cmpl $0x0a,%eax
0000000100000c31 jne 0x100000c3f
0000000100000c33 movq 0xf8(%rbp),%rax
0000000100000c37 movl $0x00000005,(%rax)
0000000100000c3d jmp 0x100000c49
0000000100000c3f movq 0xf8(%rbp),%rax
0000000100000c43 movl $0x0000000a,(%rax)
0000000100000c49 popq %rbp
0000000100000c4a ret
0000000100000c4b nopl 0x00(%rax,%rax)
First of all: The "conditional" and "if" variants of your code are functionally identical, and indeed the compiler is generating nearly identical code for both.
Branch prediction is likely to be affecting your results significantly here -- because you're always toggling the value between 5 and 10 and straight back, the branch in the conditional/if variant can be predicted with nearly 100% accuracy, turning the function into a store that can run almost-concurrently with the comparison. The XOR variant, on the other hand, must run as a load-xor-store sequence -- none of those three operations can run before the previous one finishes.
You generally can't pipeline xors on the CPU, but you can pipeline move instructions. So the XOR process is much slower.
Essentially, in your instruction pipeline, you want to execute as many instructions as possible. Since in general, branch prediction is pretty good, the CPU will be able to correctly resolve all those mov instructions and decide where everything ends up in parallel. On the other hand, it's impossible to XOR things before their predecessors have been XOR'd so the CPU has to wait for the previous XOR instruction to resolve before executing the next one. Much slower.
User contributions licensed under CC BY-SA 3.0