Consider this program, which can be compiled as either 32-bit or 64-bit:
#include <stdio.h>
static int f(int x, int y) {
    __asm__(
        "shrl $4, %0\n\t"
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+r"(x)      // needs "+&r" to work as intended
        : "r"(y)
        : "edx"
    );
    return x;
}
int main(void) {
    printf("0x%08X\n", f(0x10000000, 0x10000000));
}
At -O1 or higher, it gives the wrong answer (0x02000000 instead of 0x11000000), because x gets written before y gets read, but the constraint for x doesn't have the & to specify earlyclobber, so the compiler put them in the same register. If I change +r to +&r, then it gives the right answer again, as expected.
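For reference, here is a minimal sketch of the corrected register version described above, assuming an x86 target and GCC-style extended asm. The name `f_earlyclobber`, the extra `"cc"` clobber, and the plain-C fallback branch are illustrative additions that are not part of the original program.

```c
// Sketch only: the register version with an early-clobber on x.
// f_earlyclobber is a hypothetical name. The #else branch is a plain-C
// equivalent, added so the sketch still compiles on non-x86 targets.
static int f_earlyclobber(int x, int y) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "shrl $4, %0\n\t"        // writes x before y has been read
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+&r"(x)               // & keeps x out of the register chosen for y
        : "r"(y)
        : "edx", "cc"
    );
#else
    x = (int)((unsigned)x >> 4) + y;
#endif
    return x;
}
```

With the `&` in place, `f_earlyclobber(0x10000000, 0x10000000)` returns 0x11000000 because x and y can no longer be assigned the same register.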
Now consider this program:
#include <stdio.h>
static int f(int x, int y) {
    __asm__(
        "shrl $4, %0\n\t"
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+m"(x)        // Is this safe without "+&m"?  Compilers reject that
        : "m"(y)
        : "edx"
    );
    return x;
}
int main(void) {
    printf("0x%08X\n", f(0x10000000, 0x10000000));
}
Other than using m constraints instead of r constraints, it's exactly the same. Now it happens to give the right answer even without the &. However, I understand relying on this to be a bad idea, since I'm still writing to x before I read from y without telling the compiler I'm doing so. But when I change +m to +&m, my program no longer compiles: GCC tells me error: input operand constraint contains '&', and Clang tells me invalid output constraint '+&m' in asm. Why doesn't this work?
I can think of two possibilities:
1. & is rejected as redundant
2. & is rejected as unsatisfiable
Is one of those the case? If the latter, what's the best workaround? Or is something else going on here?
 Joseph Sible-Reinstate Monica • edited Jun 8, 2020 by  Peter Cordes
 Peter Cordes
I think "+m" and "=m" are safe without an explicit &.
From the docs, my emphasis added:
&
Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address.
Over-interpreting this could be problematic, but given the fact that it seems safe in practice, and there are good reasons why that should be the case, I think the following interpretation of the docs (i.e. guaranteed behaviour for GCC) is reasonable:
"Memory address" is talking about the addressing mode itself, e.g. something like 16(%rdx), that GCC invents and substitutes in for %1 if you have a "m"(foo) memory operand for example.  It's not talking about early-clobbering pointed-to memory, only registers that might be read as part of the addressing mode.
It means GCC needs to avoid picking the same register in any addressing mode as it picked for an early-clobber register operand.  This lets you safely use "m"  operands (and +m or =m) in the same statement as an "=&r" operand, just like you can use "r" operands.  It's the register output operand that needs to be flagged with &, not the potential readers.
The fact that it explicitly says in a register implies that this is only a concern at all for register operands, not memory.
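To illustrate the interpretation above, here is a small sketch that mixes an early-clobber register output with memory inputs, assuming an x86 target and GCC-style extended asm. The function name `shift_add_mem` and the plain-C fallback are hypothetical additions for illustration.

```c
// Sketch only: "=&r" output combined with "m" inputs.
// shift_add_mem is a hypothetical name; the #else branch is a plain-C
// equivalent so the sketch still compiles on non-x86 targets.
static int shift_add_mem(const int *px, const int *py) {
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "movl %1, %0\n\t"        // out is written here, before *py is read
        "shrl $4, %0\n\t"
        "addl %2, %0"            // the addressing mode for *py must still be intact
        : "=&r"(out)             // & : out may not reuse a register that addresses *px or *py
        : "m"(*px), "m"(*py)
        : "cc"
    );
#else
    out = (int)((unsigned)*px >> 4) + *py;
#endif
    return out;
}
```

The `&` sits on the register output only. The two `"m"` inputs need no modifier of their own; the early-clobber just stops the compiler from using the register holding `out` inside either addressing mode.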
In the C abstract machine, every object has a memory address (except register int foo).
I think compilers will always pick that address for "m" / "+m" operands, not some invented temporary.  For example, I think it's safe / supported to lea that memory operand and store the address somewhere, if it would be safe to do tmp = &foo; in C.
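A sketch of what "lea that memory operand" could look like, assuming an x86 target and GCC-style extended asm. It relies on the behaviour suspected above, which is not a documented guarantee, and `address_of_operand` is a hypothetical helper name.

```c
// Sketch only: take the address of an "m" operand with lea.
// address_of_operand is a hypothetical name; the #else branch is a plain-C
// equivalent so the sketch still compiles on non-x86 targets.
static void *address_of_operand(int *obj) {
    void *p;
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "lea %1, %0"             // operand size is inferred from the pointer-sized register
        : "=r"(p)
        : "m"(*obj)              // the compiler substitutes an addressing mode for *obj
    );
#else
    p = obj;
#endif
    return p;
}
```

If the compiler uses the object's real address for the memory operand, as the reasoning above expects, the returned pointer compares equal to `obj`.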
You can think of "earlyclobber" as "don't pick the same location as any input operand". Since different objects have different addresses, that already happens for free for memory.
Unless you specified the same object for separate input and output operands, of course.  In the register case for "=&r"(foo) and "r"(foo) you would get separate registers for the input and result.  But not for memory, even if you use an early-clobber "=&m"(foo) operand, which does compile even though "+&m" doesn't.
Random facts, experiments on Godbolt:
"m"(y+1) doesn't work as an input: "memory input 1 is not directly addressable".  But it works for a register.  Memory source operands may have to be objects that exist in the C abstract machine.
"+&m"(x) doesn't compile: error: input operand constraint contains '&'
"=&m"(x) compiles cleanly.  However, a "0"(x) matching constraint for it gets a warning: warning: matching constraint does not allow a register. https://godbolt.org/z/4kKNq4.  
+ operands appear to be internally implemented as separate output and input operands with a matching constraint to make sure they pick the same location.  (More evidence: if you use just one "+r" operand, you can reference %1 in the asm template without a warning, and it's the same register as %0.)
It appears that "=&m"(x) and "m"(x) will always pick the same memory anyway, even without a matching constraint.  (For the same reason that it's not the same memory as any other object, which is why "+&m"(x) is redundant.)
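For the first observation in the list above ("m"(y+1) being rejected), one workaround is to give the value a named object first. This is a minimal sketch assuming an x86 target and GCC-style extended asm; `add_y_plus_one` and `y1` are hypothetical names.

```c
// Sketch only: give the "m" constraint a named, addressable object.
// add_y_plus_one and y1 are hypothetical names; the #else branch is a
// plain-C equivalent so the sketch still compiles on non-x86 targets.
static int add_y_plus_one(int x, int y) {
    int y1 = y + 1;              // a real C object, so "m"(y1) is directly addressable
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+m"(x)
        : "m"(y1)
        : "edx", "cc"
    );
#else
    x += y1;
#endif
    return x;
}
```

Because `y1` and `x` are distinct objects, they occupy distinct memory, so no early-clobber is needed here either.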
If the lifetimes of two C objects overlap, their addresses will be distinct. So I think this works just like passing pointers to locals to a non-inline function, as far as the optimizer is concerned. It can't invent aliasing between them. e.g.
  int x = 1;
  {
    int tmp = x;     // dead after this call.
    foo(&x, &tmp);
  }
For example, the above code can't pass the same address for both operands of foo (e.g. by optimizing away tmp).  Same for an inline-asm statement with "=m"(x) and "m"(tmp) operands.  No early-clobber needed.
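The pointer analogy can be checked directly in plain C. This is a small sketch with hypothetical names (`same_address`, `tmp_aliases_x`); `__attribute__((noinline))` is a GCC/Clang extension, used here so the two addresses really are passed to a separate function.

```c
#include <stdbool.h>

// Sketch only: two objects with overlapping lifetimes never share an address.
// same_address and tmp_aliases_x are hypothetical names.
__attribute__((noinline))
static bool same_address(const int *a, const int *b) {
    return a == b;
}

static bool tmp_aliases_x(void) {
    int x = 1;
    {
        int tmp = x;             // dead after the call, but alive during it
        return same_address(&x, &tmp);
    }
}
```

`tmp_aliases_x()` returns false at any optimization level, since the C rules for pointer comparison do not let the optimizer merge `tmp` into `x` while both addresses are observable.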
A lot of this reasoning is extrapolated from how one would reasonably expect it to work, but that is consistent with how it appears to work in practice and with the wording in the docs. I mention this as a caution against applying the same reasoning without any support from the docs for other cases.
Re: point 2: Even if early-clobber were necessary, it would always be satisfiable for memory. Every object has its own address. It's the programmer's fault if you pass overlapping union members as memory inputs and outputs. The compiler won't create that situation if it wasn't present in the source. e.g. it won't elide a temporary variable if it would mean that a memory input overlaps a memory output. (Or at all).
 Peter Cordes
User contributions licensed under CC BY-SA 3.0