ASM to C: how to dereference a pointer and add an offset?

I feel kind of dumb, but I'm struggling with dereferencing a pointer (+ adding an offset) in C. What I want to recreate in C is this behavior:

movabs rax, 0xdeadbeef
add rax, 0xa
mov rax, QWORD PTR [rax]

So at the end rax should be *(0xdeadbeef+0xa). The equivalent of mov rax, QWORD PTR [rax] is especially important, as I need to use the calculated value and retrieve the data (a different address) that is stored at that point.

I tried so many things, but here is my current stage:

void *ptr = (void*)0xdeadbeef;
void *ptr2 = *(void*)(ptr+0xa);

Which translates to something like this:

   0x7ffff7fe6050:      mov    QWORD PTR [rbp-0x38],rax
   0x7ffff7fe6054:      mov    rax,QWORD PTR [rbp-0x38]
   0x7ffff7fe6058:      add    rax,0xa

EDIT: It does not actually compile, I made a mistake with the provided C code here and can't figure out which code actually compiled to this. It's not that important anyways as the main target was the translation of ASM to C and the problem is solved now. Thanks everyone for participating.

So the first 2 lines are basically useless and just the value is added to my address and nothing more. I need it to be interpreted as an address and retrieve the value at that point though.

The data stored at those places doesn't matter at this point. Essentially what I want to do is find a specific value in memory and I know a way of adding offsets and dereferencing pointers to get to my goal. The final step will just be a typecast from my address to the actual datatype at that point.

I know this may seem trivial to some of you, but I'm not super familiar with C, so I'm struggling here...

c
pointers
assembly
x86-64
intel
asked on Stack Overflow Oct 26, 2018 by reijin • edited Oct 26, 2018 by reijin

1 Answer

You can simplify your asm to a single instruction, with the math done at assemble time. movabs rax, [0xdeadbeef + 0xa] can use the AL/AX/EAX/RAX-only form of mov that loads from a 64-bit absolute address (https://felixcloutier.com/x86/MOV.html). (It won't fit in a 32-bit sign-extended disp32, because the high bit of the low 32 is set, unlike normal static addresses in position-dependent code). Regular mov with a 32-bit address-size override would work, too, in about 7 bytes, because your address does fit in a zero-extended 32-bit integer.

In C you can also do the whole thing with a single statement. No need to overcomplicate things: your address is a pointer to a pointer, so you need to cast your integer to a pointer-to-pointer type such as void **.

void *ptr = *(void **)(0xdeadbeefUL + 0xa);

In asm pointers are just integers, so it makes sense to do your math using integers instead of char*. Making it unsigned guarantees it zero-extends to pointer-width instead of sign-extending.
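As a rough sketch of the same idea with the address and offset as parameters: the helper name load_pointer_at is made up for illustration, and it assumes the resulting address is valid, readable, and suitably aligned for a void*.

```c
#include <stdint.h>

/* Hypothetical helper: load the pointer stored at address base + offset.
   Roughly what "mov rax, [base + offset]" does in asm. */
static void *load_pointer_at(uintptr_t base, uintptr_t offset)
{
    /* Do the math on integers, then treat the sum as the address of a void*. */
    return *(void **)(base + offset);
}
```

Using uintptr_t keeps the arithmetic unsigned and pointer-width, so no sign extension can sneak in.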

(The sign-extension worry doesn't actually arise for this literal. An unsuffixed hex literal in C takes the first type in the list int, unsigned int, long, unsigned long, long long, unsigned long long that can represent its value, so with a 32-bit int, 0xdeadbeef is an unsigned int. You wouldn't get 0xdeadbeef being a negative 32-bit int that sign-extended to 0xffffffffdeadbeef.)
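If you want to check the type of the literal yourself, a C11 _Generic selection can do it at compile time. This is only a sketch: the macro and function names are invented here, and the results assume a 32-bit int, as on the mainstream x86-64 ABIs, where an unsuffixed 0xdeadbeef comes out as unsigned int.

```c
#include <stdint.h>

/* Evaluates to 1 if the expression has type unsigned int, else 0 (C11). */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Hypothetical helper: the sum is computed as unsigned int, then
   zero-extended when converted to the pointer-width return type. */
static uintptr_t beef_plus_offset(void)
{
    return 0xdeadbeef + 0xa;
}
```

With a 32-bit int, IS_UNSIGNED_INT(0xdeadbeef) selects the unsigned int branch, while a literal that fits in int, such as 0x7fffffff, does not.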

Since void doesn't have a size, ISO C doesn't let you add or subtract integers to a void*. (GNU C allows it as an extension, treating sizeof(void) as 1.) And pointer math on void ** would be in chunks of sizeof(void*), not bytes.
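To make the scaling concrete, here is a small sketch that measures how many bytes "p + n" actually moves for a char* and for a void**. The helper names are hypothetical, and n has to stay within the local arrays (at most 64 and 8 respectively) for the pointer arithmetic to be valid.

```c
#include <stddef.h>

/* Hypothetical helper: bytes covered by p + n when p is a char*. */
static size_t step_bytes_char(size_t n)
{
    char buf[64];
    char *p = buf;
    return (size_t)((p + n) - p);   /* char* arithmetic counts bytes */
}

/* Hypothetical helper: bytes covered by p + n when p is a void**. */
static size_t step_bytes_voidpp(size_t n)
{
    void *slots[8];
    void **p = slots;
    /* Compare as char* so the difference is reported in bytes. */
    return (size_t)((char *)(p + n) - (char *)p);
}
```

So adding 0xa to a void** would move it by 0xa * sizeof(void*) bytes, which is not what the asm does.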

To avoid undefined behaviour from dereferencing a void** that's not aligned by 8 = alignof(void*) (in both mainstream x86-64 ABIs), you'd want to use memcpy. But I assume your example address is just a fake example. The mainstream x86 compilers like gcc don't do anything weird with unaligned addresses to punish programmers for UB, so the compiler output will contain unaligned loads which work fine on x86. But when auto-vectorizing you can run into problems from this kind of UB; see the Stack Overflow question "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?"
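A minimal sketch of the memcpy approach, assuming the address is valid and readable. The function name load_pointer_unaligned is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a pointer-sized value from an address that
   may not be aligned for void*. */
static void *load_pointer_unaligned(uintptr_t address)
{
    void *result;
    /* memcpy has no alignment requirement on its source, so this avoids
       dereferencing a misaligned void**. */
    memcpy(&result, (const void *)address, sizeof result);
    return result;
}
```

On x86-64, optimizing compilers typically turn a fixed-size memcpy like this into a single mov load, so it should not cost anything compared with the plain dereference.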


But if you did for some reason want to break things up into multiple asm instructions, you could transliterate it into multiple C statements like this:

uintptr_t wheres_the_beef = 0xdeadbeef;    // mov eax, 0xdeadbeef
wheres_the_beef += 0xa;                    // add rax, 0xa
void **address = (void**)wheres_the_beef;  // purely a cast, no asm instructions;
void *ptr = *address;                      // mov rax, [rax]

You could mess around with char* if you wanted to add byte offsets to pointers, but there's really no point here.
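For completeness, the char* variant could look something like the sketch below. The helper name is hypothetical, and it assumes base + byte_offset points at a readable, suitably aligned void*.

```c
#include <stddef.h>

/* Hypothetical helper: apply a byte offset with char* math, then load
   the pointer stored at the resulting address. */
static void *load_pointer_via_char(void *base, size_t byte_offset)
{
    char *p = (char *)base + byte_offset;  /* char* arithmetic counts bytes */
    return *(void **)p;                    /* reinterpret and load a void* */
}
```

This is close to what the code in the question was attempting: the cast to char* is what makes the + 0xa legal and byte-granular, and the cast to void ** is what makes the dereference produce a pointer.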

Again, this still has undefined behaviour on most C implementations, where alignof(void*) is greater than 1 so void **address = (void**)wheres_the_beef creates a misaligned pointer.

(Fun fact: even creating misaligned pointers is UB in ISO C. But all x86 compilers that support Intel's intrinsics must support creating misaligned pointers for passing them to intrinsics like _mm_loadu_ps(), so only actually dereferencing them is a potential problem on x86 compilers.)

answered on Stack Overflow Oct 26, 2018 by Peter Cordes • edited Oct 26, 2018 by Peter Cordes

User contributions licensed under CC BY-SA 3.0