Need to understand the meaning of 0x7fffffff and 0xffffffff80000000 in terms of memory address space layout

Question

# 0x00007f33caf5a85f: cmp rax, 0xffffffff80000000
# 0x00007f33caf5a865: jnl 0x7f33caf5a898
...
target_of_jnl:
# 0x00007f33caf5a898: cmp rax, 0x7fffffff
# 0x00007f33caf5a89e: jle 0x7f33caf5a8c8

The above code snippet is part of the execution flow of the function _M_extract_int() in libstdc++.

I don't understand the meaning of the two compares. I think 0xffffffff80000000 is the top physical memory address that can be used for user mode. But what about 0x7fffffff? What are they checking?

assembly

x86-64

sign-extension

asked on Stack Overflow Oct 26, 2020 by

syacer • edited Oct 26, 2020 by

Peter Cordes

1 Answer

Note that 0x...865 and 0x...898 are not contiguous, even though they're shown in the question as part of one contiguous block with no blank line. One is the branch target of the other.

It appears to be checking for a value that can fit in a 32-bit 2's complement integer, i.e. between INT32_MIN and INT32_MAX (inclusive), like x == (int32_t)x.

But the code paths are different for the too-low and too-high cases, otherwise it could simply use movsxd rdx, eax / cmp rax, rdx / je.
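In C terms, the two ways of writing the "fits in 32 bits" test might look like the sketch below. The function names are hypothetical, not anything from libstdc++. The round-trip version relies on the narrowing conversion wrapping, which is implementation-defined in ISO C for out-of-range values; GCC and Clang document that they reduce the value modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: explicit range check, i.e. the same two
   comparisons the disassembly makes. */
bool fits_int32_range(int64_t x)
{
    return x >= INT32_MIN && x <= INT32_MAX;
}

/* Hypothetical helper: truncate to 32 bits, sign-extend back to 64 bits,
   and compare with the original (the movsxd / cmp / je idea).
   Assumes the out-of-range int64_t -> int32_t conversion wraps, as it
   does with GCC and Clang. */
bool fits_int32_roundtrip(int64_t x)
{
    return x == (int64_t)(int32_t)x;
}
```

Both give the same answer for every input on such compilers; the round-trip form just cannot tell you on which side the value was out of range.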

jnl is the same condition as jge, so it's jumping on (int64_t)rax >= INT_MIN, with INT_MIN of course sign-extended to 64-bit.

    cmp   rax, 0xffffffff80000000     # INT32_MIN
    jge   x_ge_INT32_MIN
    # else fall-through: x < INT32_MIN
    ... other code here

x_ge_INT32_MIN:
    cmp   rax, 0x7fffffff             # INT32_MAX
    jle   x_fits_in_int32_t
    # else fall-through: it doesn't fit, x > INT32_MAX
    ...
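As a rough C rendering of that control flow, here is a sketch that keeps the three outcomes separate, the way the two branches do. The enum and function names are made up for illustration; what libstdc++ actually does on each path is not shown in the question.

```c
#include <stdint.h>

/* Hypothetical labels for the three paths through the two branches. */
enum int32_fit {
    BELOW_INT32_MIN = -1,
    FITS_IN_INT32   = 0,
    ABOVE_INT32_MAX = 1
};

enum int32_fit classify_int32(int64_t x)
{
    if (x < INT32_MIN)          /* first cmp: jge not taken */
        return BELOW_INT32_MIN;
    if (x <= INT32_MAX)         /* second cmp: jle taken */
        return FITS_IN_INT32;
    return ABOVE_INT32_MAX;     /* second cmp: jle not taken */
}
```

INT32_MIN is promoted to 64-bit for the comparison, which is the C equivalent of the sign-extended immediate in the cmp.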

Note that 0xffffffff80000000 represents a negative signed integer, (int64_t)INT32_MIN. It's the sign-extension of 32-bit 0x80000000.
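To see where that constant comes from, here is a small sketch of sign-extension done by hand on the bit patterns. The helper name is hypothetical; it uses only unsigned arithmetic so that it doesn't depend on implementation-defined conversions.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a 32-bit pattern to 64 bits.
   XOR flips bit 31, then subtracting 2^31 (mod 2^64) restores it and
   fills bits 32..63 with copies of the original bit 31. */
uint64_t sign_extend_32_to_64(uint32_t bits)
{
    const uint64_t sign_bit = UINT64_C(0x80000000);
    return ((uint64_t)bits ^ sign_bit) - sign_bit;
}
```

Passing 0x80000000 yields 0xffffffff80000000, the same bit pattern as (int64_t)INT32_MIN, while 0x7fffffff comes back unchanged because its bit 31 is clear.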

I don't know what _M_extract_int() does and you didn't link any libstdc++ source for usage info.

I think 0xffffffff80000000 is the top physical memory address that can be used for user mode

No, 0xffffffff80000000 is in the high half of virtual address space, and thus already too high for user-space. See "Address canonical form and pointer arithmetic" for a diagram of the x86-64 48-bit canonical virtual address space.

Linux (and I think all mainstream x86-64 OSes) reserves the entire top half for the kernel's use, with user-space able to allocate / map pages in the low half, i.e. user-space can use the entire low 47 bits of virtual address space. (Maybe not the very bottom: Linux stops processes from mapping the low 64k by default, so nullptr dereferences, even with an offset, will still fault. macOS reserves the entire low 4GiB.)

With 5-level page tables (Intel's PML5 extension), user-space can use the low 56 bits of the 57-bit virtual address space.

Either way, checking whether an address is canonical is similar to what this code is doing, but with 48 or 57-bit values that must be correctly sign-extended to 64-bit, rather than 32-bit values.
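For comparison, a canonical-address check could be sketched like this in C. The function names and the va_bits parameter are illustrative assumptions (48 for 4-level paging, 57 for 5-level); this is not code from any kernel or from libstdc++.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an address is canonical when bits 63..(va_bits-1)
   are all copies of bit va_bits-1, i.e. it is a va_bits-wide value
   correctly sign-extended to 64 bits. Assumes 1 <= va_bits <= 64. */
bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t upper    = addr >> (va_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}

/* Hypothetical helper: low half of the canonical range, the part that
   user-space gets on typical x86-64 OSes. */
bool is_low_half(uint64_t addr, unsigned va_bits)
{
    return (addr >> (va_bits - 1)) == 0;
}
```

With va_bits = 48, the code address 0x00007f33caf5a85f from the question is in the low half, while 0xffffffff80000000 is canonical but in the high half.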

physical

Nope. libstdc++ is used in user-space only, so the only addresses it will ever see are virtual.

answered on Stack Overflow Oct 26, 2020 by

Peter Cordes • edited Oct 26, 2020 by

Peter Cordes

User contributions licensed under CC BY-SA 3.0