GAS assembler not using 2-byte relative JMP displacement encoding (only 1-byte or 4-byte)

I am trying to write shellcode for a CTF challenge that does not allow 0x00 bytes (they would be interpreted as a terminator). Due to restrictions in the challenge, I must do something like this:

[shellcode bulk]
[(0x514 - sizeof(shellcode bulk)) filler bytes]
[fixed constant data to overwrite global symbols]
[shellcode data]

It looks something like this:

.intel_syntax noprefix
.code32

shellcode:
    jmp sc_data

shellcode_main:
    #open
    xor eax, eax
    pop ebx         //file string
    xor ecx, ecx    //flags
    xor edx, edx    //mode
    mov al, 5       //sys_OPEN
    int 0x80

    ...  // more shellcode

.org 0x514, 0x41   // filler bytes
.long 0xffffffff   // bss constant overwrite

sc_data:
    call shellcode_main
    .asciz "/path/to/fs/file"

This works beautifully if sc_data is within 127 bytes of shellcode. In this case the assembler (GAS) will output a short jump of format:

Opcode  Mnemonic
EB cb   JMP rel8

However, since I have a hard restriction that I need 0x514 bytes for the bulk shellcode and filler bytes, this relative offset needs at least 2 bytes. That would also work, because there is a 2-byte relative encoding for the jmp instruction:

Opcode  Mnemonic
E9 cw   JMP rel16

Unfortunately, GAS does not output this encoding. Rather it uses the 4-byte offset encoding:

Opcode  Mnemonic
E9 cd   JMP rel32

This results in the two most-significant bytes being zero. Something similar to:

e9 01 02 00 00

My question is: can GAS be forced to output the 2-byte variant of the jmp instruction? I toyed around with multiple smaller 1 byte jmps, but GAS kept outputting the 4-byte variant. I also tried invoking GCC with -Os to optimize for size, but it insisted on using the 4-byte relative offset encoding.

Intel jump opcode defined here for reference.

assembly
x86
gas
shellcode
machine-code
asked on Stack Overflow May 15, 2018 by sherrellbc • edited May 15, 2018 by Peter Cordes

1 Answer

jmp rel16 is only encodable with an operand-size of 16, which truncates EIP to 16 bits. (The encoding requires a 66 operand-size prefix in 32 and 64-bit mode.) As described in the instruction-set reference you linked, or in this more up-to-date PDF->HTML conversion of Intel's manual, jmp does EIP ← tempEIP AND 0000FFFFH; when the operand-size is 16. This is why assemblers never use it unless you manually request it (see footnote 1), and why you can't use jmp rel16 in 32 or 64-bit code except in the very unusual case where the target is mapped in the low 64 KiB of virtual address space (see footnote 2).
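To make the truncation concrete, here is a minimal Python sketch of the target calculation for a `66 E9 cw` jump executed in 32-bit mode. The helper name `jmp_rel16_target` is made up for illustration; the input bytes and address are the ones from the objdump listing in footnote 1.

```python
def jmp_rel16_target(insn_addr: int, rel16: int) -> int:
    """Target of `66 E9 cw` (jmp rel16) when run in 32-bit mode."""
    insn_len = 4                        # 66 prefix + E9 opcode + 2-byte displacement
    next_eip = insn_addr + insn_len     # displacement is relative to the next instruction
    return (next_eip + rel16) & 0xFFFF  # EIP <- tempEIP AND 0000FFFFH

# Bytes from the footnote-1 listing: 66 e9 fc ff at address 0x400080
rel16 = int.from_bytes(bytes([0xFC, 0xFF]), "little", signed=True)
target = jmp_rel16_target(0x400080, rel16)
print(rel16, hex(target))
```

The displacement decodes to -4, so the jump "should" land back on 0x400080, but the mask leaves 0x80, which is the address objdump reports for the jmpw.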


Avoiding jmp rel32

You're only jumping forward so that a call rel32 can push the address of your data, and you only need that because you want your data all the way at the end of your long padded payload.

You could construct a string on the stack with push imm32/imm8/reg and mov ebx, esp. (You already have a zeroed register you can push for the terminating zero byte).

If you don't want to construct data on the stack, and instead use data that's part of your payload, use position-independent code / relative addressing for it. Perhaps you have a value in a register that's a known offset from EIP, e.g. if your exploit code was reached with a jmp esp or other ret-2-reg attack. In that case, you might be able to just
mov ecx, 0x12345678 / shr ecx, 16 / lea ebx, [esp+ecx].
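As a rough aid for picking the constants in a mov / shr pair like that, the Python sketch below searches for a shift count whose immediate has no 0x00 byte. The function name `zero_free_mov_shr` and the example offset 0x514 (borrowed from the question) are assumptions for illustration, not part of any tool.

```python
import struct

def zero_free_mov_shr(offset: int):
    """Find (imm32, shift) so that `mov ecx, imm32` / `shr ecx, shift`
    leaves `offset` in ECX and imm32 contains no 0x00 byte."""
    for shift in range(1, 32):
        if offset >> (32 - shift):
            break                      # offset would no longer fit in 32 bits
        filler = (1 << shift) - 1      # low bits are discarded by shr, so set them all
        imm32 = (offset << shift) | filler
        if 0 not in struct.pack("<I", imm32):
            return imm32, shift
    return None                        # no single shift works for this offset

imm32, shift = zero_free_mov_shr(0x514)
print(hex(imm32), shift)
```

For 0x514 this finds a shift of 14 and an immediate of 0x01453fff. An offset whose shifted value still contains a zero byte in the middle needs a different trick, such as the scaled-index lea shown later.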

Or, if you had to use a NOP sled and you don't know the exact value of EIP relative to any register value, you can obtain the current value of EIP with a call instruction with a negative displacement. Jump forward over the call target, then call back to it. You can put data right after that call. (But avoiding zero bytes in the data is inconvenient; you can store some once you get a pointer to it.)

 # Position-independent 32-bit code to find EIP
 # and get label addresses into registers
 # and insert zeros into data that we jumped over.

               jmp  .Lcall

.Lget_eip:
               pop   ebx
               jmp   .Lafter_call       # jmp rel8
.Lcall:        call  .Lget_eip          # backward rel32 = 0xffffff??
          # execution never returns here
   .Lmsg:   .ascii "/path/to/fs/file/"    # last byte to be overwritten
   msglen = . - .Lmsg
   .Loffset_data2: .long .Ldata2 - .Lmsg   # relative offset to other data, or make this a 16-bit int to avoid zeros
               # max data size 127 - 5 bytes

.Lafter_call:
               # EBX = OFFSET .Lmsg just from the call + pop
               # Insert a zero at runtime because the data wasn't at the end of the payload
               mov  byte ptr [ebx+ msglen - 1], al   # with al=0


               # ESI = OFFSET .Ldata2 using an offset loaded from memory
               mov  esi, ebx
               add  esi, [ebx + .Loffset_data2 - .Lmsg]   # [ebx + disp8]

               # with an immediate displacement, avoiding zero bytes
               mov  ecx, ((.Ldata3 - .Lmsg) << 17) | 0xffff
               shr  ecx, 17                # choose shift count to avoid high zeros
               lea  edi, [ebx + ecx]       # edi = OFFSET .Ldata3

               # if disp8 doesn't work but 8 * disp8 does: small code size
               push  (.Ldata3 - .Lmsg) >> 3   # push imm8: offset / 8
               pop   ecx
               lea   edi, [ebx + ecx*8 + ((.Ldata3 - .Lmsg) & 7)]  # disp8 of the low 3 bits

           ...

  # at the end of your payload
  .Ldata2:
    # whatever you want, arbitrary size

  .Ldata3:
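The reason the backward call is zero-free while a long forward jmp is not can be checked with a small Python sketch. The helper `rel32_bytes` is hypothetical, and the addresses assume the layout above: a 2-byte jmp at offset 0, .Lget_eip at offset 2, and the call at offset 5.

```python
import struct

def rel32_bytes(opcode: int, insn_addr: int, target: int) -> bytes:
    """Encode a 5-byte `jmp rel32` (0xE9) or `call rel32` (0xE8)."""
    disp = target - (insn_addr + 5)   # relative to the end of the instruction
    return bytes([opcode]) + struct.pack("<i", disp)

forward = rel32_bytes(0xE9, 0, 0x518)  # jmp far forward, as in the question
backward = rel32_bytes(0xE8, 5, 2)     # call back to .Lget_eip
print(forward.hex(" "), "|", backward.hex(" "))
```

A small positive displacement is zero-extended, which leaves 00 bytes in the upper half. A small negative one is sign-extended, which fills the upper bytes with ff.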

In 64-bit code, it's much easier:

 # In 64-bit code

     jmp  .Lafter_data
 .Lmsg1:   .ascii "/foo/bar/"    # last bytes to be replaced
 .Lmsg2:   .ascii "/bin/sh/"
 .Lafter_data:
     lea  rdi, [RIP + .Lmsg1]            # negative rel32 
     lea  rsi, [rdi + .Lmsg2 - .Lmsg1]   # disp8
     xor  eax,eax
     mov  byte ptr [rsi - 1], al         # zero-terminate .Lmsg1 (overwrites its trailing '/')
     mov  byte ptr [rsi + .Lafter_data - .Lmsg2 - 1], al   # same for .Lmsg2

Or use a RIP-relative LEA to get a label address and use some zero-avoiding method to add an immediate constant to it to get the address of a label at the end of your payload.

  .Lbase:
      lea  rdi, [RIP + .Lbase]
      xor  ecx,ecx
      mov  cx, .Lpath - .Lbase
      add  rdi, rcx          # RDI = .Lpath address
      ...
      syscall

       ...   # more than 128 bytes
   .Lpath:
       .asciz "/foo/bar"

If you really need to jump far, rather than just address far-away "static" data position-independently:

A chain of short forward jumps would work.

Or use any of the above methods to find the address of a later label in a register, and use jmp eax.


Saving code bytes:

In your case, saving code size doesn't help you avoid long jump displacements, but probably for some other people it will:

You can save code bytes using these Tips for golfing in x86/x64 machine code:

  • xor eax,eax / cdq saves 1 byte vs. xor edx,edx.
  • xor ecx, ecx / mul ecx zeroes three registers in 4 bytes (ECX and EDX:EAX)
  • Actually, your best bet for that int 0x80 setup is probably
    xor ecx,ecx (2B) / lea eax, [ecx+5] (3B) / cdq (1B), and don't use mov al,5 at all. You can put arbitrary small constants in registers in only 3 bytes with push imm8 / pop, or with one lea if you have another register with a known value.
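The byte counts in these tips can be tallied with a short Python sketch. The encodings are hand-assembled for 32-bit mode and should be treated as assumptions to check against your assembler's listing; the helper `total` is made up for this example.

```python
# Hand-assembled 32-bit encodings (assumed; verify with objdump -d)
original = {
    "xor eax, eax": "31 c0",
    "xor ecx, ecx": "31 c9",
    "xor edx, edx": "31 d2",
    "mov al, 5":    "b0 05",
}
golfed = {
    "xor ecx, ecx":     "31 c9",
    "lea eax, [ecx+5]": "8d 41 05",
    "cdq":              "99",       # EDX = sign bit of EAX = 0, since EAX = 5
}

def total(seq: dict) -> int:
    """Total code size in bytes, also checking that no byte is 0x00."""
    code = b"".join(bytes.fromhex(h) for h in seq.values())
    assert 0 not in code
    return len(code)

print(total(original), total(golfed))
```

Under those assumed encodings the register setup shrinks from 8 bytes to 6, and neither version contains a zero byte.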

Footnote 1: asking your assembler to encode jmp rel16 outside of 16-bit mode:

NASM (in 16, 32 or 64-bit mode)

addr:
; times 256 db 0      ; padding to make it jump farther.
o16 jmp near addr     ; force 16-bit operand-size and near (not short) displacement

AT&T syntax:

objdump -d decodes it as jmpw: For the above NASM source assembled into a 32-bit static ELF binary, objdump -drwC foo shows the truncation of EIP:

0000000000400080 <addr>:
  400080:       66 e9 fc ff             jmpw   80 <addr-0x400000>

But GAS seems to think that mnemonic is only for indirect jumps, where it would mean a 16-bit load (foo.S:5: Warning: indirect jmp without '*'). This GAS source: .org 1024; addr: .zero 128; jmpw addr gives you

480:   66 ff 25 00 04 00 00    jmpw   *0x400
                        483: R_386_32   .text

See what is jmpl instruction in x86? - this insane inconsistency in how GAS handles AT&T syntax applies even to jmpl. Plain jmp 0x400 when assembling in 16-bit mode would be a relative jump to that absolute offset.

In the extremely unlikely case you wanted a jmp rel16 in other modes, you'd have to assemble it yourself with .byte and .short. I don't think there's even a way to get the assembler to emit it for you.


Footnote 2: You can't use jmp rel16 in 32/64-bit code, unless you're attacking some code mapped in the low 64 KiB of virtual address space, e.g. maybe something running under DOSEMU or WINE. Linux's default setting for /proc/sys/vm/mmap_min_addr is 65536, not 0, so normally nothing can mmap that memory even if it wants to, and presumably the ELF program loader can't place a text segment there either. (So NULL-pointer dereferences with an offset segfault instead of silently accessing memory.)

You can be sure that your CTF target won't happen to be running with EIP = IP, and that truncating EIP to IP will just segfault.

answered on Stack Overflow May 15, 2018 by Peter Cordes • edited Jan 12, 2020 by Peter Cordes

User contributions licensed under CC BY-SA 3.0