I'm making an x86 asm to C code emulator, and for my parser I am up to the bracket parsing:
QWORD PTR [ to ]
DWORD PTR [ to ]
WORD PTR [ to ]
BYTE PTR [ to ]
MOV X, [ to ]
LEA X, [ to ]
For now I will ignore:
MMWORD PTR[]
XMMWORD PTR[]
FWORD PTR []
TBYTE PTR []
I want to know all the possible arithmetic operations that can be placed inside the brackets.
The most complex one I have encountered is:
[EBP+ECX*4-E0]
The reason I have to parse is to convert E0 to 0x000000E0 and then 4 to 0x00000004.
As far as I know, +, -, and * are possible. Are \ and / possible too? How about dots (.)?
I figure the best way is to take every instruction which contains brackets [] and extract the inner math, then split the inner math by the one-character delimiters +, -, and *.
I want to make sure I get them all. Is division ever possible in these? And what about XOR/OR/AND/NOT?
Maybe these tables for building a Mod R/M byte are helpful to show which combinations are possible for building an address with the 16-bit and 32-bit base and index registers.
Format of Postbyte (Mod R/M from Intel)
---------------------------------------
MM RRR MMM
MM  - Memory addressing mode
RRR - Register operand address
MMM - Memory operand address
RRR Register Names
Field  8bit  16bit  32bit
000    AL    AX     EAX
001    CL    CX     ECX
010    DL    DX     EDX
011    BL    BX     EBX
100    AH    SP     ESP
101    CH    BP     EBP
110    DH    SI     ESI
111    BH    DI     EDI
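The MM RRR MMM layout and the register table above can be sketched in a few lines. This is only an illustration: the names `REGISTERS`, `split_modrm` and `register_name` are made up for this example and do not come from any existing library.

```python
# Register names indexed by the 3-bit RRR field, one list per operand size.
# (Hypothetical helper table, copied from the register table above.)
REGISTERS = {
    8:  ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"],
    16: ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    32: ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
}


def split_modrm(byte):
    """Split a Mod R/M byte into its (MM, RRR, MMM) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("Mod R/M must be a single byte")
    mm = (byte >> 6) & 0b11    # top two bits: addressing mode
    rrr = (byte >> 3) & 0b111  # middle three bits: register operand
    mmm = byte & 0b111         # low three bits: memory (or register) operand
    return mm, rrr, mmm


def register_name(field, size):
    """Look up the register selected by a 3-bit field for a given operand size."""
    return REGISTERS[size][field]
```

For example, `split_modrm(0xD8)` yields MM=3, RRR=3, MMM=0, so with 32-bit operands the byte names EBX and EAX.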
---
16-bit memory (no 32-bit memory address prefix)
MMM    Default  MM Field
Field  Sreg     00        01          10           11 = MMM is reg
000    DS       [BX+SI]   [BX+SI+o8]  [BX+SI+o16]
001    DS       [BX+DI]   [BX+DI+o8]  [BX+DI+o16]
010    SS       [BP+SI]   [BP+SI+o8]  [BP+SI+o16]
011    SS       [BP+DI]   [BP+DI+o8]  [BP+DI+o16]
100    DS       [SI]      [SI+o8]     [SI+o16]
101    DS       [DI]      [DI+o8]     [DI+o16]
110    SS       [o16]     [BP+o8]     [BP+o16]
111    DS       [BX]      [BX+o8]     [BX+o16]
Note: for MMM=110, MM=00 the default Sreg is DS!
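As a rough sketch, the 16-bit table can be turned into a small lookup. The names `BASE_16`, `DEFAULT_SREG_16` and `describe_16bit` are hypothetical and only serve this example; `o8` and `o16` stand for the 8-bit and 16-bit offsets, as in the table.

```python
# Row data for the 16-bit forms, indexed by the MMM field.
BASE_16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
DEFAULT_SREG_16 = ["DS", "DS", "SS", "SS", "DS", "DS", "SS", "DS"]
OFFSET_SUFFIX = ["", "+o8", "+o16"]  # indexed by the MM field


def describe_16bit(mm, mmm):
    """Return (default segment register, operand text) for a 16-bit memory form."""
    if mm == 3:
        raise ValueError("MM=11 selects a register, not a memory operand")
    if mm == 0 and mmm == 6:
        # Special case from the note: a bare 16-bit offset, default segment DS.
        return "DS", "[o16]"
    return DEFAULT_SREG_16[mmm], "[" + BASE_16[mmm] + OFFSET_SUFFIX[mm] + "]"
```

Note that no entry contains a scale factor: with 16-bit addressing only the additions listed in the table can be encoded.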
32-bit memory (has 67h 32-bit memory address prefix)
MMM    Default  MM Field
Field  Sreg     00      01         10          11 = MMM is reg
000    DS       [EAX]   [EAX+o8]   [EAX+o32]
001    DS       [ECX]   [ECX+o8]   [ECX+o32]
010    DS       [EDX]   [EDX+o8]   [EDX+o32]
011    DS       [EBX]   [EBX+o8]   [EBX+o32]
100    SIB      [SIB]   [SIB+o8]   [SIB+o32]
101    SS       [o32]   [EBP+o8]   [EBP+o32]
110    DS       [ESI]   [ESI+o8]   [ESI+o32]
111    DS       [EDI]   [EDI+o8]   [EDI+o32]
Note: for MMM=101, MM=00 the default Sreg is DS!
---
SIB is (Scale/Base/Index)
SS III BBB
Note: SIB address calculated as:
<sib address>=<Base>+<Index>*(2^(Scale))

Field  Default  Base
BBB    Sreg     Register  Note
000    DS       EAX
001    DS       ECX
010    DS       EDX
011    DS       EBX
100    SS       ESP
101    DS       o32       if MM=00 (Postbyte)
       SS       EBP       if MM<>00 (Postbyte)
110    DS       ESI
111    DS       EDI

Field  Index
III    register  Note
000    EAX
001    ECX
010    EDX
011    EBX
100    none      never an index (ESP cannot be an index); SS can be 00
101    EBP
110    ESI
111    EDI

Field  Scale coefficient
SS     =2^(SS)
00     1
01     2
10     4
11     8
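To make the SIB rules concrete, here is a minimal sketch that splits a SIB byte into its fields and prints the address form it selects. The function name `describe_sib` and its output format are assumptions for this example; `mm` is the MM field of the preceding postbyte, and `o8`/`o32` are placeholders for the offset bytes that follow.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
OFFSET_SUFFIX = ["", "+o8", "+o32"]  # indexed by the MM field of the postbyte


def describe_sib(sib, mm):
    """Describe the address selected by a SIB byte, given the postbyte's MM field."""
    if mm not in (0, 1, 2):
        raise ValueError("a SIB byte only follows MM = 00, 01 or 10")
    ss = (sib >> 6) & 0b11     # scale: 2^ss
    iii = (sib >> 3) & 0b111   # index register
    bbb = sib & 0b111          # base register
    if bbb == 5 and mm == 0:
        terms = ["o32"]        # no base register, a 32-bit offset instead
    else:
        terms = [REG32[bbb]]
    if iii != 4:               # III=100 means "no index"
        scale = 1 << ss
        terms.append(REG32[iii] if scale == 1 else "%s*%d" % (REG32[iii], scale))
    return "[" + "+".join(terms) + OFFSET_SUFFIX[mm] + "]"
```

The operand `[EBP+ECX*4-E0]` from the question would use SIB byte 8Dh (scale 4, index ECX, base EBP) with MM=10, because -E0h does not fit in a signed 8-bit offset; `describe_sib(0x8D, 2)` gives `[EBP+ECX*4+o32]`.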
What's inside the brackets is an address expression. The Intel x86 family of processors supports certain address operations, like having a base register, adding an offset, and scaling an index register by 1, 2, 4 or 8. Some assemblers allow dotted references to fields in structures as part of the base offset expression. Except for calculating the base offset, the 'math' inside the brackets is not math done at assembly time but the encoding of the address portion of the instruction, to be calculated at run-time.
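Since the address is computed at run-time from the current register values, an emulator has to evaluate it on every access. A minimal sketch, assuming the registers live in a plain dictionary (the name `effective_address` and its parameters are hypothetical, and segment bases are ignored):

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0, bits=32):
    """Compute base + index*scale + disp, wrapped to the address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    address = disp
    if base is not None:
        address += regs[base]
    if index is not None:
        address += regs[index] * scale
    # Addresses wrap around at the address size instead of overflowing.
    return address & ((1 << bits) - 1)
```

With EBP = 1000h and ECX = 3, the operand `[EBP+ECX*4-E0]` evaluates to 1000h + Ch - E0h = F2Ch.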
You need to read the Intel Software Developer Manuals. Specifically, section 3.7.5 "Specifying an Offset", which tells us there are two ways of doing so:
Base + (Index * Scale) + Displacement
RIP + Displacement (64-bit mode only)
Then, examine the instruction set reference to find what the possibilities are for each instruction.
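Given the Base + (Index * Scale) + Displacement form, a bracket parser only has to recognise `+`, `-` and `*`. The following is a sketch under several assumptions: 32-bit register names only, bare numbers are hexadecimal (as in the question's `E0`), the scale is written as `REG*N`, and the function names `parse_address` and `to_c_expression` are invented for this example.

```python
import re

REGS32 = {"EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"}
TERM = re.compile(r"([+-]?)([^+-]+)")  # optional sign, then one term


def parse_address(operand):
    """Parse '[base+index*scale+disp]' into (base, index, scale, disp)."""
    text = operand.replace(" ", "").upper()
    if len(text) < 3 or text[0] != "[" or text[-1] != "]":
        raise ValueError("expected a non-empty [ ... ] operand")
    inner = text[1:-1]
    base = index = None
    scale, disp, pos = 1, 0, 0
    for match in TERM.finditer(inner):
        if match.start() != pos:  # a gap means a doubled or stray operator
            raise ValueError("unexpected operator in %r" % operand)
        pos = match.end()
        negative = match.group(1) == "-"
        body = match.group(2)
        if "*" in body:
            reg, _, factor = body.partition("*")
            if (negative or index is not None or reg not in REGS32
                    or reg == "ESP" or factor not in ("1", "2", "4", "8")):
                raise ValueError("bad scaled index %r" % body)
            index, scale = reg, int(factor)
        elif body in REGS32:
            if negative:
                raise ValueError("registers cannot be subtracted")
            if base is None:
                base = body
            elif index is None and body != "ESP":
                index = body
            else:
                raise ValueError("too many registers in %r" % operand)
        else:
            value = int(body, 16)  # anything else (e.g. 'EBP/2') raises ValueError
            disp += -value if negative else value
    if pos != len(inner):
        raise ValueError("trailing operator in %r" % operand)
    return base, index, scale, disp


def to_c_expression(base, index, scale, disp):
    """Render the parsed parts with zero-padded 32-bit hex literals."""
    parts = [base] if base else []
    if index:
        parts.append("%s * 0x%08X" % (index, scale))
    expr = " + ".join(parts)
    if disp or not parts:
        literal = "0x%08X" % abs(disp)
        if parts:
            expr += (" - " if disp < 0 else " + ") + literal
        else:
            expr = ("-" if disp < 0 else "") + literal
    return expr
```

Here `to_c_expression(*parse_address("[EBP+ECX*4-E0]"))` produces `EBP + ECX * 0x00000004 - 0x000000E0`, while an operand containing `/` or a logical operator is rejected with a `ValueError`.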
Alternatively, you can consult existing implementations of various disassemblers and emulators (e.g. distorm) or other projects that document this exact thing in more convenient forms (e.g. corkami).
Check out the Intel manual located at the bottom of the Wikipedia page since that should contain all of the various addressing modes supported, plus it will give you the look and function of all the other instructions.