I have been going through the iOS kernel (what I hope is a snippet of the kernel's code is below) and am struggling to understand it. I know it is very complex and advanced, but I wanted to dip my head in and see what it looks like. Are there any tips or resources on how I can understand what each line means? There are addresses, instructions and opcodes; how do you comprehend the values? Are there books or utilities that explain this thoroughly for a beginner? Thanks.
I have tried several resources to begin understanding it, but I keep getting caught up by the amount of knowledge required to comprehend the information. Where do I begin?
.-> 0x00000219 int1
,==< 0x0000021a loop 0x290 ; fcn.0000026c+0x24
|: 0x0000021c add byte [rax], dh
|: 0x0000021e add r9d, dword [rbp + 0x7306500f]
,===< 0x00000225 je 0x299
||: 0x00000227 out 0x69, eax ; 'i'
||: 0x00000229 outsb dx, byte [rsi]
||: 0x0000022a pop rdi
||: 0x0000022c wbinvd
||: 0x0000022e add dh, byte [rax - 0x37]
||: 0x00000231 add ch, byte [rsi]
||: 0x00000233 sub dword [rdx], eax
||: 0x00000235 cmpsd dword [rsi], dword ptr [rdi]
||: 0x00000236 pop rcx
||: 0x00000237 add al, 0xc1
||: 0x00000239 add ecx, dword [rcx + 0x51c0380]
||`=< 0x0000023f jl 0x219
|| 0x00000241 or esi, dword [rcx + rax + 0x6c5f736f]
|| 0x00000248 outsd dx, dword [rsi]
|| 0x00000249 scasb al, byte [rdi]
|| 0x0000024a add eax, 0x170e156e
|| 0x0000024f and eax, dword [rdi]
|| 0x00000251 sub dword [rdx], eax
|| 0x00000253 invalid
|| 0x00000254 sbb al, 0x10
|| 0x00000256 adc al, 0xe3
||,=< 0x00000258 jrcxz 0x260
||| 0x0000025a xchg eax, esp
||| 0x0000025b nop dword [rax]
||| 0x0000025e invalid
||| 0x0000025f invalid
||`-> 0x00000260 push r12
|| 0x00000263 mov edi, 0x4f435f41 ; 'A_CO'
|| 0x00000268 push rbx
|| 0x0000026a sbb eax, dword [rbx]
(The code above was obtained from /System/Library/Caches/com.apple.kernelcaches on a jailbroken iOS device.) I would like some help with how you or others learned ARM assembly, and how you came to understand the kernel in iOS and other operating systems.
Want to learn some ARM assembly by examining and/or manipulating compiled code?
Start much simpler. Compile a small function with the optimizer on, and make sure it isn't dead code: in this case the inputs and the output are variables, so the compiler can't remove the work.
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a+b+5);
}
Pre-built toolchain binaries are easy to come by for the major operating systems, or you can build the toolchain from sources, which isn't that difficult either.
arm-none-eabi-gcc -mthumb -mcpu=arm7tdmi -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: 3105 adds r1, #5
2: 1808 adds r0, r1, r0
4: 4770 bx lr
6: 46c0 nop ; (mov r8, r8)
...
Go to infocenter.arm.com and get the architectural reference manual (ARM architecture, reference manuals, ARMv5 reference manual; download the PDF and look for the Thumb instructions).
0: 3105 adds r1, #5
0x3105 0011000100000101
In this case the first 5 bits tell us which kind of add this is: add an immediate to a register, rd = rd + immediate.
00110 001 00000101
00110 defines the instruction
001 r1
00000101 #5
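The bit slicing above can be sketched in C. This is only an illustration, assuming the Thumb 16-bit "immediate" format of 001, a 2-bit operation, a 3-bit register and an 8-bit immediate; the names thumb_imm8 and decode_thumb_imm8 are made up for this example and do not come from any tool.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the Thumb 16-bit immediate format:
   bits 15..13 = 001, bits 12..11 = op, bits 10..8 = Rd, bits 7..0 = imm8.
   (Hypothetical struct, for illustration only.) */
struct thumb_imm8 {
    unsigned op;   /* 0 = mov, 1 = cmp, 2 = add, 3 = sub */
    unsigned rd;   /* destination register number, r0..r7 */
    unsigned imm;  /* unsigned 8-bit immediate */
};

/* Returns 1 and fills *out if the halfword belongs to this format, else 0. */
int decode_thumb_imm8(uint16_t insn, struct thumb_imm8 *out)
{
    assert(out != 0);
    if ((insn >> 13) != 0x1)        /* top three bits must be 001 */
        return 0;
    out->op  = (insn >> 11) & 0x3;  /* together with 001 this gives 00110 for add */
    out->rd  = (insn >> 8) & 0x7;
    out->imm = insn & 0xFF;
    return 1;
}
```

Feeding it 0x3105 should give op 2 (add), rd 1 and imm 5, which matches the hand decode above.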
The disassembler tacked on an s (adds), which ARM generally uses to indicate that the flags are updated. The Thumb version of this kind of add doesn't have a bit to choose whether to update the flags or not; the full-sized ARM instruction does. Unfortunately, depending on which syntax you use in the GNU assembler, you can't put the s on.
.cpu arm7tdmi
.thumb
adds r1,#5
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:3: Error: instruction not supported in Thumb16 mode -- `adds r1,#5'
.cpu arm7tdmi
.thumb
add r1,#5
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: 3105 adds r1, #5
You can see the output of the compiler (assembly language, which the compiler then passes to the assembler to become an object).
arm-none-eabi-gcc -save-temps -mthumb -mcpu=arm7tdmi -O2 -c so.c -o so.o
cat so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 1
.p2align 2,,3
.global fun
.arch armv4t
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type fun, %function
fun:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r1, r1, #5
adds r0, r1, r0
@ sp needed
bx lr
.size fun, .-fun
.ident "GCC: (GNU) 8.2.0"
I find
00000000 <fun>:
0: 3105 adds r1, #5
2: 1808 adds r0, r1, r0
4: 4770 bx lr
6: 46c0 nop ; (mov r8, r8)
much easier to read, since the assembler directives and other GNU assembler syntax are removed.
We can see what a+b+5 became: it takes r1, which I happen to know is the b variable (you also need to learn the ABI or calling convention for this compiler with these settings for this target).
So it did
b = b + 5
a = a + b
return.
The calling convention dictates that the return value is in r0, and a was passed in r0, so they needed to add b and 5 to it; there is more than one way they could have done this.
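One way to convince yourself of that reading is to write the instructions back out as C, one statement per instruction. This is a sketch only: r0 and r1 are ordinary variables standing in for the registers, and fun_model, fun_source and check_fun_model are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the three-instruction Thumb function, one C statement per
   instruction. */
uint32_t fun_model(uint32_t a, uint32_t b)
{
    uint32_t r0 = a;   /* first argument arrives in r0  */
    uint32_t r1 = b;   /* second argument arrives in r1 */

    r1 = r1 + 5;       /* adds r1, #5      */
    r0 = r1 + r0;      /* adds r0, r1, r0  */
    return r0;         /* bx lr: the result is returned in r0 */
}

/* The original C source, for comparison. */
uint32_t fun_source(uint32_t a, uint32_t b)
{
    return a + b + 5;
}

/* A few spot checks that the model and the source agree,
   including a case that wraps around 32 bits. */
void check_fun_model(void)
{
    assert(fun_model(1, 2) == fun_source(1, 2));
    assert(fun_model(0xFFFFFFFFu, 0) == fun_source(0xFFFFFFFFu, 0));
}
```

The order of the additions differs from the source, but unsigned addition wraps the same way either way, so the results match.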
Look up the bx instruction and lr. Look up the bl instruction and what it does to set lr. lr means link register; it is r14.
Folks have gotten very lazy about register naming, and the pre-builts are probably much worse. I use this patch to binutils to make the disassembly more sane.
--- binutils-patch/opcodes/arm-dis.c
+++ binutils-patch/opcodes/arm-dis.c
@@ -3030,7 +3030,7 @@
};
/* Default to GCC register name set. */
-static unsigned int regname_selected = 1;
+static unsigned int regname_selected = 2;
#define NUM_ARM_REGNAMES NUM_ELEM (regnames)
#define arm_regnames regnames[regname_selected].reg_names
which disassembles using:
{ "reg-names-std", N_("Select register names used in ARM's ISA documentation"),
{ "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc" }},
gcc/gas relatively recently (in the grand scheme of gnu and arm) switched to the names they use now.
{ "reg-names-gcc", N_("Select register names used by GCC"),
{ "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc" }},
which is not as bad as this mips influenced ____fill in the blank____....
{ "reg-names-apcs", N_("Select register names used in the APCS"),
{ "a1", "a2", "a3", "a4", "v1", "v2", "v3", "v4", "v5", "v6", "sl", "fp", "ip", "sp", "lr", "pc" }},
It is much easier to go back and forth to the documentation if you have r0 to r15...
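If you are stuck with one of the other name sets, a small lookup can translate an alias back to its register number. The sketch below uses only the three name tables quoted above; the function name arm_reg_number is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Returns the register number 0..15 for a name from any of the three
   tables quoted above, or -1 if the name is in none of them.
   (Hypothetical helper, not part of binutils.) */
int arm_reg_number(const char *name)
{
    static const char *const std_names[16] = {
        "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
        "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc" };
    static const char *const gcc_names[16] = {
        "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
        "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc" };
    static const char *const apcs_names[16] = {
        "a1", "a2", "a3", "a4", "v1", "v2", "v3", "v4",
        "v5", "v6", "sl", "fp", "ip", "sp", "lr", "pc" };
    int i;

    assert(name != NULL);
    for (i = 0; i < 16; i++) {
        if (strcmp(name, std_names[i]) == 0 ||
            strcmp(name, gcc_names[i]) == 0 ||
            strcmp(name, apcs_names[i]) == 0)
            return i;   /* the table index is the register number */
    }
    return -1;
}
```

So ip comes back as 12, sl as 10, and a1 as 0. Note that none of the three tables spells the last three registers as r13, r14 and r15, so those spellings return -1 here.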
unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
return(more_fun(a<<1,b+3)+6);
}
Now get a little more complicated:
arm-none-eabi-gcc -mthumb -mcpu=arm7tdmi -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 3103 adds r1, #3
4: 0040 lsls r0, r0, #1
6: f7ff fffe bl 0 <more_fun>
a: 3006 adds r0, #6
c: bc10 pop {r4}
e: bc02 pop {r1}
10: 4708 bx r1
12: 46c0 nop ; (mov
Now, without reading the ABI, we can see that for this function prototype r0 is the first parameter, r1 is the second, and the return value is in r0. This is a disassembled object; the linker will fill in the rest of the bl bits, so the offset in that instruction is bogus right now. You can look up all of these instructions and see how they implement the C code. It is neither a bug nor an accident that the pops worked out that way: the ARM documentation describes what would happen if they had used pop {r4,pc} instead, why pop {r1,r4}; bx r1 doesn't work in this case, why you can't pop {r4,lr}; bx lr, and so on. (With the armv6-m and/or armv7-m architectural reference manuals you can then see why pop {r4,pc} works.)
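To see what the unlinked bl actually holds, you can decode the halfword pair by hand. The sketch below assumes the classic two-halfword Thumb encoding (11110 plus the upper 11 offset bits, then 11111 plus the lower 11 offset bits); the function names are invented for the example, and the newer encodings that reuse two bits of the second halfword are deliberately rejected by the asserts.

```c
#include <assert.h>
#include <stdint.h>

/* Signed byte offset encoded in a classic Thumb bl pair, relative to
   (address of the first halfword + 4). Hypothetical helper. */
int32_t thumb_bl_offset(uint16_t first, uint16_t second)
{
    int32_t hi, lo;

    assert((first >> 11) == 0x1E);    /* 11110: carries the upper 11 bits */
    assert((second >> 11) == 0x1F);   /* 11111: carries the lower 11 bits */

    hi = first & 0x7FF;
    lo = second & 0x7FF;
    if (hi & 0x400)                   /* sign-extend the 11-bit upper field */
        hi -= 0x800;
    /* upper field counts units of 4096 bytes, lower field units of 2 bytes */
    return hi * 4096 + lo * 2;
}

/* Branch target for a bl whose first halfword sits at address addr. */
uint32_t thumb_bl_target(uint32_t addr, uint16_t first, uint16_t second)
{
    return addr + 4 + (uint32_t)thumb_bl_offset(first, second);
}
```

For the pair f7ff fffe this gives an offset of -4, so the unlinked instruction at address 6 points back at itself. That is the placeholder the linker later replaces with the real distance to more_fun.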
arm-none-eabi-gcc -mthumb -march=armv6-m -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b510 push {r4, lr}
2: 3103 adds r1, #3
4: 0040 lsls r0, r0, #1
6: f7ff fffe bl 0 <more_fun>
a: 3006 adds r0, #6
c: bd10 pop {r4, pc}
e: 46c0 nop ; (mov r8, r8)
As you can see, without needing any hardware or a simulator, you can entertain and educate yourself on mastering the tools and the languages. (The nops at the end, where shown, are simply there to align the next thing to a word boundary; it's not a branch shadow thing.)
Also understand that this doesn't work as well with x86; even the gnu disassembler struggles at times, so you may be better off wading through the compiler output rather than the disassembly.
At least with the latest gnu gcc compiler versions, for the armv7 based processors they will use a mix of thumb2 and arm instructions (the thumb2 extensions and the full sized arm instructions).
arm-none-eabi-gcc -mthumb -march=armv7 -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: 3103 adds r1, #3
4: 0040 lsls r0, r0, #1
6: f7ff fffe bl 0 <more_fun>
a: 3006 adds r0, #6
c: bd08 pop {r3, pc}
Isn't it just awesome that one of them uses r4 as the filler register to keep the stack aligned on a 64 bit boundary and the other uses r3...
We didn't see any thumb2 here. Without the -mthumb we still get thumb for this architecture with this version of gcc.
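The push and pop encodings are easy to pick apart, and doing so shows where the 8 bytes come from. This is a sketch assuming the 16-bit Thumb push/pop format (1011, a load bit, 10, one bit for lr or pc, then an 8-bit list of r0 to r7); the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded Thumb push/pop (hypothetical struct, for illustration). */
struct thumb_pushpop {
    int is_pop;         /* 0 = push, 1 = pop */
    unsigned reg_mask;  /* bit n set means rn is listed; 14 = lr, 15 = pc */
};

/* Returns 1 and fills *out for a 16-bit Thumb push/pop, else 0. */
int decode_thumb_pushpop(uint16_t insn, struct thumb_pushpop *out)
{
    assert(out != 0);
    if ((insn & 0xF600) != 0xB400)   /* bits 15..12 = 1011, bits 10..9 = 10 */
        return 0;
    out->is_pop = (insn >> 11) & 1;
    out->reg_mask = insn & 0xFF;     /* r0..r7 */
    if (insn & 0x100)                /* extra bit: lr on push, pc on pop */
        out->reg_mask |= out->is_pop ? (1u << 15) : (1u << 14);
    return 1;
}

/* Bytes moved on the stack: 4 per listed register. */
unsigned stack_bytes(unsigned reg_mask)
{
    unsigned count = 0;
    while (reg_mask) {
        count += reg_mask & 1;
        reg_mask >>= 1;
    }
    return 4 * count;
}
```

Both b510 (push {r4, lr}) and b508 (push {r3, lr}) move 8 bytes. Only lr needs saving here, so the second register is just filler to reach a multiple of 8.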
arm-none-eabi-gcc -march=armv7 -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: 3103 adds r1, #3
4: 0040 lsls r0, r0, #1
6: f7ff fffe bl 0 <more_fun>
a: 3006 adds r0, #6
c: bd08 pop {r3, pc}
e: bf00 nop
but for this one
arm-none-eabi-gcc -march=armv4t -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e2811003 add r1, r1, #3
8: e1a00080 lsl r0, r0, #1
c: ebfffffe bl 0 <more_fun>
10: e8bd4010 pop {r4, lr}
14: e2800006 add r0, r0, #6
18: e12fff1e bx lr
There are more bits to decode than with thumb, and it takes more work to understand each instruction, as each one is more flexible than its thumb counterpart. Note that since we are not actually using the flags, add and lsl do not have the s at the end. Go look these up in the arm instructions. Being armv4t, as with the thumb build you can't pop the pc, because that doesn't support switching modes. bx is preferred to mov pc,lr because the tools want every function built so that thumb and arm can be mixed in the same program; pop {pc} on armv4t doesn't support that. This is in the documentation. Since they did the return as two instructions, they then chose to put the add r0 in between.
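As a taste of the extra decoding work, here is a sketch for one family only, the arm data-processing instructions with an immediate operand (such as the add r1, r1, #3 above). It assumes the usual field layout: condition in bits 31..28, 001 in bits 27..25, opcode in 24..21, S in bit 20, Rn, Rd, then a 4-bit rotate and an 8-bit immediate. The names are invented for the example, and it does not filter out the few other instructions that share the 001 pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded ARM data-processing immediate instruction (hypothetical struct). */
struct arm_dp_imm {
    unsigned cond;    /* bits 31..28, 0xE = always       */
    unsigned opcode;  /* bits 24..21, 0x4 = add          */
    unsigned s;       /* bit 20, 1 = update the flags    */
    unsigned rn;      /* bits 19..16, first operand      */
    unsigned rd;      /* bits 15..12, destination        */
    uint32_t imm;     /* imm8 rotated right by 2*rotate  */
};

/* Rotate right, written so that a rotate of 0 never shifts by 32. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Returns 1 and fills *out if bits 27..25 are 001, else 0. */
int decode_arm_dp_imm(uint32_t insn, struct arm_dp_imm *out)
{
    assert(out != 0);
    if (((insn >> 25) & 0x7) != 0x1)
        return 0;
    out->cond   = (insn >> 28) & 0xF;
    out->opcode = (insn >> 21) & 0xF;
    out->s      = (insn >> 20) & 0x1;
    out->rn     = (insn >> 16) & 0xF;
    out->rd     = (insn >> 12) & 0xF;
    out->imm    = ror32(insn & 0xFF, 2 * ((insn >> 8) & 0xF));
    return 1;
}
```

For e2811003 this should report cond 0xE, opcode 4 (add), s 0, rn 1, rd 1 and imm 3. The s field being 0 is the missing s on add in the disassembly.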
arm-none-eabi-gcc -marm -march=armv7-a -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e2811003 add r1, r1, #3
8: e1a00080 lsl r0, r0, #1
c: ebfffffe bl 0 <more_fun>
10: e2800006 add r0, r0, #6
14: e8bd8010 pop {r4, pc}
Look at how much you can learn from such a trivial function. If running on hardware, you can then take this assembly and put it in a file (the assembly itself; try to build it and check that the object disassembles the same way), then try to link it into a project and see if it behaves the same as the object compiled from the high level language.
Well down the road you can take a good sized chunk of code and try to read it cold and figure out what it is doing.