What did the compiler and run-time system really do in my generated assembly?

Question

I would like to understand how the generated assembly and the runtime work together, and came across a question while stepping through some generated assembly code.

Source Example

Here are three lines of Objective-C, running in Xcode 4.5:

// Line 1:
NSObject *obj1 = [[NSObject alloc] init];

// Line 2:
[obj1 release];

// Line 3:
NSObject *obj2;

Comparing the Generated Assembly

Stepping through the generated assembly, I made a few observations.

Before line 1, the address of obj1 is as shown:

obj1    (NSObject*) 0x00003604

After line 1, it changes:

obj1    NSObject *  0x08122110

Observations

1) The address of obj1 changed. When the source code is compiled, the compiler allocates temporary memory for obj1. Then (after line 1) the compiler apparently re-allocates, so the object's address changes.

2) After line 2, the address of obj1 is still the same (0x08122110)! When I call [obj1 release], I am telling the compiler: "I don't need this anymore. Please take it away." But the system is actually doing the release at some point in the future, and I cannot seem to control it directly.

3) The debugger can't step over line 3. I don't understand why it won't!

Question

In terms of creating and destroying objects, what is the compiler actually doing with these lines of code (specifically an "alloc-init", a release, and an NSObject pointer declaration without an assignment)? Also, why won't the debugger let me step over the third line? Can the debugger not see it?

Along with an answer, if you can please recommend some documents or a book about what the compiler and run-time system really do, I would appreciate it. Thank you very much!

objective-c

compiler-construction

runtime

asked on Stack Overflow Apr 9, 2013 by DungProton • edited Apr 9, 2013 by keparo

2 Answers

1) The pointer called obj1 is created on the stack. It is not initialized, which means that it will contain whatever was already in that memory location. This is a constant source of bugs, since using an uninitialized pointer leads to undefined behavior. Once the object is allocated, the pointer is initialized with its address.
2) The address does not change because the pointer is not updated. When the -release message is sent to the object, the retain count is reduced by one. If the retain count was already at one, the -dealloc method is called and the memory is marked as free. Only the memory that the pointer points to is marked as free; the pointer itself remains the same. That's why some prefer to also set their pointers to nil once they don't need them anymore.
3) You're creating an uninitialized pointer. Since it's not initialized, it will hold whatever data was already at the memory location where the pointer is stored.
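
To make the second point concrete, here is a minimal sketch in plain C rather than Objective-C. It is not how the Objective-C runtime is implemented; ToyObject, toy_pool, toy_alloc and toy_release are hypothetical names invented for this illustration, and a small fixed pool stands in for the heap. It only shows that releasing an object marks its memory as free while the caller's pointer keeps the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object with a retain count. */
typedef struct ToyObject {
    int retain_count;
    int in_use;   /* 1 while allocated, 0 once the slot is marked free */
} ToyObject;

/* A tiny fixed pool stands in for the heap (an assumption of this sketch). */
ToyObject toy_pool[4];

/* Like alloc/init: hand out the first free slot with a retain count of 1. */
ToyObject *toy_alloc(void) {
    for (size_t i = 0; i < sizeof toy_pool / sizeof toy_pool[0]; i++) {
        if (!toy_pool[i].in_use) {
            toy_pool[i].in_use = 1;
            toy_pool[i].retain_count = 1;
            return &toy_pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

/* Like -release: drop one reference and mark the slot free at zero.
   Only a copy of the address is passed in, so the caller's pointer
   variable cannot be changed from here. */
void toy_release(ToyObject *obj) {
    assert(obj != NULL && obj->in_use);
    if (--obj->retain_count == 0) {
        obj->in_use = 0;
    }
}

/* Returns 1 when the pointer still holds the old address after release
   even though the memory it points to has been marked free. */
int pointer_survives_release(void) {
    ToyObject *obj1 = toy_alloc();
    assert(obj1 != NULL);
    ToyObject *before = obj1;

    toy_release(obj1);

    int same_address = (obj1 == before);
    int slot_is_free = !obj1->in_use;

    obj1 = NULL;   /* the "set it to nil afterwards" habit mentioned above */
    return same_address && slot_is_free;
}
```

Calling pointer_survives_release() exercises the whole sequence: the address stored in obj1 is the same before and after the release, and it only stops pointing at the freed slot once it is explicitly set to NULL.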

About the book recommendation. I would recommend Compilers: Principles, Techniques, and Tools.

answered on Stack Overflow Apr 9, 2013 by (unknown user) • edited Apr 9, 2013 by (unknown user)

Marcus's answer is quite good, but here are some more details (I'd been meaning to brush up reading generated assembly; having to actually try and explain it is the best way).

NSObject *obj1 = [[NSObject alloc] init]; // Line 1

The compiler compiles two function calls to objc_msgSend(). The first calls the +alloc method on the NSObject class. The result of that function call becomes the first argument -- the target object -- of the second function call which calls the method -init.

The result of calling init is then stored on the stack in a chunk of memory you have declared as being named obj1 which has type of a pointer to an instance of NSObject.
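
As a rough illustration of that call chain, here is a sketch in plain C. It is not the real objc_msgSend(), which looks selectors up in the runtime rather than comparing strings; toy_msg_send, ToyObject, toy_class and toy_storage are hypothetical names made up for this example. The point is only the shape: receiver first, selector second, and the result of the first call feeding the second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object type for this sketch. */
typedef struct ToyObject {
    int initialized;
} ToyObject;

ToyObject toy_class;     /* stands in for the class that receives "alloc" */
ToyObject toy_storage;   /* stands in for the freshly allocated memory */

/* Hypothetical dispatcher: the receiver is the first argument and the
   selector name is the second, mirroring the argument order of a message send. */
void *toy_msg_send(void *receiver, const char *selector) {
    if (strcmp(selector, "alloc") == 0) {
        toy_storage.initialized = 0;
        return &toy_storage;          /* "alloc" returns new, uninitialized storage */
    }
    if (strcmp(selector, "init") == 0) {
        ToyObject *self = receiver;
        self->initialized = 1;
        return self;                  /* "init" returns its receiver */
    }
    return NULL;                      /* unknown selector in this toy model */
}

/* The equivalent of: NSObject *obj1 = [[NSObject alloc] init]; */
ToyObject *toy_alloc_init(void) {
    /* First call: send "alloc" to the class. */
    void *allocated = toy_msg_send(&toy_class, "alloc");
    /* Second call: the first result becomes the receiver of "init". */
    ToyObject *obj1 = toy_msg_send(allocated, "init");
    /* Only now is the result stored in the local variable obj1. */
    return obj1;
}
```

The nesting in the source line maps directly onto the two calls in toy_alloc_init(): the inner message is sent first, and the variable is written only after the outer one returns.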

You can step through this line in the debugger because there is an executed expression on the line. If the code were written as:

NSObject *obj1; // declaration
obj1 = [[NSObject alloc] init];

Then you would find that you can't step through the declaration.

Before obj1 = [[NSObject alloc] init];, the value of obj1 is undefined under Manual Retain Release, but will be automatically set to nil (0) under ARC (thereby eliminating the source of bugs Marcus indicated).

[obj1 release]; // Line 2

This line invokes the release method on the instance of NSObject pointed to by obj1.

NSObject *obj2; // Line 3

This line effectively does nothing. If the compiler's optimizer were turned on, there would be no code generated at all. Without the optimizer, the compiler may bump the stack pointer by sizeof(NSObject*) to reserve space on the stack with the name obj2.

And, again, you can't step through it in the debugger because there is no expression to execute on that line.

Of note, you could rewrite the code as:

[[[NSObject alloc] init] release];

That would be effectively identical to the original code you wrote as far as execution is concerned. Without the optimizer, it will be a little bit different in that it won't store anything on the stack. With the optimizer, it is likely to generate the same code as your original. The optimizer is quite good at eliminating local variables when they aren't needed (which is also partially why debugging optimized code is so hard).
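
A small plain C sketch of the same idea, assuming hypothetical traced_alloc, traced_init and traced_release functions that only record their own names: the version with a named local and the nested one-liner make exactly the same calls in the same order. It says nothing about the machine code a particular compiler emits, only about the observable sequence of calls.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the toy functions run. */
char call_log[64];

void log_call(const char *name) {
    strncat(call_log, name, sizeof call_log - strlen(call_log) - 1);
}

int toy_object;   /* dummy storage whose address is passed around */

/* Hypothetical stand-ins for +alloc, -init and -release. */
int *traced_alloc(void)        { log_call("alloc;");   return &toy_object; }
int *traced_init(int *obj)     { log_call("init;");    return obj; }
void traced_release(int *obj)  { (void)obj; log_call("release;"); }

/* Original form: the result is parked in a named local first. */
void with_local(void) {
    int *obj1 = traced_init(traced_alloc());
    traced_release(obj1);
}

/* Rewritten form: no named local, the result is passed straight on. */
void without_local(void) {
    traced_release(traced_init(traced_alloc()));
}

/* Returns 1 when both forms produce the same sequence of calls. */
int same_call_sequence(void) {
    char first[sizeof call_log];

    call_log[0] = '\0';
    with_local();
    strcpy(first, call_log);

    call_log[0] = '\0';
    without_local();

    return strcmp(first, call_log) == 0;
}
```

The local variable in with_local() changes where the intermediate value lives, not which calls happen, which is why an optimizer is free to drop it.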

Given this:

(11) void f()
(12) {
(13)    NSObject *obj1 = [[NSObject alloc] init]; // Line 1
(14)    
(15)    [obj1 release]; // Line 2
(16)    
(17)    NSObject *obj2; // Line 3
(18)}

This is the unoptimized x86_64 assembly. Ignore the "fixup" stuff. Look at the callq lines; they are the actual calls to objc_msgSend() as described above. On x86_64, %rdi -- a register -- is argument 0 for all function calls. Thus, %rdi is where the target of method calls goes. %rax is the register used for return values.

So, when you see a callq, followed by movq %rax, %rdi, followed by another callq, that says "take the return value of the first callq and pass it as the first argument to the next callq".

As for your variables, you'll see things like movq %rax, -8(%rbp) after the callq. This says "take whatever was returned by the callq and write it to the stack slot 8 bytes below the address in %rbp (the stack grows down)". The instruction does not move the stack pointer; the space was already reserved by the subq $32, %rsp in the prologue. Unfortunately, the assembly doesn't show you the variable names.

_f:                                     ## @f
    .cfi_startproc
Lfunc_begin0:
    .loc    1 12 0                  ## /tmp/asdfafsd/asdfafsd/main.m:12:0
## BB#0:
    pushq   %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    subq    $32, %rsp
    leaq    l_objc_msgSend_fixup_release(%rip), %rax
    leaq    l_objc_msgSend_fixup_alloc(%rip), %rcx
    .loc    1 13 0 prologue_end     ## /tmp/asdfafsd/asdfafsd/main.m:13:0
Ltmp5:
    movq    L_OBJC_CLASSLIST_REFERENCES_$_(%rip), %rdx
    movq    %rdx, %rdi
    movq    %rcx, %rsi
    movq    %rax, -24(%rbp)         ## 8-byte Spill
    callq   *l_objc_msgSend_fixup_alloc(%rip)
    movq    L_OBJC_SELECTOR_REFERENCES_(%rip), %rsi
    movq    %rax, %rdi
    callq   _objc_msgSend
    movq    %rax, -8(%rbp)
    .loc    1 15 0                  ## /tmp/asdfafsd/asdfafsd/main.m:15:0
    movq    -8(%rbp), %rax
    movq    %rax, %rdi
    movq    -24(%rbp), %rsi         ## 8-byte Reload
    callq   *l_objc_msgSend_fixup_release(%rip)
    .loc    1 18 0                  ## /tmp/asdfafsd/asdfafsd/main.m:18:0
    addq    $32, %rsp
    popq    %rbp
    ret
Ltmp6:
Lfunc_end0:

For giggles, have a look at the assembly generated with the optimizer turned on (-Os -- fastest, smallest, the default for deployed code):

The first thing to note -- and this gets back to question (3) -- is that there is no manipulation of %rbp outside of the very first and very last instructions. That is, nothing is pushed onto or pulled off the stack; quite literally, there is no evidence that obj1 and obj2 were ever declared because the compiler didn't need them to generate equivalent code.

Everything is done via registers, and you'll note that there are two movq %rax, %rdi instructions. The first is "take the result of the +alloc and use it as the first argument to the call to -init" and the second is "take the result of the -init and use it as an argument to -release".

Aside: %rsi is where the second argument to function calls resides on x86_64. For method calls -- for calls to the objc_msgSend() function -- that argument will always contain the name of the method (the selector) to be called.

Lfunc_begin0:
    .loc    1 12 0                  ## /tmp/asdfafsd/asdfafsd/main.m:12:0
## BB#0:
    pushq   %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    .loc    1 13 0 prologue_end     ## /tmp/asdfafsd/asdfafsd/main.m:13:0
Ltmp5:
    movq    L_OBJC_CLASSLIST_REFERENCES_$_(%rip), %rdi
    leaq    l_objc_msgSend_fixup_alloc(%rip), %rsi
    callq   *l_objc_msgSend_fixup_alloc(%rip)
    movq    L_OBJC_SELECTOR_REFERENCES_(%rip), %rsi
    movq    %rax, %rdi
    callq   *_objc_msgSend@GOTPCREL(%rip)
    .loc    1 15 0                  ## /tmp/asdfafsd/asdfafsd/main.m:15:0
    leaq    l_objc_msgSend_fixup_release(%rip), %rsi
    movq    l_objc_msgSend_fixup_release(%rip), %rcx
    movq    %rax, %rdi
    popq    %rbp
    jmpq    *%rcx  # TAILCALL
Ltmp6:
Lfunc_end0:

If you want to learn even more about method dispatch, I wrote a bit of a guide. It is a couple of versions of objc_msgSend() out of date, but still relevant.

Note that ARM code works the same way philosophically, but the generated assembly will be a bit different and quite a bit more of it.

I still can't understand why I can't step over line 3 ^^

If you look at the generated assembly, there is nothing generated for the variable declarations. At least not directly. The closest would be movq %rax, -8(%rbp), which moves the result of the init into obj1, but that is after the two function calls.

For NSObject *obj2;, the compiler doesn't generate any code. Not even with the optimizer disabled.

That is because a variable declaration is not an expression; it doesn't actually do anything other than provide a label for you -- the developer -- to use to hold values. It is only when you actually use the variable that there is code generated.

Thus, when you are stepping in the debugger, it skips that line because there is nothing to do.

answered on Stack Overflow Apr 9, 2013 by bbum • edited Apr 9, 2013 by bbum

User contributions licensed under CC BY-SA 3.0