Is "jr $ra" required to end a MIPS assembly language program? (MARS & QtSpim behave differently!)

If you put a jr $ra at the end of a MIPS assembly language program in MARS, you will get an error message saying:

invalid program counter value: 0x00000000

Example 1 below fails in MARS:

.data

theMsg: .asciiz "Hello World\n"

.text
.globl main

main:   li $v0, 4       
        la $a0, theMsg  
        syscall         
        
        jr $ra

      

Example 2 below works in MARS:

.data

theMsg: .asciiz "Hello World\n"

.text
.globl main

main:   li $v0, 4       
        la $a0, theMsg  
        syscall

     
    

MARS says "program is finished running (dropped off bottom)" but there are no error messages.

Now, if you run Example 2 in QtSpim, you will get an error saying:

Attempt to execute non-instruction at 0x00400030

If you run Example 1 in QtSpim, it works just fine.

Can anyone shed some light on this?
Which MIPS simulator is right?

assembly
mips
mars-simulator
qtspim
asked on Stack Overflow Feb 5, 2021 by Chris • edited Feb 6, 2021 by Sep Roland

2 Answers

The standard works-everywhere way is making an exit(0) system call (http://courses.missouristate.edu/kenvollmar/mars/help/syscallhelp.html).

   li $v0, 10         # call number 10: exit
   li $a0, 0          # exit status; syscall 10 ignores it (exit2, syscall 17, reads $a0)
   syscall            # terminate the program

That also avoids having to save the incoming $ra anywhere in main if you want to use jal inside your program, so it's convenient.

That's also more "realistic" for real-world hand-written asm programs running in a mainstream OS like Linux, not just the "toy" system that MARS/SPIM simulate.
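As a minimal sketch of that approach, here is the question's Hello World reworked to call a helper with jal and finish with the exit system call. The helper name print_msg is made up for illustration. Since main never returns, it doesn't matter that jal overwrites $ra, and nothing falls off the bottom of the program, so this should run the same way in both simulators with their default settings.

```asm
.data
theMsg: .asciiz "Hello World\n"

.text
.globl main

main:   la   $a0, theMsg     # argument for the helper
        jal  print_msg       # overwrites $ra; harmless because main never returns

        li   $v0, 10         # syscall 10: exit
        syscall              # does not return, so execution never reaches print_msg

print_msg:                   # hypothetical helper, not from the question
        li   $v0, 4          # syscall 4: print the string at $a0
        syscall
        jr   $ra             # back to the instruction after the jal
```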


In MARS, apparently dropping off the bottom is a valid option for the "toy" system that it simulates. That never works on a real hardware CPU, though; there's always something next in memory and the CPU will try to fetch and execute it (see Footnote 1).

Neither MARS nor SPIM is trying to emulate a real OS like Linux; each just provides its own specific environment (see Footnote 2). The systems that MARS vs. SPIM simulate have some minor differences from each other, including the one you found.

Neither one is right or wrong, just different: there is no real-world environment that they're trying to match / emulate.

SPIM might even have an option to include some kernel code or something like that in the simulated system's memory, IIRC. I may be misremembering, but if not then some of the syscall handling might actually be done by more MIPS code, coming closer to a real MIPS CPU running an OS. (As opposed to MARS where the system-call implementation is purely in Java inside the simulator that you're calling into via syscall, not in terms of MIPS instructions and device drivers for simulated hardware.)

Under a real OS (e.g. GNU/Linux with gcc and glibc), main would be a proper function called normally from the _start process entry point (indirectly via __libc_start_main to do some more init stuff before actually calling main). _start is the real process entry point, first instruction that runs in user-space (modulo dynamic linking), and is not a function (no return address anywhere); your only option is to make an exit system call (or crash or keep running forever). When main returns, _start passes its return value (an int thus in $v0) as an arg to the exit library function which does cleanup stuff like flushing stdio buffers, then makes an _exit system call.

Apparently SPIM intends their main label to be like a C main function, or at least it gets a valid return address. IDK if it gets int argc and char *argv[] in $a0 and $a1.

For jr $ra to work, SPIM must be setting the initial $ra to some address, as if your main was called from somewhere. At that address you'll probably find code that copies $v0 to $a0, then makes an exit system call.
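If main does return with jr $ra and also uses jal internally, it has to preserve the incoming return address. A minimal sketch, assuming an environment where main really is called (such as SPIM with its default handler); the helper name print_msg is hypothetical. In MARS the final jr $ra would still hit the invalid program counter error from the question, because nothing called main there.

```asm
.data
theMsg: .asciiz "Hello World\n"

.text
.globl main

main:   addiu $sp, $sp, -4   # make room on the stack
        sw    $ra, 0($sp)    # save the return address main was entered with

        la    $a0, theMsg
        jal   print_msg      # clobbers $ra

        lw    $ra, 0($sp)    # restore the original return address
        addiu $sp, $sp, 4
        li    $v0, 0         # return value, as a C main would produce
        jr    $ra            # only valid if something called main

print_msg:                   # hypothetical helper, not from the question
        li    $v0, 4         # syscall 4: print the string at $a0
        syscall
        jr    $ra
```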

Some people do confusingly use main as a name for entry-points that can't return, unfortunately, I think even in real-world embedded development. In standard toolchains for GNU/Linux systems (gcc / clang), the process entry point is by default called _start.

main is confusing because it's the standard name for a C function (called by asm startup stuff), which is allowed to return. Something you can't return from isn't a function, but in C, main definitely is a function. C is the low-level language of Unix/Linux systems programming, and many other languages build on top of that standard toolchain of libc and CRT startup code.


Footnote 1: Most ISAs have rules for how PC can wrap from 0xfffffffc to 0 or whatever, so even putting your code at the very end of the address space can't make it stop by itself when reaching the end. Or if it does, it will be some kind of fault, not exiting to the OS. (In this case MARS or SPIM is acting as the OS, handling the syscall instructions you run among other things.) Note that an actual OS on bare metal has no way to "exit", only reset or power-off the machine. It's not running "under" anything it can exit back to.

Footnote 2: With very limited system calls, e.g. no cursor movement, and some syscalls which do things that library functions (not syscalls) would do in a real system, e.g. int<->string conversion. But MARS/SPIM only provide that as part of I/O, no atoi or sprintf(buf, "%d", num). This is why the "toy" label applies, especially to the set of system calls they provide, which is very much not like Linux's set of system calls.

The "toy" label also applies to stuff like the simple bitmap graphics MARS has, and to the no-branch-delay option both MARS and SPIM default to. Real MIPS CPUs have a branch-delay slot; MIPS32r6 later re-arranged the opcodes and provided new no-delay-slot branch instructions.

MARS at least (and maybe SPIM) has pretty limited support for assemble-time constants in its built-in assembler as well, e.g. you can't do .equ or msglen = . - msg to compute the length of a msg: .ascii "hello" at assemble time like you could in GNU assembler for MIPS.
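For comparison, a small sketch of what that looks like in GNU assembler syntax. This is a fragment meant for the GNU toolchain, not for MARS (which, per the above, won't accept it), and the use of $a2 is just for illustration.

```asm
        .data
msg:    .ascii "hello"
msglen = . - msg             # current address minus start of msg, fixed at assemble time

        .text
        li    $a2, msglen    # the constant can then be used like a literal
```

The same definition can be written as .equ msglen, . - msg.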

answered on Stack Overflow Feb 5, 2021 by Peter Cordes • edited Feb 6, 2021 by Peter Cordes

To add to @Peter's very good answer:

SPIM has the option to include kernel code, via Simulator->Settings->Load Exception Handler (you can pick a file or use the default). The handler is assembly source. The default setting is to use the default handler (as opposed to using no handler).

When you write such a handler, you can include code in .ktext & .kdata, but also include user .text & .data.  Any exception handler is assembled and loaded first before the user code.

The standard exception handler file also includes startup code, placed in user .text, that loads argc/argv, does jal main, and then makes syscall #10 (so it is a bit like _start in crt0). That means we can return (jr $ra) to that startup code. This is why user code appears at [00400020] in SPIM whereas in MARS your user code starts at 00400000.
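Roughly, that startup code has the following shape. This is an approximation written from the description above, not a verbatim copy of SPIM's handler file: the label name __start and the exact argc/argv loads are assumptions, and the real file may contain more instructions. It's for reading only, not for loading alongside the default handler.

```asm
        .text
        .globl __start
__start:                     # assumed label name for the startup entry point
        lw    $a0, 0($sp)    # argc
        addiu $a1, $sp, 4    # argv
        jal   main           # $ra now points at the next instruction
        li    $v0, 10        # main returned here: syscall 10 = exit
        syscall
```

Because jal main leaves a valid address in $ra, a jr $ra at the end of main lands on the li $v0, 10 and the program exits cleanly.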

SPIM also does not report missing symbols until runtime!!! So, if main is not found then the missing symbol main is reported when the jal main executes.

However, when you do have a valid main symbol, it does not have to be first in the file — it can be anywhere.

MARS also starts execution at the beginning of .text, but by contrast, the default exception handler in MARS offers no _start equivalent, so we have to start assembly programs with the main code (we don't really need a main symbol) or else put a j to it at the top. SPIM will behave more like MARS if you forgo the use of the default handler.
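A minimal sketch of the "put a j at the top" layout, assuming MARS's behavior of starting at the first instruction in .text as described above. The helper name print_msg is made up for illustration.

```asm
.data
theMsg: .asciiz "Hello World\n"

.text
.globl main
        j     main           # first instruction in .text: skip over the helper

print_msg:                   # hypothetical helper placed before main
        li    $v0, 4         # syscall 4: print the string at $a0
        syscall
        jr    $ra

main:   la    $a0, theMsg
        jal   print_msg
        li    $v0, 10        # syscall 10: exit, so nothing depends on $ra
        syscall
```

With SPIM's default handler the leading j is never reached, since the startup code jumps straight to the main label, so the same file should work there too.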

answered on Stack Overflow Feb 6, 2021 by Erik Eidt • edited Feb 6, 2021 by Erik Eidt

User contributions licensed under CC BY-SA 3.0