iretq throwing GP fault

I'm trying to write a 64-bit OS. It throws a GP on the iretq at the end of the timer interrupt handler, then repeatedly throws more GPs from the iretq at the end of the GP handler.

I know this because my generic handler prints the ISR number on the serial port, and it goes 32, 13, 13, 13, ...

The error code for the GP is 0x10, which is the selector of my data segment.

I'm debugging it in qemu, so I can see quite a bit. Here's the situation at the iretq from the timer handler:

(gdb) disas isr_common,isr_head_2                                                
Dump of assembler code from 0x8189 to 0x81c4:                                    
   0x0000000000008189 <isr_common+0>:   callq  0x8125 <sayN100>                  
   0x000000000000818e <isr_common+5>:   cmp    $0x20,%eax                        
   0x0000000000008191 <isr_common+8>:   jl     0x81a8 <isr_common.no_more_acks>  
   0x0000000000008193 <isr_common+10>:  cmp    $0x30,%eax                        
   0x0000000000008196 <isr_common+13>:  jge    0x81a8 <isr_common.no_more_acks>  
   0x0000000000008198 <isr_common+15>:  cmp    $0x28,%al                         
   0x000000000000819a <isr_common+17>:  jl     0x81a2 <isr_common.ack_master>    
   0x000000000000819c <isr_common+19>:  push   %rax                              
   0x000000000000819d <isr_common+20>:  mov    $0x20,%al                         
   0x000000000000819f <isr_common+22>:  out    %al,$0xa0                         
   0x00000000000081a1 <isr_common+24>:  pop    %rax                              
   0x00000000000081a2 <isr_common.ack_master+0>:        push   %rax              
   0x00000000000081a3 <isr_common.ack_master+1>:        mov    $0x20,%al         
   0x00000000000081a5 <isr_common.ack_master+3>:        out    %al,$0x20         
   0x00000000000081a7 <isr_common.ack_master+5>:        pop    %rax              
   0x00000000000081a8 <isr_common.no_more_acks+0>:      cmp    $0x24,%ax         
   0x00000000000081ac <isr_common.no_more_acks+4>:      pop    %rax              
   0x00000000000081ad <isr_common.no_more_acks+5>:      pop    %rax              
=> 0x00000000000081ae <isr_common.end+0>:       iretq                            
   0x00000000000081b0 <isr_head_0+0>:   pushq  $0x55    ;DUMMY ERROR CODE                         
   0x00000000000081b2 <isr_head_0+2>:   mov    $0x0,%eax                         
   0x00000000000081b7 <isr_head_0+7>:   push   %rax                              
   0x00000000000081b8 <isr_head_0+8>:   jmp    0x8189 <isr_common>               
   0x00000000000081ba <isr_head_1+0>:   pushq  $0x55    ;DUMMY ERROR CODE                                                  
   0x00000000000081bc <isr_head_1+2>:   mov    $0x1,%eax                         
   0x00000000000081c1 <isr_head_1+7>:   push   %rax                              
   0x00000000000081c2 <isr_head_1+8>:   jmp    0x8189 <isr_common>

That shows a couple of the "isr_head" stubs that are entered in the IDT. Each one pushes a dummy error code if the CPU doesn't supply a real one, pushes the ISR number, and jumps to isr_common.

(gdb) bt                                   
#0  0x00000000000081ae in isr_common.end () 
#1  0x0000000000008123 in LongMode.Nirv ()  
#2  0x0000000000000010 in ?? ()             
#3  0x0000000000000216 in ?? ()             
#4  0x0000000000015000 in Pd ()             
#5  0x0000000000000010 in ?? ()             
#6  0x000000b8e5894855 in ?? ()             
#7  0x78bf00000332e800 in ?? ()             
#8  0x000003e3e8000000 in ?? ()                        

where:

0x0000000000008122 <LongMode.Nirv+0>:        hlt                           
0x0000000000008123 <LongMode.Nirv+1>:        jmp    0x8122 <LongMode.Nirv> 

To be thorough, here are the registers and the stack at that point:

(gdb) info registers                              
rax            0x55     85                        
rbx            0x80000011       2147483665        
rcx            0xc0000080       3221225600        
rdx            0x3f8    1016                      
rsi            0xb      11                        
rdi            0x3fc    1020                      
rbp            0x0      0x0                       
rsp            0x14fd8  0x14fd8 <Pd+36824>        
r8             0x0      0                         
r9             0x0      0                         
r10            0x0      0                         
r11            0x0      0                         
r12            0x0      0                         
r13            0x0      0                         
r14            0x0      0                         
r15            0x0      0                         
rip            0x81ae   0x81ae <isr_common.end>   
eflags         0x97     [ CF PF AF SF ]           
cs             0x8      8                         
ss             0x10     16                        
ds             0x10     16                        
es             0x10     16                        
fs             0x10     16                        
gs             0x10     16

(gdb) x/32xg 0x14f00                                               
0x14f00 <Pd+36608>:     0x0000000000841f0f      0x000000841f0f2e66 
0x14f10 <Pd+36624>:     0x00841f0f2e660000      0x1f0f2e6600000000 
0x14f20 <Pd+36640>:     0x2e66000000000084      0x0000000000841f0f 
0x14f30 <Pd+36656>:     0x000000841f0f2e66      0x00841f0f2e660000 
0x14f40 <Pd+36672>:     0x1f0f2e6600000000      0x2e66000000000084 
0x14f50 <Pd+36688>:     0x0000000000841f0f      0x000000841f0f2e66 
0x14f60 <Pd+36704>:     0x00841f0f2e660000      0x1f0f2e6600000000 
0x14f70 <Pd+36720>:     0x2e66000000000084      0x0000000000841f0f 
0x14f80 <Pd+36736>:     0x000000841f0f2e66      0x00841f0f2e660000 
0x14f90 <Pd+36752>:     0x1f0f2e6600000000      0x2e66000000000084 
0x14fa0 <Pd+36768>:     0x0000000000000020      0x0000000000008144 
0x14fb0 <Pd+36784>:     0x0000000080000011      0x0000000000000020 
0x14fc0 <Pd+36800>:     0x0000000000000020      0x0000000000000020 
0x14fd0 <Pd+36816>:     0x0000000000000055      0x0000000000008123 
0x14fe0 <Pd+36832>:     0x0000000000000010      0x0000000000000216 
0x14ff0 <Pd+36848>:     0x0000000000015000      0x0000000000000010 

Now I'll let it run to the GP handler head:

(gdb) break isr_head_13                                  
Breakpoint 3 at 0x8236                                   
(gdb) c                                                  
Continuing.                                              

Breakpoint 3, 0x0000000000008236 in isr_head_13 ()       
(gdb) bt                                                 
#0  0x0000000000008236 in isr_head_13 ()                 
#1  0x0000000000000010 in ?? ()                          
#2  0x00000000000081ae in isr_common.no_more_acks ()     
#3  0x0000000000000008 in ?? ()                          
#4  0x0000000000000097 in ?? ()                          
#5  0x0000000000014fd8 in Pd ()                          
#6  0x0000000000000010 in ?? ()                          
#7  0x0000000000000055 in ?? ()                          
#8  0x0000000000008123 in LongMode.Nirv ()               
#9  0x0000000000000010 in ?? ()                          
#10 0x0000000000000216 in ?? ()                          
#11 0x0000000000015000 in Pd ()                          
#12 0x0000000000000010 in ?? ()

We see that the CPU pushed the error code 0x10 on top of the usual frame (return address, CS selector, flags, stack pointer, SS selector). The interesting thing is that my dummy error code from the timer (0x55) is back from the dead. It was already popped before the first iretq, and I didn't push it this time:

(gdb) disas isr_head_13                                   
Dump of assembler code for function isr_head_13:          
=> 0x0000000000008236 <+0>:     mov    $0xd,%eax          
   0x000000000000823b <+5>:     push   %rax               
   0x000000000000823c <+6>:     jmpq   0x8189 <isr_common>

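I guess that's just the CPU aligning the stack to 16 bytes, which I have no part in. The stack was 16-byte aligned before the timer went off, but the CPU pushed an odd number of quadwords (five), so RSP was no longer aligned when the GP hit and the slot holding the stale 0x55 was skipped.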
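The alignment guess can be checked with a little arithmetic. The following C sketch is not part of the original post; it assumes 64-bit mode, no privilege change, and no IST stack, and the function name `rsp_after_exception` is made up for illustration.

```c
#include <stdint.h>

/*
 * Sketch: RSP on entry to a handler in 64-bit mode, assuming the
 * interrupted code runs at the same privilege level and the gate
 * does not use an IST stack (both are assumptions here).
 * The CPU first aligns RSP down to 16 bytes, then pushes
 * SS, RSP, RFLAGS, CS, RIP (5 quadwords) and, for exceptions
 * such as #GP, an error code (1 more quadword).
 */
static uint64_t rsp_after_exception(uint64_t old_rsp, int has_error_code)
{
    uint64_t rsp = old_rsp & ~(uint64_t)0xF;  /* align down to 16 bytes */
    rsp -= 5 * 8;                             /* SS, RSP, RFLAGS, CS, RIP */
    if (has_error_code)
        rsp -= 8;                             /* error code on top */
    return rsp;
}
```

With the values from the dumps, the timer interrupt at RSP 0x15000 gives 0x14fd8, and the GP at RSP 0x14fd8 is first aligned down to 0x14fd0 and ends at 0x14fa0. The quadword at 0x14fd0 is never overwritten, which would explain why the old 0x55 still shows up in the backtrace.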

So why would it crash? The Intel docs say that GP with a selector means it tried to pop something out of range, but I see no such problem.
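For reference, a selector-format error code splits into a few bit fields. The C sketch below is an illustration added here, not the asker's code; the struct and function names are hypothetical, and the layout follows the usual description of the error code (bit 0 EXT, bit 1 IDT, bit 2 TI, bits 3 to 15 index).

```c
#include <stdint.h>

/* Fields of a selector-format exception error code (illustrative names). */
struct sel_error {
    unsigned ext;    /* bit 0: event originated outside the program */
    unsigned idt;    /* bit 1: index refers to an IDT gate */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT (only when idt == 0) */
    unsigned index;  /* bits 3..15: descriptor index in that table */
};

static struct sel_error decode_sel_error(uint32_t code)
{
    struct sel_error e;
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

/* Name of the table the index refers to. */
static const char *sel_error_table(struct sel_error e)
{
    if (e.idt)
        return "IDT";
    return e.ti ? "LDT" : "GDT";
}
```

Decoding 0x10 this way gives index 2 in the GDT with EXT clear, i.e. the fault names the descriptor that selector 0x10 points at.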

Any help much appreciated.

assembly
operating-system
kernel
x86-64
interrupt-handling
asked on Stack Overflow Aug 12, 2018 by Adrian May • edited Aug 12, 2018 by Peter Cordes

1 Answer

The CS selector on the stack at the time of the first IRET is 0x10, which is a data segment, so that’s what is causing the #GP. Either the stack is modified in the handler (which doesn’t appear to be the case) or the CS register was not reloaded after changing the GDT.
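To make this concrete, here is a small C sketch that reads the five quadwords at 0x14fd8 from the question's x/32xg dump in the order IRETQ pops them. The names are made up for illustration, and `example_gdt` is a hypothetical flat GDT (null, 64-bit code, data) assumed only so the check has something to look at; the asker's real GDT is not shown in the post.

```c
#include <stdint.h>

/* Order in which IRETQ pops the frame in 64-bit mode, lowest address first. */
struct iretq_frame {
    uint64_t rip, cs, rflags, rsp, ss;
};

/* Quadwords at 0x14fd8..0x14ff8, copied from the dump in the question. */
static const uint64_t dumped_stack[5] = {
    0x8123, 0x10, 0x216, 0x15000, 0x10
};

/* HYPOTHETICAL flat GDT, assumed for illustration only. */
static const uint64_t example_gdt[3] = {
    0x0000000000000000ULL,  /* 0x00: null descriptor */
    0x00209A0000000000ULL,  /* 0x08: 64-bit code segment */
    0x0000920000000000ULL,  /* 0x10: data segment */
};

static struct iretq_frame read_iretq_frame(const uint64_t *sp)
{
    struct iretq_frame f = { sp[0], sp[1], sp[2], sp[3], sp[4] };
    return f;
}

/* Descriptor index encoded in a selector (drops the RPL and TI bits). */
static unsigned selector_index(uint64_t sel)
{
    return (unsigned)((sel & 0xFFFF) >> 3);
}

/* Code segment: S bit (44) set and executable bit (43) set. */
static int is_code_descriptor(uint64_t desc)
{
    return ((desc >> 44) & 1) && ((desc >> 43) & 1);
}
```

Reading the dump this way gives RIP 0x8123, CS 0x10, RFLAGS 0x216, RSP 0x15000 and SS 0x10. Under the assumed GDT, index 2 is not a code descriptor, which is consistent with the #GP carrying error code 0x10.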

The IRET from the GP handler returns to the previous IRET, which promptly faults again. You generally shouldn’t return from a fault handler unless you have resolved the fault.

The handlers also don’t appear to save and restore all the registers, which will cause problems once IRET starts working.

answered on Stack Overflow Aug 12, 2018 by prl

User contributions licensed under CC BY-SA 3.0