4.8 Returning from Interrupts and Exceptions

We will finish the chapter by examining the termination phase of interrupt and exception handlers. Although the main objective is clear, namely to resume execution of some program, several issues must be considered before doing so:

Number of kernel control paths being concurrently executed

If there is just one, the CPU must switch back to User Mode.

Pending process switch requests

If there is any request, the kernel must perform process scheduling; otherwise, control is returned to the current process.

Pending signals

If a signal is sent to the current process, it must be handled.

The kernel assembly language code that accomplishes all these things is not, technically speaking, a function, since control is never returned to the functions that invoke it. It is a piece of code with four different entry points called ret_from_intr, ret_from_exception, ret_from_sys_call, and ret_from_fork. Since it makes the description simpler, we will nevertheless refer to these four entry points as functions:

ret_from_exception( )

Terminates all exceptions except the 0x80 programmed exceptions (system calls)

ret_from_intr( )

Terminates interrupt handlers

ret_from_sys_call( )

Terminates system calls (i.e., kernel control paths engendered by 0x80 programmed exceptions)

ret_from_fork( )

Terminates the fork( ), vfork( ), or clone( ) system calls (child only).

The general flow diagram with the corresponding four entry points is illustrated in Figure 4-5. The ret_from_exception( ) and ret_from_intr( ) entry points look the same in the picture, but they aren't. In the former case, the kernel knows the descriptor of the process that caused the exception; in the latter case, no process descriptor is associated with the interrupt. Besides the labels corresponding to the entry points, a few others have been added to allow you to relate the assembly language code more easily to the flow diagram. Let's now examine in detail how the termination occurs in each case.

Figure 4-5. Returning from interrupts and exceptions


4.8.1 The ret_from_exception( ) Function

The ret_from_exception( ) function is essentially equivalent to the following assembly language code:

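ret_from_exception:
    movl 0x30(%esp),%eax        # eax = saved eflags
    movb 0x2C(%esp),%al         # low byte of eax = low byte of saved cs
    testl $(0x00020003),%eax    # VM flag (bit 17) or nonzero privilege bits?
    jne ret_from_sys_call       # yes: back to User Mode or VM86 mode
restore_all:
    popl %ebx                   # reload the registers saved by SAVE_ALL
    popl %ecx
    popl %edx
    popl %esi
    popl %edi
    popl %ebp
    popl %eax
    popl %ds
    popl %es
    addl $4,%esp                # skip the slot at offset 0x24
    iret                        # pop eip, cs, eflags (and esp, ss on a privilege change)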

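The values of the cs and eflags registers, which were pushed on the stack when the exception occurred, are used by the function to determine whether the interrupted program was running in User Mode or whether the VM flag of eflags was set.^[15] The first instruction loads the saved eflags into eax; the second overwrites its low byte with the low byte of the saved cs, whose two least significant bits hold the privilege level of the interrupted code (0 for Kernel Mode, 3 for User Mode). The testl instruction thus checks bit 17 of eflags (the VM flag) and the privilege bits of cs at once, and the flag bits in the low byte of eflags (bit 1 is always set) cannot produce a false match. If either condition holds, a jump is made to the ret_from_sys_call( ) function. Otherwise, the interrupted kernel control path is to be restarted: the function reloads the registers with the values saved by the SAVE_ALL macro when the exception started, skips the next 4-byte slot on the stack, and yields control to the interrupted program by executing the iret instruction.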

^[15] This flag allows programs to be executed in Virtual-8086 Mode; see the Pentium manuals for more details.
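To make the offsets used by ret_from_exception( ) concrete, here is a minimal C sketch that models the stack contents as a structure of 32-bit fields, assuming the layout implied by the pops in restore_all followed by the words consumed by iret. The structure name saved_frame, the macro names, and the function goes_to_ret_from_sys_call( ) are hypothetical (orig_eax borrows the Linux name for the slot skipped by addl $4,%esp); the function simply mirrors the movl/movb/testl sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the kernel stack at ret_from_exception, lowest
 * address first. Every field is 32 bits wide, as on 32-bit x86. */
struct saved_frame {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax; /* popped by restore_all */
    uint32_t ds, es;                            /* popl %ds, popl %es */
    uint32_t orig_eax;                          /* skipped by addl $4,%esp */
    uint32_t eip, cs, eflags;                   /* popped by iret */
    uint32_t esp, ss;                           /* popped by iret only on a privilege change */
};

/* The offsets hard-coded in the assembly code. */
_Static_assert(offsetof(struct saved_frame, orig_eax) == 0x24, "orig_eax slot");
_Static_assert(offsetof(struct saved_frame, cs) == 0x2C, "saved cs");
_Static_assert(offsetof(struct saved_frame, eflags) == 0x30, "saved eflags");

#define VM_FLAG  0x00020000u /* bit 17 of eflags: Virtual-8086 Mode */
#define RPL_BITS 0x00000003u /* low two bits of cs: privilege level */

/* Mirrors: movl 0x30(%esp),%eax ; movb 0x2C(%esp),%al ;
 *          testl $(0x00020003),%eax ; jne ret_from_sys_call */
bool goes_to_ret_from_sys_call(const struct saved_frame *f)
{
    uint32_t eax = f->eflags;               /* movl: eax = saved eflags */
    eax = (eax & ~0xFFu) | (f->cs & 0xFFu); /* movb: low byte = low byte of cs,
                                               hiding eflags bit 1 (always set) */
    return (eax & (VM_FLAG | RPL_BITS)) != 0;
}
```

With the example values cs = 0x10 (privilege level 0) and eflags = 0x202, the function returns false; changing cs to 0x23 (privilege level 3) or setting bit 17 of eflags makes it return true.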

4.8.2 The ret_from_intr( ) Function

The ret_from_intr( ) function differs from ret_from_exception( ) only in that it first loads the ebx register:

ret_from_intr:
    movl $0xffffe000,%ebx       # mask that clears the 13 low-order bits
    andl %esp,%ebx              # ebx = address of current's process descriptor
    jmp ret_from_exception

Before invoking ret_from_exception( ), ret_from_intr( ) loads the address of current's process descriptor into the ebx register (see Section 3.2.2). Because the process descriptor and the Kernel Mode stack share an 8 KB memory area aligned on an 8 KB boundary, clearing the 13 least significant bits of esp yields the address of the descriptor. This is necessary because the ret_from_sys_call( ) function, which can be invoked by ret_from_exception( ), expects to find that address in ebx. By contrast, when ret_from_exception( ) is entered from an exception handler, ebx has already been loaded with current's address by the handler itself (see Section 4.5.1 earlier in this chapter).
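The masking performed by ret_from_intr( ) can be checked in isolation. The following C sketch assumes an 8 KB area that starts on an 8 KB boundary and holds the process descriptor at its lowest address, with the Kernel Mode stack growing down from the top; the function name descriptor_from_esp( ) and the sample addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: descriptor and kernel stack share one 8 KB area that starts
 * on an 8 KB boundary, with the descriptor at its lowest address. */
#define KERNEL_AREA_SIZE 8192u
#define AREA_MASK        0xffffe000u /* clears the low 13 bits */

_Static_assert(AREA_MASK == (uint32_t)~(KERNEL_AREA_SIZE - 1u), "mask matches 8 KB");

/* Mirrors: movl $0xffffe000,%ebx ; andl %esp,%ebx */
uint32_t descriptor_from_esp(uint32_t esp)
{
    return esp & AREA_MASK;
}
```

Any esp value inside the area, wherever the stack top currently is, maps back to the same base address; this is why the computation works no matter how deeply the kernel control path has nested on the stack.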

4.8.3 The ret_from_sys_call( ) Function

The ret_from_sys_call( ) function is equivalent to the following assembly language code:

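ret_from_sys_call:
    cli                         # test both fields with interrupts disabled
    cmpl $0,20(%ebx)            # current->need_resched != 0?
    jne reschedule
    cmpl $0,8(%ebx)             # current->sigpending != 0?
    jne signal_return
    jmp restore_all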

As we said previously, the ebx register points to the current process descriptor. The cli instruction disables local interrupts, so that no interrupt handler can modify the fields being tested while the checks are in progress. Within the descriptor, the need_resched field is at offset 20 and is checked by the first cmpl instruction; if it is not zero, the schedule( ) function is invoked to perform a process switch:

reschedule:
    call schedule               # perform a process switch
    jmp ret_from_sys_call       # start over when current runs again

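When schedule( ) eventually returns (that is, when current is selected to run again), the function jumps back to ret_from_sys_call and repeats both checks. The sigpending field is at offset 8 inside the process descriptor. If it is zero, current resumes execution in User Mode through restore_all, which restores the hardware context of the process saved on the stack. Otherwise, the function jumps to signal_return to handle the pending signals of current: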

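signal_return:
    sti                         # re-enable interrupts before handling signals
    testl $(0x00020000),0x30(%esp)  # VM flag set in the saved eflags?
    movl %esp,%eax              # eax = address of the saved registers (flags unchanged)
    jne v86_signal_return
    xorl %edx,%edx              # edx = 0
    call do_signal
    jmp restore_all

v86_signal_return:
    call save_v86_state         # returns in eax the new stack pointer
    movl %eax,%esp
    xorl %edx,%edx              # edx = 0
    call do_signal
    jmp restore_all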

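Interrupts are re-enabled by sti before the signals are handled. If the VM flag is set in the saved eflags, meaning that the interrupted process was in VM86 mode, the save_v86_state( ) function is invoked first and esp is loaded with the value it returns. In both cases, the do_signal( ) function (see Chapter 10) is then invoked, with eax pointing to the saved registers and edx cleared, to handle the pending signals. Finally, the jump to restore_all lets current resume execution in User Mode; note that need_resched is not checked again on this path.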
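The control flow of ret_from_sys_call, reschedule, and signal_return can be summarized by the following hypothetical user-space C model. The struct task fields mimic need_resched, sigpending, and the VM flag; schedule_stub( ) and do_signal_stub( ) are stand-ins for schedule( ) and do_signal( ), and the later_requests counter is an invented way to simulate a new rescheduling request arriving while the process is switched out. The cli and sti instructions appear only as comments, since the model does not touch interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state ret_from_sys_call inspects. */
struct task {
    int need_resched;   /* field at offset 20 of the descriptor */
    int sigpending;     /* field at offset 8 */
    bool vm86;          /* VM flag in the saved eflags */
    int later_requests; /* invented: requests arriving while switched out */
};

/* What the model observed along the way. */
struct path {
    int schedule_calls;
    bool saved_v86_state;
    bool handled_signals;
    bool restored;
};

/* Stand-in for schedule(): when current runs again, need_resched is set
 * only if a simulated later request is still outstanding. */
static void schedule_stub(struct task *t, struct path *p)
{
    p->schedule_calls++;
    t->need_resched = t->later_requests > 0;
    if (t->later_requests > 0)
        t->later_requests--;
}

/* Stand-in for do_signal(): handle and clear the pending signals. */
static void do_signal_stub(struct task *t, struct path *p)
{
    p->handled_signals = true;
    t->sigpending = 0;
}

void ret_from_sys_call_model(struct task *t, struct path *p)
{
    for (;;) {
        /* cli: in the kernel, interrupts are disabled for both tests */
        if (t->need_resched != 0) {    /* cmpl $0,20(%ebx) ; jne reschedule */
            schedule_stub(t, p);       /* call schedule */
            continue;                  /* jmp ret_from_sys_call */
        }
        if (t->sigpending != 0) {      /* cmpl $0,8(%ebx) ; jne signal_return */
            /* sti: interrupts are re-enabled before handling signals */
            if (t->vm86)               /* jne v86_signal_return */
                p->saved_v86_state = true; /* call save_v86_state */
            do_signal_stub(t, p);      /* call do_signal */
        }
        break;                         /* jmp restore_all, no re-check */
    }
    p->restored = true;                /* restore_all: pops and iret */
}
```

Like the assembly code, the model leaves the loop right after handling signals: only the reschedule branch jumps back to the top.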

4.8.4 The ret_from_fork( ) Function

The ret_from_fork( ) function is executed by the child process right after its creation through a fork( ), vfork( ), or clone( ) system call (see Section 3.4.1). It is essentially equivalent to the following assembly language code:

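ret_from_fork:
    pushl %ebx                  # argument for schedule_tail( )
    call schedule_tail
    addl $4,%esp                # pop the argument
    movl $0xffffe000,%ebx
    andl %esp,%ebx              # ebx = current (the child)
    testb $0x02,24(%ebx)        # system call tracing bit of current->ptrace?
    jne tracesys_exit
    jmp ret_from_sys_call

tracesys_exit:
    call syscall_trace          # notify the debugging process
    jmp ret_from_sys_call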

Initially, the ebx register stores the address of the descriptor of the process that was running just before the child was scheduled (usually the parent); this value is pushed on the stack and passed to the schedule_tail( ) function as a parameter (see Chapter 11). When that function returns, the parameter is popped and ebx is reloaded with current's process descriptor address, computed from esp exactly as in ret_from_intr( ). Then the ret_from_fork( ) function tests bit 1 (mask 0x02, PT_TRACESYS) of current's ptrace field, located at offset 24 of the process descriptor. If the bit is set, the system calls of current are traced, so the syscall_trace( ) function is invoked to notify the debugging process of the termination of the fork( ), vfork( ), or clone( ) system call. In either case, execution continues at ret_from_sys_call( ). We give more details on system call tracing in Chapter 9.
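A similar sketch can be written for ret_from_fork( ). The hypothetical function below takes the value found in ebx, the child's esp, and the child's ptrace field as plain parameters (in the kernel the field is read from memory at offset 24), and records what the assembly code would do with them. The 8 KB masking assumption is the same as for ret_from_intr( ), and only the bit tested by testb $0x02 is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AREA_MASK    0xffffe000u /* same 8 KB masking as ret_from_intr */
#define TRACESYS_BIT 0x02u       /* the bit tested by testb $0x02,24(%ebx) */

struct fork_path {
    uint32_t schedule_tail_arg; /* value pushed from ebx */
    uint32_t current;           /* descriptor address recomputed from esp */
    bool traced;                /* syscall_trace() would be called */
};

/* Hypothetical model: ebx_prev is the value found in ebx, esp is the
 * child's kernel stack pointer, ptrace is the child's ptrace field. */
struct fork_path ret_from_fork_model(uint32_t ebx_prev, uint32_t esp, uint32_t ptrace)
{
    struct fork_path p = {0};
    p.schedule_tail_arg = ebx_prev;          /* pushl %ebx ; call schedule_tail */
    p.current = esp & AREA_MASK;             /* movl $0xffffe000,%ebx ; andl %esp,%ebx */
    p.traced = (ptrace & TRACESYS_BIT) != 0; /* testb ; jne tracesys_exit */
    return p;                                /* both paths: jmp ret_from_sys_call */
}
```

A ptrace value with only bit 0 set leaves traced false: the child may be traced, but not at the system call level, so syscall_trace( ) is skipped.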