We finish the chapter by examining the termination phase of interrupt and exception handlers. Although the main objective is clear (namely, to resume execution of some program), several issues must be considered before doing so:
Number of kernel control paths being concurrently executed
If there is just one, the CPU must switch back to User Mode.
Pending process switch requests
If there is any request, the kernel must perform process scheduling; otherwise, control is returned to the current process.
Pending signals
If a signal has been sent to the current process, it must be handled.
The kernel assembly language code that accomplishes all these things is not, technically speaking, a function, since control is never returned to the functions that invoke it. It is a piece of code with four different entry points called ret_from_intr, ret_from_exception, ret_from_sys_call, and ret_from_fork. Because it makes the description simpler, we will refer to these four entry points as functions:
ret_from_exception( )
Terminates all exceptions except the 0x80 ones
ret_from_intr( )
Terminates interrupt handlers
ret_from_sys_call( )
Terminates system calls (i.e., kernel control paths engendered by 0x80 programmed exceptions)
ret_from_fork( )
Terminates the fork( ), vfork( ), or clone( ) system calls (child only).
The general flow diagram with the corresponding four entry points is illustrated in Figure 4-5. The ret_from_exception( ) and ret_from_intr( ) entry points look the same in the picture, but they aren't. In the former case, the kernel knows the descriptor of the process that caused the exception; in the latter case, no process descriptor is associated with the interrupt. Besides the labels corresponding to the entry points, a few others have been added to allow you to relate the assembly language code more easily to the flow diagram. Let's now examine in detail how the termination occurs in each case.
The ret_from_exception( ) function is essentially equivalent to the following assembly language code:
ret_from_exception:
movl 0x30(%esp),%eax
movb 0x2C(%esp),%al
testl $(0x00020003),%eax
jne ret_from_sys_call
restore_all:
popl %ebx
popl %ecx
popl %edx
popl %esi
popl %edi
popl %ebp
popl %eax
popl %ds
popl %es
addl $4,%esp
iret
The values of the cs and eflags registers, which were pushed on the stack when the exception occurred, are used by the function to determine whether the interrupted program was running in User Mode or if the VM flag of eflags was set.[15] In either case, a jump is made to the ret_from_sys_call( ) function. Otherwise, the interrupted kernel control path is to be restarted. The function loads the registers with the values saved by the SAVE_ALL macro when the exception started, and the function yields control to the interrupted program by executing the iret instruction.
[15] This flag allows programs to be executed in Virtual-8086 Mode; see the Pentium manuals for more details.
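The test performed by the first three instructions of ret_from_exception( ) can be sketched in C. This is only an illustration, not kernel code: the function name and the two constant names are made up here, and the saved eflags and cs values are passed as plain arguments instead of being read from the stack.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical names for the two bit groups examined by testl. */
#define VM_FLAG  0x00020000u  /* VM bit of eflags (Virtual-8086 Mode) */
#define RPL_MASK 0x00000003u  /* two lowest bits of cs (privilege level) */

/*
 * Mirrors the movl/movb/testl sequence: the low byte of the saved cs
 * replaces the low byte of the saved eflags, so that both conditions
 * can be checked with a single test.
 * Returns true when the jump to ret_from_sys_call would be taken.
 */
bool resumes_user_or_vm86(uint32_t saved_eflags, uint32_t saved_cs)
{
    uint32_t eax = saved_eflags;                     /* movl 0x30(%esp),%eax */
    eax = (eax & 0xffffff00u) | (saved_cs & 0xffu);  /* movb 0x2C(%esp),%al  */
    return (eax & (VM_FLAG | RPL_MASK)) != 0;        /* testl mask,%eax      */
}
```

With a cs value whose two lowest bits are 0 and a clear VM flag, the function returns false and the interrupted kernel control path would be restarted. If those bits are nonzero or the VM flag is set, it returns true.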
The ret_from_intr( ) function does a small amount of setup and then jumps to ret_from_exception( ):
ret_from_intr:
movl $0xffffe000,%ebx
andl %esp,%ebx
jmp ret_from_exception
Before invoking ret_from_exception( ), ret_from_intr( ) loads into the ebx register the address of current's process descriptor (see Section 3.2.2). This is necessary because the ret_from_sys_call( ) function, which can be invoked by ret_from_exception( ), expects to find that address in ebx. By contrast, when ret_from_exception( ) is entered directly at the end of an exception handler, ebx has already been loaded with current's address by the handler (see Section 4.5.1 earlier in this chapter).
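The movl/andl pair used to load ebx amounts to a single masking operation. The C sketch below illustrates it under the assumption, suggested by the mask itself, that the process descriptor sits at the base of an 8 KB aligned memory area that also contains the Kernel Mode stack. The function and macro names are invented for this example.

```c
#include <stdint.h>

/* Clearing the 13 low-order bits rounds an address down to an 8 KB boundary. */
#define CURRENT_MASK 0xffffe000u

/*
 * Models "movl $0xffffe000,%ebx; andl %esp,%ebx": given the value of
 * the Kernel Mode stack pointer, return the base address of the 8 KB
 * area that contains it.
 */
uint32_t current_from_esp(uint32_t esp)
{
    return esp & CURRENT_MASK;
}
```

For instance, a made-up stack pointer value of 0xc0123f40 yields 0xc0122000, and so does any other address between 0xc0122000 and 0xc0123fff.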
The ret_from_sys_call( ) function is equivalent to the following assembly language code:
ret_from_sys_call:
cli
cmpl $0,20(%ebx)
jne reschedule
cmpl $0,8(%ebx)
jne signal_return
jmp restore_all
As we said previously, the ebx register points to the current process descriptor; within that descriptor, the need_resched field is at offset 20, which is checked by the first cmpl instruction. If the need_resched field is not zero, the schedule( ) function is invoked to perform a process switch, and the checks are then repeated from the beginning:
reschedule:
call schedule
jmp ret_from_sys_call
The offset of the sigpending field inside the process descriptor is 8. If it is null, current resumes execution in User Mode by restoring the hardware context of the process saved on the stack. Otherwise, the function jumps to signal_return to process the pending signals of current:
signal_return:
sti
testl $(0x00020000),0x30(%esp)
movl %esp,%eax
jne v86_signal_return
xorl %edx,%edx
call do_signal
jmp restore_all
v86_signal_return:
call save_v86_state
movl %eax,%esp
xorl %edx,%edx
call do_signal
jmp restore_all
If the interrupted process was in VM86 mode, the save_v86_state( ) function is invoked. The do_signal( ) function (see Chapter 10) is then invoked to handle the pending signals. Finally, current can resume execution in User Mode.
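The order of the checks made by ret_from_sys_call( ) can be summarized by the following C model. It is a simplified sketch, not the kernel's code: the structure, the two stub functions, and the call counters are hypothetical, and both the interrupt masking (cli and sti) and the VM86 branch are left out.

```c
/* Hypothetical stand-in for the two process descriptor fields used here. */
struct task_model {
    int need_resched;    /* field at offset 20 in the real descriptor */
    int sigpending;      /* field at offset 8 in the real descriptor  */
    int schedule_calls;  /* bookkeeping for this sketch only          */
    int signal_calls;    /* bookkeeping for this sketch only          */
};

/* Stub: the real schedule( ) may switch to another process. */
static void schedule_stub(struct task_model *t)
{
    t->need_resched = 0;
    t->schedule_calls++;
}

/* Stub: the real do_signal( ) handles the pending signals. */
static void do_signal_stub(struct task_model *t)
{
    t->sigpending = 0;
    t->signal_calls++;
}

void ret_from_sys_call_model(struct task_model *current)
{
    /* reschedule: after each schedule( ) call, start the checks again */
    while (current->need_resched != 0)
        schedule_stub(current);

    /* signal_return: handle pending signals, then go to restore_all */
    if (current->sigpending != 0)
        do_signal_stub(current);

    /* restore_all: registers would be popped and iret executed here */
}
```

The rescheduling check comes first and is repeated after every call to the scheduler, whereas the signal check is made only once, right before the saved registers are restored.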
The ret_from_fork( ) function is executed by the child process right after its creation through a fork( ), vfork( ), or clone( ) system call (see Section 3.4.1). It is essentially equivalent to the following assembly language code:
ret_from_fork:
pushl %ebx
call schedule_tail
addl $4,%esp
movl $0xffffe000,%ebx
andl %esp,%ebx
testb $0x02,24(%ebx)
jne tracesys_exit
jmp ret_from_sys_call
tracesys_exit:
call syscall_trace
jmp ret_from_sys_call
Initially, the ebx register stores the address of the parent's process descriptor; this value is passed to the schedule_tail( ) function as a parameter (see Chapter 11). When that function returns, ebx is reloaded with the address of current's process descriptor. Then the ret_from_fork( ) function tests the 0x02 bit of current's ptrace field (at offset 24 of the process descriptor). If the bit is set, the fork( ), vfork( ), or clone( ) system call is being traced, so the syscall_trace( ) function is invoked to notify the debugging process. We give more details on system call tracing in Chapter 9.
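Note that the testb instruction in ret_from_fork( ) looks at a single bit of the ptrace field rather than at the whole field. A minimal C rendering of that check follows; the macro and function names are made up for illustration.

```c
/* Bit examined by "testb $0x02,24(%ebx)"; the name is hypothetical. */
#define TRACE_SYSCALL_BIT 0x02u

/*
 * Returns nonzero when the jump to tracesys_exit would be taken,
 * that is, when syscall_trace( ) would be called for the child.
 */
int child_must_notify_debugger(unsigned int ptrace_field)
{
    return (ptrace_field & TRACE_SYSCALL_BIT) != 0;
}
```

A field value of 0x02 or 0x03 makes the function return 1, while 0x00 or 0x01 makes it return 0, because only the second-lowest bit is tested.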