4.5 Exception Handling

Most exceptions issued by the CPU are interpreted by Linux as error conditions. When one of them occurs, the kernel sends a signal to the process that caused the exception to notify it of an anomalous condition. If, for instance, a process performs a division by zero, the CPU raises a "Divide error" exception and the corresponding exception handler sends a SIGFPE signal to the current process, which then takes the necessary steps to recover or (if no signal handler is set for that signal) abort.
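The process side of this mechanism can be sketched in User Mode C. The fragment below is only an illustration under stated assumptions: the names on_sigfpe and demo_sigfpe_delivery are hypothetical, and the signal is generated with raise( ) as a stand-in for a real "Divide error" exception, because integer division by zero is undefined behavior in C and does not trap on every architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the caller can tell that SIGFPE was delivered. */
static volatile sig_atomic_t got_sigfpe = 0;

/* Hypothetical User Mode signal handler: it only records the event. */
static void on_sigfpe(int signo)
{
    (void)signo;
    got_sigfpe = 1;
}

/*
 * Installs a SIGFPE handler, then sends SIGFPE to the calling process.
 * raise() stands in for the signal the kernel would send after a real
 * "Divide error" exception. Returns 1 if the handler ran, 0 if it did
 * not, and -1 if the handler could not be installed.
 */
int demo_sigfpe_delivery(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, NULL) != 0)
        return -1;

    got_sigfpe = 0;
    raise(SIGFPE);
    return got_sigfpe;
}
```

If no handler were installed, the default action for SIGFPE would terminate the process, which corresponds to the "abort" case described above.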

There are a couple of cases, however, where Linux exploits CPU exceptions to manage hardware resources more efficiently. The first case is already described in Section 3.3.4: the "Device not available" exception is used together with the TS flag of the cr0 register to force the kernel to load the floating-point registers of the CPU with new values. The second case is the Page Fault exception, which is used to defer allocating new page frames to the process until the last possible moment. The corresponding handler is complex because the exception may, or may not, denote an error condition (see Section 8.4).

Exception handlers have a standard structure consisting of three parts:

1.       Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language).

2.       Handle the exception by means of a high-level C function.

3.       Exit from the handler by means of the ret_from_exception( ) function.

To take advantage of exceptions, the IDT must be properly initialized with an exception handler function for each recognized exception. It is the job of the trap_init( ) function to insert the final values (the functions that handle the exceptions) into all IDT entries that refer to nonmaskable interrupts and exceptions. This is accomplished through the set_trap_gate, set_intr_gate, and set_system_gate macros:

set_trap_gate(0,&divide_error); 
set_trap_gate(1,&debug); 
set_intr_gate(2,&nmi); 
set_system_gate(3,&int3); 
set_system_gate(4,&overflow); 
set_system_gate(5,&bounds); 
set_trap_gate(6,&invalid_op); 
set_trap_gate(7,&device_not_available); 
set_trap_gate(8,&double_fault); 
set_trap_gate(9,&coprocessor_segment_overrun); 
set_trap_gate(10,&invalid_TSS); 
set_trap_gate(11,&segment_not_present); 
set_trap_gate(12,&stack_segment); 
set_trap_gate(13,&general_protection); 
set_intr_gate(14,&page_fault); 
set_trap_gate(16,&coprocessor_error); 
set_trap_gate(17,&alignment_check); 
set_trap_gate(18,&machine_check); 
set_trap_gate(19,&simd_coprocessor_error); 
set_system_gate(128,&system_call); 
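To make the difference between the three macros more concrete, here is a User Mode sketch of how an 80x86 gate descriptor can be packed. It is not the kernel's code. The sketch_ names, the in-memory sketch_idt array, and the selector value 0x10 are assumptions made for illustration. The sketch also assumes the usual encoding, in which an interrupt gate has type 14, a trap gate has type 15, and a system gate is a trap gate whose DPL is 3 so that it can be reached from User Mode.

```c
#include <stdint.h>

#define SKETCH_TYPE_INTR  14u    /* interrupt gate: IF is cleared on entry */
#define SKETCH_TYPE_TRAP  15u    /* trap gate: IF is left unchanged */
#define SKETCH_KERNEL_CS  0x10u  /* assumed kernel code Segment Selector */

/* One 8-byte gate descriptor, split into its two 32-bit halves. */
struct sketch_gate {
    uint32_t lo;  /* selector (bits 31-16), handler offset bits 15-0 */
    uint32_t hi;  /* offset bits 31-16, P flag, DPL, gate type */
};

static struct sketch_gate sketch_idt[256];

static void sketch_set_gate(unsigned int n, uint32_t type,
                            uint32_t dpl, uint32_t handler)
{
    sketch_idt[n].lo = (SKETCH_KERNEL_CS << 16) | (handler & 0xFFFFu);
    sketch_idt[n].hi = (handler & 0xFFFF0000u)
                     | 0x8000u       /* P flag: descriptor is present */
                     | (dpl << 13)   /* Descriptor Privilege Level */
                     | (type << 8);  /* gate type */
}

void sketch_set_intr_gate(unsigned int n, uint32_t handler)
{
    sketch_set_gate(n, SKETCH_TYPE_INTR, 0, handler);
}

void sketch_set_trap_gate(unsigned int n, uint32_t handler)
{
    sketch_set_gate(n, SKETCH_TYPE_TRAP, 0, handler);
}

void sketch_set_system_gate(unsigned int n, uint32_t handler)
{
    sketch_set_gate(n, SKETCH_TYPE_TRAP, 3, handler);
}

/* Accessors that decode a previously packed entry. */
uint32_t sketch_gate_dpl(unsigned int n)
{
    return (sketch_idt[n].hi >> 13) & 3u;
}

uint32_t sketch_gate_type(unsigned int n)
{
    return (sketch_idt[n].hi >> 8) & 0xFu;
}

uint32_t sketch_gate_offset(unsigned int n)
{
    return (sketch_idt[n].hi & 0xFFFF0000u) | (sketch_idt[n].lo & 0xFFFFu);
}
```

Under these assumptions, vectors 3, 4, 5, and 128 end up with DPL 3, so the int3, into, bound, and int $0x80 instructions can be issued in User Mode, while every other entry in the list has DPL 0.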

Now we will look at what a typical exception handler does once it is invoked.

4.5.1 Saving the Registers for the Exception Handler

Let's use handler_name to denote the name of a generic exception handler. (The actual names of all the exception handlers appear in the list of macros shown earlier.) Each exception handler starts with the following assembly language instructions:

handler_name:
    pushl $0 /* only for some exceptions */
    pushl $do_handler_name
    jmp error_code

If the control unit is not supposed to automatically insert a hardware error code on the stack when the exception occurs, the corresponding assembly language fragment includes a pushl $0 instruction to pad the stack with a null value. Then the address of the high-level C function is pushed on the stack; its name consists of the exception handler name prefixed by do_.
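Which exceptions need the pushl $0 padding can be expressed as a small predicate. The following is a sketch, not kernel code: the function names are hypothetical, and the list of vectors for which the control unit pushes an error code (8, 10, 11, 12, 13, 14, and 17) comes from the 80x86 architecture documentation rather than from this section. Only vectors 0 through 19 are considered.

```c
#include <stdbool.h>

/*
 * Returns true if the control unit itself pushes a hardware error code
 * for the given exception vector (vectors 0-19 only).
 */
bool sketch_cpu_pushes_error_code(unsigned int vector)
{
    switch (vector) {
    case 8:   /* Double fault */
    case 10:  /* Invalid TSS */
    case 11:  /* Segment not present */
    case 12:  /* Stack segment */
    case 13:  /* General protection */
    case 14:  /* Page Fault */
    case 17:  /* Alignment check */
        return true;
    default:
        return false;
    }
}

/*
 * Number of 32-bit words the handler_name stub pushes before jumping to
 * error_code: the address of do_handler_name, plus the null padding
 * word when the control unit did not push an error code.
 */
unsigned int sketch_stub_push_count(unsigned int vector)
{
    return sketch_cpu_pushes_error_code(vector) ? 1u : 2u;
}
```

Either way, the stack has the same shape by the time error_code runs: an error code slot (real or null) with the address of the C function just above it.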

The assembly language fragment labeled as error_code is the same for all exception handlers except the one for the "Device not available" exception (see Section 3.3.4). The code performs the following steps:

1.       Saves the registers that might be used by the high-level C function on the stack.

2.       Issues a cld instruction to clear the direction flag DF of eflags, thus making sure that autoincrements on the edi and esi registers will be used with string instructions.[5]

[5] A single assembly language "string instruction," such as rep;movsb, is able to act on a whole block of data (string).

3.       Copies the hardware error code saved in the stack at location esp+36 into eax. Stores the value -1 in the same stack location. As we shall see in Section 10.3.4, this value is used to distinguish 0x80 exceptions from other exceptions.

4.       Loads edi with the address of the high-level do_handler_name( ) C function saved in the stack at location esp+32; writes the contents of es in that stack location.

5.       Loads the kernel data Segment Selector into the ds and es registers, then sets the ebx register to the address of the current process descriptor (see Section 3.2.2).

6.       Stores the parameters to be passed to the high-level C function on the stack, namely, the exception hardware error code and the address of the stack location where the contents of the User Mode registers are saved.

7.       Invokes the high-level C function whose address is now stored in edi.

After the last step is executed, the invoked function finds the following on the top locations of the stack:

·         The return address of the instruction to be executed after the C function terminates (see the next section)

·         The stack address of the saved User Mode registers

·         The hardware error code
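The offsets esp+32 and esp+36 mentioned in steps 3 and 4 can be checked against a C structure that mirrors the stack at that point. This is a sketch under assumptions: the structure and function names are hypothetical, and the exact set and order of the eight saved registers is assumed here only so that the two slots fall at the offsets given in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack contents as seen from esp after error_code has saved the registers. */
struct sketch_frame {
    /* Eight registers saved by error_code: 8 x 4 = 32 bytes (order assumed). */
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax, ds;
    uint32_t es;        /* esp+32: first holds the address of do_handler_name() */
    uint32_t orig_eax;  /* esp+36: first holds the hardware error code or 0 */
    uint32_t eip, cs, eflags;  /* saved by the control unit */
};

/* What the fragment needs in order to invoke the C function. */
struct sketch_call {
    uint32_t handler;     /* address of do_handler_name() */
    uint32_t error_code;  /* hardware error code */
};

/*
 * Mirrors steps 3 and 4: take the error code and the handler address out
 * of the frame, and overwrite their slots with -1 and the contents of es.
 */
struct sketch_call sketch_error_code_fixup(struct sketch_frame *f,
                                           uint32_t es_value)
{
    struct sketch_call c;

    c.error_code = f->orig_eax;   /* step 3: fetch the error code ... */
    f->orig_eax = (uint32_t)-1;   /* ... and store -1 in its place */
    c.handler = f->es;            /* step 4: fetch the function address ... */
    f->es = es_value;             /* ... and store es in its place */
    return c;
}
```

After this exchange, the two slots that temporarily carried the function address and the error code hold ordinary saved values, and the address of the frame itself is what the C function receives as "the stack address of the saved User Mode registers."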

4.5.2 Entering and Leaving the Exception Handler

As already explained, the names of the C functions that implement exception handlers always consist of the prefix do_ followed by the handler name. Most of these functions store the hardware error code and the exception vector in the process descriptor of current, and then send a suitable signal to that process. This is done as follows:

current->tss.error_code = error_code; 
current->tss.trap_no = vector; 
force_sig(sig_number, current); 

The current process takes care of the signal right after the termination of the exception handler. The signal will be handled either in User Mode by the process's own signal handler (if it exists) or in Kernel Mode. In the latter case, the kernel usually kills the process (see Chapter 10). The signals sent by the exception handlers are listed in Table 4-1.

The exception handler always checks whether the exception occurred in User Mode or in Kernel Mode and, in the latter case, whether it was due to an invalid argument of a system call. We'll describe in Section 9.2.6 how the kernel defends itself against invalid arguments of system calls. Any other exception raised in Kernel Mode is due to a kernel bug. In this case, the exception handler knows the kernel is misbehaving and, in order to avoid data corruption on the hard disks, the handler invokes the die( ) function, which prints the contents of all CPU registers on the console (this dump is called a kernel oops) and terminates the current process by calling do_exit( ) (see Chapter 20).
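The decisions described in this section can be summarized by a small User Mode model. It is a sketch, not the kernel's code: the sketch_ types and function are hypothetical, the two boolean parameters stand for checks that the real handler derives from the saved registers and from the mechanism of Section 9.2.6, and sending the signal is modeled by simply recording its number.

```c
#include <stdbool.h>

/* Possible results of the modeled handler. */
enum sketch_outcome {
    SKETCH_SIGNAL_SENT,  /* exception in User Mode: signal the process */
    SKETCH_FIXED_UP,     /* Kernel Mode, invalid system call argument */
    SKETCH_OOPS          /* Kernel Mode, kernel bug: die() is invoked */
};

/* Stand-in for the fields of the process descriptor used here. */
struct sketch_task {
    long error_code;
    long trap_no;
    int pending_signal;  /* 0 means that no signal has been sent */
};

enum sketch_outcome sketch_do_handler(struct sketch_task *cur,
                                      bool user_mode,
                                      bool bad_syscall_arg,
                                      int sig_number,
                                      long error_code,
                                      long vector)
{
    if (!user_mode) {
        /* Kernel Mode: either a bad system call argument or a bug. */
        return bad_syscall_arg ? SKETCH_FIXED_UP : SKETCH_OOPS;
    }

    /* User Mode: record the details, then "send" the signal. */
    cur->error_code = error_code;
    cur->trap_no = vector;
    cur->pending_signal = sig_number;
    return SKETCH_SIGNAL_SENT;
}
```

In this model, only the User Mode branch touches the process descriptor; the two Kernel Mode outcomes leave it unchanged.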

When the C function that implements the exception handling terminates, control is transferred to the following assembly language fragment:

addl $8, %esp 
jmp ret_from_exception 

The code pops the stack address of the saved User Mode registers and the hardware error code from the stack, and then performs a jmp instruction to the ret_from_exception( ) function. This function is described in Section 4.8.