10.3 Delivering a Signal

We assume that the kernel has noticed the arrival of a signal and has invoked one of the functions mentioned in the previous section to prepare the process descriptor of the process that is supposed to receive the signal. If that process was not running on the CPU at that moment, however, the kernel deferred the task of delivering the signal. We now turn to the activities that the kernel performs to ensure that pending signals of a process are handled.

As mentioned in Section 4.8, the kernel checks the value of the sigpending flag of the process descriptor before allowing the process to resume its execution in User Mode. Thus, the kernel checks for the existence of pending signals every time it finishes handling an interrupt or an exception.

To handle the nonblocked pending signals, the kernel invokes the do_signal( ) function, which receives two parameters:

regs

The address of the stack area where the User Mode register contents of the current process are saved.

oldset

The address of a variable where the function is supposed to save the bit mask array of blocked signals. It is NULL if there is no need to save the bit mask array.

The do_signal( ) function starts by checking the privilege level recorded in the xcs field of the saved registers. If the interrupted code was running in Kernel Mode (for instance, because an interrupt occurred while the kernel was already executing), the function simply returns. Otherwise, the process was running in User Mode when the interrupt or exception occurred, and the function continues executing:

if ((regs->xcs & 3) != 3) 
    return 1; 

However, as we'll see in Section 10.3.4, this does not mean that a system call cannot be interrupted by a signal.

If the oldset parameter is NULL, the function initializes it with the address of the current->blocked field:

if (!oldset) 
    oldset = &current->blocked; 

The heart of the do_signal( ) function consists of a loop that repeatedly invokes the dequeue_signal( ) function until no nonblocked pending signals are left. The return code of dequeue_signal( ) is stored in the signr local variable. If its value is 0, it means that all pending signals have been handled and do_signal( ) can finish. As long as a nonzero value is returned, a pending signal is waiting to be handled. dequeue_signal( ) is invoked again after do_signal( ) handles the current signal.

The dequeue_signal( ) function always considers the lowest-numbered nonblocked pending signal. It updates the data structures to indicate that the signal is no longer pending and returns its number. This task involves clearing the corresponding bit in current->pending.signal and updating the value of current->sigpending. In the mask parameter, each bit that is set represents a blocked signal:

sig = 0; 
if ((x = current->pending.signal.sig[0] & ~mask->sig[0]) != 0) 
    sig = 1 + ffz(~x); 
else if ((x = current->pending.signal.sig[1] & ~mask->sig[1]) != 0) 
    sig = 33 + ffz(~x); 
if (sig) { 
    sigdelset(&current->pending.signal, sig); 
    recalc_sigpending(current); 
} 
return sig; 

The collection of currently pending signals is ANDed with the complement of mask, that is, with the set of nonblocked signals. If anything is left, it represents a signal that should be delivered to the process. The ffz( ) function returns the index of the first zero bit in its parameter; applied to ~x, it yields the index of the lowest bit set in x, and this value is used to compute the lowest-numbered signal to be delivered.
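The selection logic can be tried outside the kernel. The following is a minimal User Mode sketch, not the kernel's code: the sigset_model type and the ffz_model( ) and dequeue_model( ) functions are hypothetical names introduced here, and the set is assumed to consist of two 32-bit words, as in the fragment above.

```c
#include <stdint.h>

/* Hypothetical model of a 64-signal set: two 32-bit words, where
 * bit n-1 of the set stands for signal number n. */
struct sigset_model {
    uint32_t sig[2];
};

/* Index of the least significant zero bit of word.
 * The caller must not pass a word whose bits are all set. */
static int ffz_model(uint32_t word)
{
    int bit = 0;

    while (word & 1u) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* Return the lowest-numbered signal that is pending and not blocked,
 * clearing it from the pending set; return 0 if there is none. */
static int dequeue_model(struct sigset_model *pending,
                         const struct sigset_model *blocked)
{
    uint32_t x;
    int sig = 0;

    if ((x = pending->sig[0] & ~blocked->sig[0]) != 0)
        sig = 1 + ffz_model(~x);
    else if ((x = pending->sig[1] & ~blocked->sig[1]) != 0)
        sig = 33 + ffz_model(~x);

    if (sig)
        pending->sig[(sig - 1) / 32] &= ~(UINT32_C(1) << ((sig - 1) % 32));
    return sig;
}
```

Calling dequeue_model( ) repeatedly until it returns 0 mimics the loop at the heart of do_signal( ); a blocked signal stays in the pending set and is never returned.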

Let's see how the do_signal( ) function handles any pending signal whose number is returned by dequeue_signal( ). First, it checks whether the current receiver process is being monitored by some other process; in this case, do_signal( ) invokes notify_parent( ) and schedule( ) to make the monitoring process aware of the signal handling.

Then do_signal( ) loads the ka local variable with the address of the k_sigaction data structure of the signal to be handled:

ka = &current->sig->action[signr-1]; 

Depending on the contents, three kinds of actions may be performed: ignoring the signal, executing a default action, or executing a signal handler.
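The same three-way distinction is visible from User Mode through the sigaction( ) system call. The sketch below is only an illustration under that assumption; the disposition enumeration and the classify_signal( ) and noop_handler( ) functions are hypothetical names, not kernel identifiers.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical labels for the three kinds of action. */
enum disposition { DISP_IGNORE, DISP_DEFAULT, DISP_CATCH, DISP_ERROR };

/* Query the current action of signr without changing it. */
static enum disposition classify_signal(int signr)
{
    struct sigaction sa;

    if (sigaction(signr, NULL, &sa) != 0)
        return DISP_ERROR;
    if (sa.sa_flags & SA_SIGINFO)      /* a three-argument handler is set */
        return DISP_CATCH;
    if (sa.sa_handler == SIG_IGN)
        return DISP_IGNORE;
    if (sa.sa_handler == SIG_DFL)
        return DISP_DEFAULT;
    return DISP_CATCH;
}

/* Trivial handler used only to put a signal in the "catch" state. */
static void noop_handler(int signr)
{
    (void)signr;
}
```

For instance, after signal(SIGUSR1, noop_handler) the function reports DISP_CATCH for SIGUSR1, while after signal(SIGUSR1, SIG_IGN) it reports DISP_IGNORE.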

10.3.1 Ignoring the Signal

When a delivered signal is explicitly ignored, the do_signal( ) function normally just continues with a new execution of the loop and therefore considers another pending signal. One exception exists, as described earlier:

if (ka->sa.sa_handler == SIG_IGN) { 
    if (signr == SIGCHLD) 
        while (sys_wait4(-1, NULL, WNOHANG, NULL) > 0) 
            /* nothing */; 
    continue; 
} 

If the signal delivered is SIGCHLD, the sys_wait4( ) service routine of the wait4( ) system call is invoked to force the process to read information about its children, thus cleaning up memory left over by the terminated child processes (see Section 3.5).

10.3.2 Executing the Default Action for the Signal

If ka->sa.sa_handler is equal to SIG_DFL, do_signal( ) must perform the default action of the signal. The only exception comes when the receiving process is init, in which case the signal is discarded as described earlier in Section 10.1.1:

if (current->pid == 1) 
    continue; 

For other processes, since the default action depends on the type of signal, the function executes a switch statement based on the value of signr.

The signals whose default action is "ignore" are easily handled:

case SIGCONT: case SIGCHLD: case SIGWINCH: 
    continue; 

The signals whose default action is "stop" may stop the current process. To do this, do_signal( ) sets the state of current to TASK_STOPPED and then invokes the schedule( ) function (see Section 11.2.2). The do_signal( ) function also sends a SIGCHLD signal to the parent process of current, unless the parent has set the SA_NOCLDSTOP flag of SIGCHLD:

case SIGTSTP: case SIGTTIN: case SIGTTOU: 
    if (is_orphaned_pgrp(current->pgrp)) 
        continue; 
case SIGSTOP: 
    current->state = TASK_STOPPED; 
    current->exit_code = signr; 
    if (current->p_pptr->sig && !(SA_NOCLDSTOP & 
        current->p_pptr->sig->action[SIGCHLD-1].sa.sa_flags)) 
        notify_parent(current, SIGCHLD); 
    schedule(  ); 
    continue; 

The difference between SIGSTOP and the other signals is subtle: SIGSTOP always stops the process, while the other signals stop the process only if it is not in an "orphaned process group." The POSIX standard specifies that a process group is not orphaned as long as there is a process in the group that has a parent in a different process group but in the same session.

The signals whose default action is "dump" may create a core file in the process working directory; this file lists the complete contents of the process's address space and CPU registers. After do_signal( ) creates the core file, it kills the process. The default action of the remaining 18 signals is "terminate," which consists of just killing the process:

exit_code = signr; 
case SIGQUIT: case SIGILL: case SIGTRAP: 
case SIGABRT: case SIGFPE: case SIGSEGV: 
case SIGBUS: case SIGSYS: case SIGXCPU: case SIGXFSZ:
    if (do_coredump(signr, regs)) 
        exit_code |= 0x80; 
    /* fall through */
default: 
    sigaddset(&current->pending.signal, signr); 
    recalc_sigpending(current);
    current->flags |= PF_SIGNALED; 
    do_exit(exit_code); 

The do_exit( ) function receives as its input parameter the signal number ORed with a flag set when a core dump has been performed. That value is used to set the exit code of the process. The function terminates the current process, and hence never returns (see Chapter 20).
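The encoding of that value can be sketched as a pair of small helpers. This is an illustrative model only: the function names and the CORE_DUMPED_FLAG macro are hypothetical, and the decoding helper assumes that the signal number fits in the low seven bits, which holds for signal numbers up to 64.

```c
/* Flag ORed into the exit code when a core file was written
 * (the 0x80 constant of the fragment above). */
#define CORE_DUMPED_FLAG 0x80

/* Build the value handed to do_exit( ) in this model. */
static int make_exit_code(int signr, int core_dumped)
{
    int exit_code = signr;

    if (core_dumped)
        exit_code |= CORE_DUMPED_FLAG;
    return exit_code;
}

/* Recover the signal number (assumes it fits in seven bits). */
static int exit_code_signal(int exit_code)
{
    return exit_code & 0x7f;
}

/* Nonzero if the core-dump flag is set in exit_code. */
static int exit_code_dumped(int exit_code)
{
    return (exit_code & CORE_DUMPED_FLAG) != 0;
}
```

With these helpers, a process killed by signal 11 after a successful dump would carry the value 0x8b, while one killed by signal 15 without a dump would carry 15.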

10.3.3 Catching the Signal

If a handler has been established for the signal, the do_signal( ) function must enforce its execution. It does this by invoking handle_signal( ):

handle_signal(signr, ka, &info, oldset, regs); 
return 1; 

Notice how do_signal( ) returns after having handled a single signal. Other pending signals won't be considered until the next invocation of do_signal( ). This approach ensures that real-time signals will be dealt with in the proper order.

Executing a signal handler is a rather complex task because of the need to juggle stacks carefully while switching between User Mode and Kernel Mode. We explain exactly what is entailed here.

Signal handlers are functions defined by User Mode processes and included in the User Mode code segment. The handle_signal( ) function runs in Kernel Mode while signal handlers run in User Mode; this means that the current process must first execute the signal handler in User Mode before being allowed to resume its "normal" execution. Moreover, when the kernel attempts to resume the normal execution of the process, the Kernel Mode stack no longer contains the hardware context of the interrupted program because the Kernel Mode stack is emptied at every transition from User Mode to Kernel Mode.

An additional complication is that signal handlers may invoke system calls. In this case, after the service routine executes, control must be returned to the signal handler instead of to the code of the interrupted program.

The solution adopted in Linux consists of copying the hardware context saved in the Kernel Mode stack onto the User Mode stack of the current process. The User Mode stack is also modified in such a way that, when the signal handler terminates, the sigreturn( ) system call is automatically invoked to copy the hardware context back on the Kernel Mode stack and restore the original content of the User Mode stack.

Figure 10-2 illustrates the flow of execution of the functions involved in catching a signal. A nonblocked signal is sent to a process. When an interrupt or exception occurs, the process switches into Kernel Mode. Right before returning to User Mode, the kernel executes the do_signal( ) function, which in turn handles the signal (by invoking handle_signal( )) and sets up the User Mode stack (by invoking setup_frame( ) or setup_rt_frame( )). When the process switches again to User Mode, it starts executing the signal handler because the handler's starting address was forced into the program counter. When that function terminates, the return code placed on the User Mode stack by the setup_frame( ) or setup_rt_frame( ) function is executed. This code invokes the sigreturn( ) system call, whose service routine copies the hardware context of the normal program in the Kernel Mode stack and restores the User Mode stack back to its original state (by invoking restore_sigcontext( )). When the system call terminates, the normal program can thus resume its execution.
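From the point of view of a User Mode program, this whole round trip is invisible: the handler runs, and then the interrupted code simply continues. The following sketch, which uses only the standard sigaction( ) and raise( ) interfaces, illustrates that observable behavior; the names on_signal( ), deliver_and_count( ), handler_calls, and last_signal are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_calls = 0;
static volatile sig_atomic_t last_signal = 0;

/* Signal handler: runs in User Mode on the frame set up by the kernel. */
static void on_signal(int signr)
{
    handler_calls++;
    last_signal = signr;
}

/* Install on_signal( ) for signr, send signr to the calling process,
 * and return how many times the handler has run so far (-1 on error). */
static int deliver_and_count(int signr)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signr, &sa, NULL) != 0)
        return -1;
    if (raise(signr) != 0)
        return -1;
    /* Normal execution resumes here after the handler has returned. */
    return handler_calls;
}
```

In a single-threaded program, raise( ) does not return until the handler has finished, so the counter is already updated when the interrupted function continues.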

Figure 10-2. Catching a signal

Let's now examine in detail how this scheme is carried out.

10.3.3.1 Setting up the frame

To properly set the User Mode stack of the process, the handle_signal( ) function invokes either setup_frame( ) (for signals that do not require a siginfo_t table; see Section 10.4 later in this chapter) or setup_rt_frame( ) (for signals that do require a siginfo_t table). To choose between these two functions, the kernel checks the value of the SA_SIGINFO flag in the sa_flags field of the sigaction table associated with the signal.

The setup_frame( ) function receives four parameters, which have the following meanings:

sig

Signal number

ka

Address of the k_sigaction table associated with the signal

oldset

Address of a bit mask array of blocked signals

regs

Address in the Kernel Mode stack area where the User Mode register contents are saved

The setup_frame( ) function pushes onto the User Mode stack a data structure called a frame, which contains the information needed to handle the signal and to ensure the correct return to the sys_sigreturn( ) function. A frame is a sigframe table that includes the following fields (see Figure 10-3):

pretcode

Return address of the signal handler function; it points to the retcode field (later in this list) in the same table.

sig

The signal number; this is the parameter required by the signal handler.

sc

Structure of type sigcontext containing the hardware context of the User Mode process right before switching to Kernel Mode (this information is copied from the Kernel Mode stack of current). It also contains a bit array that specifies the blocked regular signals of the process.

fpstate

Structure of type _fpstate that may be used to store the floating point registers of the User Mode process (see Section 3.3.4).

extramask

Bit array that specifies the blocked real-time signals.

retcode

Eight-byte code issuing a sigreturn( ) system call; this code is executed when returning from the signal handler.

Figure 10-3. Frame on the User Mode stack

The setup_frame( ) function starts by invoking get_sigframe( ) to compute the first memory location of the frame. That memory location is usually[4] in the User Mode stack, so the function returns the value:

(regs->esp - sizeof(struct sigframe)) & 0xfffffff8

Since stacks grow toward lower addresses, the initial address of the frame is obtained by subtracting its size from the address of the current stack top and aligning the result to a multiple of 8.

[4] Linux allows processes to specify an alternate stack for their signal handlers by invoking the sigaltstack( ) system call; this feature is also requested by the X/Open standard. When an alternate stack is present, the get_sigframe( ) function returns an address inside that stack. We don't discuss this feature further, since it is conceptually similar to regular signal handling.
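The arithmetic of the frame address can be checked with a few lines of ordinary C. In this sketch, frame_address( ) is a hypothetical helper and the frame size is passed as a parameter, since the actual size of the sigframe structure is not stated here.

```c
#include <stdint.h>

/* Compute where a frame of frame_size bytes would start, given the
 * current User Mode stack pointer esp: move down by the frame size,
 * then clear the three low-order bits to align to a multiple of 8. */
static uint32_t frame_address(uint32_t esp, uint32_t frame_size)
{
    return (esp - frame_size) & UINT32_C(0xfffffff8);
}
```

For example, with esp equal to 0xbffff000 and an assumed frame size of 100 bytes, the subtraction gives 0xbfffef9c and the mask rounds it down to 0xbfffef98.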

The returned address is then verified by means of the access_ok macro; if it is valid, the function repeatedly invokes __put_user( ) to fill all the fields of the frame. Once this is done, it modifies the regs area of the Kernel Mode stack, thus ensuring that control is transferred to the signal handler when current resumes its execution in User Mode:

regs->esp = (unsigned long) frame; 
regs->eip = (unsigned long) ka->sa.sa_handler; 

The setup_frame( ) function terminates by resetting the segmentation registers saved on the Kernel Mode stack to their default value. Now the information needed by the signal handler is on the top of the User Mode stack.

The setup_rt_frame( ) function is very similar to setup_frame( ), but it puts on the User Mode stack an extended frame (stored in the rt_sigframe data structure) that also includes the content of the siginfo_t table associated with the signal.

10.3.3.2 Evaluating the signal flags

After setting up the User Mode stack, the handle_signal( ) function checks the values of the flags associated with the signal.

If the received signal has the SA_ONESHOT flag set, it must be reset to its default action so that further occurrences of the same signal will not trigger the execution of the signal handler:

if (ka->sa.sa_flags & SA_ONESHOT) 
    ka->sa.sa_handler = SIG_DFL; 
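The effect of this flag can be observed from User Mode. The sketch below assumes the POSIX name SA_RESETHAND, which Linux also makes available under the older name SA_ONESHOT; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Handler that does nothing; only its installation matters here. */
static void oneshot_handler(int signr)
{
    (void)signr;
}

/* Install a one-shot handler for signr, deliver signr once, and then
 * report whether the action has gone back to SIG_DFL.
 * Returns 1 if it has, 0 if it has not, -1 on error. */
static int oneshot_resets_to_default(int signr)
{
    struct sigaction sa, now;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = oneshot_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    if (sigaction(signr, &sa, NULL) != 0)
        return -1;
    if (raise(signr) != 0)          /* first and only delivery */
        return -1;
    if (sigaction(signr, NULL, &now) != 0)
        return -1;
    return now.sa_handler == SIG_DFL;
}
```

A second occurrence of the same signal would therefore trigger the default action rather than the handler, so a test program should not raise a terminating signal twice.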

Moreover, if the signal does not have the SA_NODEFER flag set, the signals in the sa_mask field of the sigaction table, together with the signal being delivered, must be blocked during the execution of the signal handler:

if (!(ka->sa.sa_flags & SA_NODEFER)) { 
    spin_lock_irq(&current->sigmask_lock);
    sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask); 
    sigaddset(&current->blocked, sig); 
    recalc_sigpending(current); 
    spin_unlock_irq(&current->sigmask_lock);
} 

As described earlier, the recalc_sigpending( ) function checks whether the process has nonblocked pending signals and sets its sigpending field accordingly.
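A handler can inspect the resulting mask itself. The following User Mode sketch covers only the case in which SA_NODEFER is not set; the names mask_probe_handler( ), probe_handler_mask( ), self_blocked, and extra_blocked are hypothetical, and SIGUSR2 is chosen arbitrarily as the extra signal placed in sa_mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t self_blocked = -1;
static volatile sig_atomic_t extra_blocked = -1;

/* Record whether the delivered signal and SIGUSR2 are blocked while
 * the handler runs. With a NULL second argument, sigprocmask( ) only
 * reads the current mask. */
static void mask_probe_handler(int signr)
{
    sigset_t cur;

    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0) {
        self_blocked = sigismember(&cur, signr);
        extra_blocked = sigismember(&cur, SIGUSR2);
    }
}

/* Install the probing handler for signr (which should differ from
 * SIGUSR2) with SIGUSR2 in sa_mask and no SA_NODEFER, then deliver
 * signr once. Returns 0 on success, -1 on error. */
static int probe_handler_mask(int signr)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = mask_probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    if (sigaction(signr, &sa, NULL) != 0)
        return -1;
    return raise(signr) == 0 ? 0 : -1;
}
```

After probe_handler_mask(SIGUSR1) returns, both recorded values should be 1, matching the sigorsets( ) and sigaddset( ) calls shown above.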

The function then returns to do_signal( ), which also returns immediately.

10.3.3.3 Starting the signal handler

When do_signal( ) returns, the current process resumes its execution in User Mode. Because of the preparation by setup_frame( ) described earlier, the eip register points to the first instruction of the signal handler, while esp points to the first memory location of the frame that has been pushed on top of the User Mode stack. As a result, the signal handler is executed.

10.3.3.4 Terminating the signal handler

When the signal handler terminates, the return address on top of the stack points to the code in the retcode field of the frame. For signals without a siginfo_t table, the code is equivalent to the following assembly language instructions:

popl %eax 
movl $__NR_sigreturn, %eax 
int $0x80 

Therefore, the signal number (that is, the sig field of the frame) is discarded from the stack, and the sigreturn( ) system call is then invoked.

The sys_sigreturn( ) function computes the address of the pt_regs data structure regs, which contains the hardware context of the User Mode process (see Section 9.2.3). From the value stored in the esp field, it can thus derive and check the frame address inside the User Mode stack:

frame = (struct sigframe *)(regs.esp - 8); 
if (verify_area(VERIFY_READ, frame, sizeof(*frame))) {
    force_sig(SIGSEGV, current);
    return 0;
} 

Then the function copies the bit array of signals that were blocked before invoking the signal handler from the sc field of the frame to the blocked field of current. As a result, all signals that have been masked for the execution of the signal handler are unblocked. The recalc_sigpending( ) function is then invoked.
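The restoration of the blocked set is observable from User Mode: once the handler has returned, the signal mask is the same as before the delivery. The sketch below checks this for the delivered signal and for one extra signal placed in sa_mask; the function names are hypothetical and SIGUSR2 is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Handler that does nothing; the mask changes happen around it. */
static void quiet_handler(int signr)
{
    (void)signr;
}

/* Deliver signr (which should differ from SIGUSR2) through a handler
 * whose sa_mask contains SIGUSR2, and compare the signal mask before
 * and after. Returns 1 if both signals have the same blocked status
 * as before, 0 if not, -1 on error. */
static int mask_restored_after_handler(int signr)
{
    struct sigaction sa;
    sigset_t before, after;

    if (sigprocmask(SIG_BLOCK, NULL, &before) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = quiet_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    if (sigaction(signr, &sa, NULL) != 0 || raise(signr) != 0)
        return -1;

    if (sigprocmask(SIG_BLOCK, NULL, &after) != 0)
        return -1;
    return sigismember(&before, signr) == sigismember(&after, signr) &&
           sigismember(&before, SIGUSR2) == sigismember(&after, SIGUSR2);
}
```

The handler itself never touches the mask; the blocking on entry and the unblocking on return are both performed on its behalf.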

The sys_sigreturn( ) function must at this point copy the process hardware context from the sc field of the frame to the Kernel Mode stack and remove the frame from the User Mode stack; it performs these two tasks by invoking the restore_sigcontext( ) function.

If the signal was sent by a system call like rt_sigqueueinfo( ) that required a siginfo_t table to be associated with the signal, the mechanism is very similar. The return code in the retcode field of the extended frame invokes the rt_sigreturn( ) system call; the corresponding sys_rt_sigreturn( ) service routine copies the process hardware context from the extended frame to the Kernel Mode stack and restores the original User Mode stack content by removing the extended frame from it.

10.3.4 Reexecution of System Calls

The request associated with a system call cannot always be immediately satisfied by the kernel; when this happens, the process that issued the system call is put in a TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state.

If the process is put in a TASK_INTERRUPTIBLE state and some other process sends a signal to it, the kernel puts it in the TASK_RUNNING state without completing the system call (see Section 4.8). When this happens, the system call service routine does not complete its job, but returns an EINTR, ERESTARTNOHAND, ERESTARTSYS, or ERESTARTNOINTR error code. The signal is delivered to the process while switching back to User Mode.

In practice, the only error code a User Mode process can get in this situation is EINTR, which means that the system call has not been completed. (The application programmer may check this code and decide whether to reissue the system call.) The remaining error codes are used internally by the kernel to specify whether the system call may be reexecuted automatically after the signal handler termination.
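An application that chooses to reissue the call typically wraps it in a loop. The following is a common idiom rather than anything prescribed by the kernel; read_retry( ) is a hypothetical name, and read( ) is used only as an example of a call that may fail with EINTR.

```c
#include <errno.h>
#include <unistd.h>

/* Call read( ) again whenever it fails because a signal handler
 * interrupted it; any other result is returned unchanged. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Whether retrying is appropriate depends on the application: a program that uses a signal to cancel a blocking operation wants to see the EINTR error rather than hide it.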

Table 10-6 lists the error codes related to unfinished system calls and their impact for each of the three possible signal actions. The terms that appear in the entries are defined in the following list:

Terminate

The system call will not be automatically reexecuted; the process will resume its execution in User Mode at the instruction following int $0x80, and the eax register will contain the -EINTR value.

Reexecute

The kernel forces the User Mode process to reload the eax register with the system call number and to reexecute the int $0x80 instruction; the process is not aware of the reexecution and the error code is not passed to it.

Depends

The system call is reexecuted only if the SA_RESTART flag of the delivered signal is set; otherwise, the system call terminates with a -EINTR error code.

Table 10-6. Reexecution of system calls

Signal    Error codes and their impact on system call execution
Action    EINTR       ERESTARTSYS   ERESTARTNOHAND   ERESTARTNOINTR
-------   ---------   -----------   --------------   --------------
Default   Terminate   Reexecute     Reexecute        Reexecute
Ignore    Terminate   Reexecute     Reexecute        Reexecute
Catch     Terminate   Depends       Terminate        Reexecute

When delivering a signal, the kernel must be sure that the process really issued a system call before attempting to reexecute it. This is where the orig_eax field of the regs hardware context plays a critical role. Let's recall how this field is initialized when the interrupt or exception handler starts:

Interrupt

The field contains the IRQ number associated with the interrupt minus 256 (see Section 4.6.1.4).

0x80 exception

The field contains the system call number (see Section 9.2.2).

Other exceptions

The field contains the value -1 (see Section 4.5.1).

Therefore, a non-negative value in the orig_eax field means that the signal has woken up a TASK_INTERRUPTIBLE process that was sleeping in a system call. The service routine recognizes that the system call was interrupted, and thus returns one of the previously mentioned error codes.

If the signal is explicitly ignored or if its default action is enforced, do_signal( ) analyzes the error code of the system call to decide whether the unfinished system call must be automatically reexecuted, as specified in Table 10-6. If the call must be restarted, the function modifies the regs hardware context so that, when the process is back in User Mode, eip points to the int $0x80 instruction and eax contains the system call number:

if (regs->orig_eax >= 0) { 
    if (regs->eax == -ERESTARTNOHAND || regs->eax == -ERESTARTSYS || 
          regs->eax == -ERESTARTNOINTR) { 
        regs->eax = regs->orig_eax; 
        regs->eip -= 2; 
    } 
} 

The regs->eax field is filled with the return code of a system call service routine (see Section 9.2.2).

If the signal is caught, handle_signal( ) analyzes the error code and, possibly, the SA_RESTART flag of the sigaction table to decide whether the unfinished system call must be reexecuted:

if (regs->orig_eax >= 0) { 
    switch (regs->eax) { 
        case -ERESTARTNOHAND: 
            regs->eax = -EINTR; 
            break; 
        case -ERESTARTSYS: 
            if (!(ka->sa.sa_flags & SA_RESTART)) { 
                regs->eax = -EINTR; 
                break; 
            } 
        /* fallthrough */ 
        case -ERESTARTNOINTR: 
            regs->eax = regs->orig_eax; 
            regs->eip -= 2; 
    } 
} 

If the system call must be restarted, handle_signal( ) proceeds exactly as do_signal( ); otherwise, it returns an -EINTR error code to the User Mode process.
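The two fragments above can be exercised together in a small User Mode model. Everything in this sketch is an assumption made for illustration: the regs_model structure stands in for the saved registers, the M_ constants stand in for the kernel's error codes, and the functions are hypothetical. The subtraction of 2 from eip reflects the fact that the int $0x80 instruction is two bytes long.

```c
/* Stand-ins for the kernel error codes (values are assumptions of
 * this model; only their being distinct matters here). */
enum {
    M_EINTR = 4,
    M_ERESTARTSYS = 512,
    M_ERESTARTNOINTR = 513,
    M_ERESTARTNOHAND = 514
};

/* The three saved registers that the restart logic looks at. */
struct regs_model {
    long eax;           /* return code of the service routine */
    long orig_eax;      /* system call number, or negative */
    unsigned long eip;  /* address following int $0x80 */
};

/* Arrange for the system call to be issued again. */
static void restart_syscall(struct regs_model *r)
{
    r->eax = r->orig_eax;
    r->eip -= 2;
}

/* Signal ignored or default action enforced. */
static void fixup_not_caught(struct regs_model *r)
{
    if (r->orig_eax < 0)
        return;
    if (r->eax == -M_ERESTARTNOHAND || r->eax == -M_ERESTARTSYS ||
        r->eax == -M_ERESTARTNOINTR)
        restart_syscall(r);
}

/* Signal caught; sa_restart is nonzero if SA_RESTART is set. */
static void fixup_caught(struct regs_model *r, int sa_restart)
{
    if (r->orig_eax < 0)
        return;
    switch (r->eax) {
    case -M_ERESTARTNOHAND:
        r->eax = -M_EINTR;
        break;
    case -M_ERESTARTSYS:
        if (!sa_restart) {
            r->eax = -M_EINTR;
            break;
        }
        /* fall through */
    case -M_ERESTARTNOINTR:
        restart_syscall(r);
        break;
    }
}
```

Running each combination of error code and action through these two functions reproduces the entries of Table 10-6: a -M_EINTR return code is never touched, and a negative orig_eax disables restarting altogether.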