A timer is a software facility that allows functions to be invoked at some future moment, after a given time interval has elapsed; a time-out denotes a moment at which the time interval associated with a timer has elapsed.
Timers are widely used both by the kernel and by processes. Most device drivers use timers to detect anomalous conditions — floppy disk drivers, for instance, use timers to switch off the device motor after the floppy has not been accessed for a while, and parallel printer drivers use them to detect erroneous printer conditions.
Timers are also used quite often by programmers to force the execution of specific functions at some future time (see the later section Section 6.7.3).
Implementing a timer is relatively easy. Each timer contains a field that indicates how far in the future the timer should expire. This field is initially calculated by adding the right number of ticks to the current value of jiffies. The field does not change. Every time the kernel checks a timer, it compares the expiration field to the value of jiffies at the current moment, and the timer expires when jiffies is greater or equal to the stored value. This comparison is made via the time_after, time_before, time_after_eq, and time_before_eq macros, which take care of possible overflows of jiffies.
Linux considers two types of timers called dynamic timers and interval timers. The first type is used by the kernel, while interval timers may be created by processes in User Mode.[7]
[7] Earlier versions of Linux use a third type of kernel timers: the so-called static timers. Static timers are very rudimental because they cannot be dynamically allocated or destroyed, and at most there could be 32 of them. Static timers were replaced by dynamic timers, and new kernels (starting from Version 2.4) no longer support them.
One word of caution about Linux timers: since checking for timer functions is always done by bottom halves that may be executed a long time after they have been activated, the kernel cannot ensure that timer functions will start right at their expiration times. It can only ensure that they are executed either at the proper time or after with a delay of up to a few hundreds of milliseconds. For this reason, timers are not appropriate for real-time applications in which expiration times must be strictly enforced.
Dynamic timers may be dynamically created and destroyed. No limit is placed on the number of currently active dynamic timers.
A dynamic timer is stored in the following timer_list structure:
struct timer_list {
struct list_head list;
unsigned long expires;
unsigned long data;
void (*function)(unsigned long);
};
The function field contains the address of the function to be executed when the timer expires. The data field specifies a parameter to be passed to this timer function. Thanks to the data field, it is possible to define a single general-purpose function that handles the time-outs of several device drivers; the data field could store the device ID or other meaningful data that could be used by the function to differentiate the device.
The expires field specifies when the timer expires; the time is expressed as the number of ticks that have elapsed since the system started up. All timers that have an expires value smaller than or equal to the value of jiffies are considered to be expired or decayed.
The list field includes the links for a doubly linked circular list. There are 512 doubly linked circular lists to hold dynamic timers. Each timer is inserted into one of the lists based on the value of the expires field. The algorithm that uses this list is described later in this chapter.
To create and activate a dynamic timer, the kernel must:
1. Create a new timer_list object — for example, t. This can be done in several ways by:
o Defining a static global variable in the code.
o Defining a local variable inside a function; in this case, the object is stored on the Kernel Mode stack.
o Including the object in a dynamically allocated descriptor.
2. Initialize the object by invoking the init_timer(&t) function. This simply sets the links in the list field to NULL.
3. Load the function field with the address of the function to be activated when the timer decays. If required, load the data field with a parameter value to be passed to the function.
4. If the dynamic timer is not already inserted in a list, assign a proper value to the expires field. Otherwise, update the expires field by invoking the mod_timer( ) function, which also takes care of moving the object into the proper list (discussed shortly).
5. If the dynamic timer is not already inserted in a list, insert the t element in the proper list by invoking the add_timer(&t) function.
Once the timer has decayed, the kernel automatically removes the t element from its list. Sometimes, however, a process should explicitly remove a timer from its list using the del_timer( ) or del_timer_sync( ) functions. Indeed, a sleeping process may be woken up before the time-out is over; in this case, the process may choose to destroy the timer. Invoking del_timer( ) or del_timer_sync( ) on a timer already removed from a list does no harm, so removing the timer within the timer function is considered a good practice.
Being asynchronously activated, dynamic timers are prone to race conditions. For instance, consider a dynamic timer whose function acts on a discardable resource (e.g., a kernel module or a file data structure). Releasing the resource without stopping the timer may lead to data corruption if the timer function got activated when the resource no longer exists. Thus, a rule of thumb is to stop the timer before releasing the resource:
...
del_timer(&t);
X_Release_Resources( );
...
In multiprocessor systems, however, this code is not safe because the timer function might already be running on another CPU when del_timer( ) is invoked. As a result, resources may be released while the timer function is still acting on them. To avoid this kind of race condition, the kernel offers the del_timer_sync( ) function. It removes the timer from the list, and then it checks whether the timer function is executed on another CPU; in such a case, del_timer_sync( ) waits until the timer function terminates.
Other types of race conditions exist, of course. For instance, the right way to modify the expires field of an already activated timer consists of using mod_timer( ), rather than deleting the timer and recreating it thereafter. In the latter approach, two kernel control paths that want to modify the expires field of the same timer may mix each other up badly. The implementation of the timer functions is made SMP-safe by means of the timerlist_lock spin lock: every time the kernel must access the lists of dynamic timers, it disables the interrupts and acquires this spin lock.
Choosing the proper data structure to implement dynamic timers is not easy. Stringing together all timers in a single list would degrade system performances, since scanning a long list of timers at every tick is costly. On the other hand, maintaining a sorted list would not be much more efficient, since the insertion and deletion operations would also be costly.
The adopted solution is based on a clever data structure that partitions the expires values into blocks of ticks and allows dynamic timers to percolate efficiently from lists with larger expires values to lists with smaller ones.
The main data structure is an array called tvecs, whose elements point to five groups of lists identified by the tv1, tv2, tv3, tv4, and tv5 structures (see Figure 6-2).
The tv1 structure is of type struct timer_vec_root, which includes an index field and a vec array of 256 list_head elements — that is, lists of dynamic timers. It contains all dynamic timers that will decay within the next 255 ticks.
The index field specifies the currently scanned list; it is initialized to 0 and incremented by 1 (modulo 256) at every tick. The list referenced by index contains all dynamic timers that expired during the current tick; the next list contains all dynamic timers that will expire in the next tick; the (index+k)th list contains all dynamic timers that will expire in exactly k ticks. When index returns to 0, it signifies that all the timers in tv1 have been scanned. In this case, the list pointed to by tv2.vec[tv2.index] is used to replenish tv1.
The tv2, tv3, and tv4 structures of type struct timer_vec contain all dynamic timers that will decay within the next 214-1, 220-1, and 226-1 ticks, respectively.
The tv5 structure is identical to the previous ones, except that the last entry of the vec array includes dynamic timers with extremely large expires fields. It never needs to be replenished from another array.
The timer_vec structure is very similar to timer_vec_root: it contains an index field and a vec array of 64 pointers to dynamic timer lists. The index field specifies the currently scanned list; it is incremented by 1 (modulo 64) every 256i-1 ticks, where i, ranging between 2 and 5, is the tvi group number. As in the case of tv1, when index returns to 0, the list pointed to by tvj.vec[tvj.index] is used to replenish tvi (i ranges between 2 and 4, j is equal to i+1).
Thus, the first element of tv2 holds a list of all timers that expire in the 256 ticks following the tv1 timers; the timers in this list are sufficient to replenish the whole array tv1. The second element of tv2 holds all timers that expire in the following 256 ticks, and so on. Similarly, a single entry of tv3 is sufficient to replenish the whole array tv2.
Figure 6-2 shows how these data structures are connected.
The timer_bh( ) function associated with the TIMER_BH bottom half invokes the run_timer_list( ) auxiliary function to check for decayed dynamic timers. The function relies on a variable similar to jiffies that is called timer_jiffies. This new variable is needed because a few timer interrupts might occur before the activated TIMER_BH bottom half has a chance to run; this happens typically when several interrupts of different types are issued in a short interval of time.
The value of timer_jiffies represents the expiration time of the dynamic timer list yet to be checked: if it coincides with the value of jiffies, no backlog of bottom half functions has accumulated; if it is smaller than jiffies, then bottom half functions that refer to previous ticks must be dealt with. The variable is set to 0 at system startup and is incremented only by run_timer_list( ). Its value can never be greater than jiffies.
The run_timer_list( ) function is essentially equivalent the following C fragment:
struct list_head *head, *curr;
struct timer_list *timer;
void (*fn)(unsigned long);
unsigned long data;
spin_lock_irq(&timerlist_lock);
while ((long)(jiffies - timer_jiffies) >= 0) {
if (!tv1.index) {
int n = 1;
do {
cascade_timers(tvecs[n]);
} while (tvecs[n]->index == 1 && ++n < 5));
}
head = &tv1.vec[tv1.index];
for (curr = head->next; curr != head; curr = head->next) {
timer = list_entry(curr, struct timer_list, list);
fn = timer->function;
data= timer->data;
detach_timer(timer);
timer->list.next = timer->list.prev = NULL;
running_timer = timer;
spin_unlock_irq(&timerlist_lock);
fn(data);
spin_lock_irq(&timerlist_lock);
running_timer = NULL;
}
++timer_jiffies;
tv1.index = (tv1.index + 1) & 0xff;
}
spin_unlock_irq(&timerlist_lock);
The outermost while loop ends when timer_jiffies becomes greater than the value of jiffies. Since the values of jiffies and timer_jiffies usually coincide, the outermost while cycle is often executed only once. In general, the outermost loop is executed jiffies - timer_jiffies + 1 consecutive times. Moreover, if a timer interrupt occurs while run_timer_list( ) is being executed, dynamic timers that decay at this tick occurrence are also considered, since the jiffies variable is asynchronously incremented by the PIT's interrupt handler (see the earlier section Section 6.2.1.1).
During a single execution of the outermost while cycle, the dynamic timer functions included in the tv1.vec[tv1.index] list are executed. Before executing a dynamic timer function, the loop invokes the detach_timer( ) function to remove the dynamic timer from the list. Once the list is emptied, the value of tv1.index is incremented (modulo 256) and the value of timer_jiffies is incremented.
When tv1.index becomes equal to 0, all the lists of tv1 have been checked; in this case, it is necessary to refill the tv1 structure. This is accomplished by the cascade_timers( ) function, which transfers the dynamic timers included in tv2.vec[tv2.index] into tv1.vec, since they will necessarily decay within the next 256 ticks. If tv2.index is equal to 0, the tv2 array of lists must be refilled with the elements of tv3.vec[tv3.index].
Notice that run_timer_list( ) disables interrupts and acquires the timerlist_lock spin lock just before entering the outermost loop; interrupts are enabled and the spin lock is released right before invoking each dynamic timer function, until its termination. This ensures that the dynamic timer data structures are not corrupted by interleaved kernel control paths.
To sum up, this rather complex algorithm ensures excellent performance. To see why, assume for the sake of simplicity that the TIMER_BH bottom half is executed right after the corresponding timer interrupt occurs. Then, in 255 timer interrupt occurrences out of 256 (in 99.6% of the cases), the run_timer_list( ) function just runs the functions of the decayed timers, if any. To replenish tv1.vec periodically, it is sufficient 63 times out of 64 to partition the list pointed to by tv2.vec[tv2.index] into the 256 lists pointed to by tv1.vec. The tv2.vec array, in turn, must be replenished in 0.02 percent of the cases (that is, once every 163 seconds). Similarly, tv3 is replenished every 2 hours and 54 minutes, and tv4 is replenished every 7 days and 18 hours. tv5 doesn't need to be replenished.
To show how the outcome of all the previous activities are actually used in the kernel, we'll show an example of the creation and use of a process time-out.
Let's assume that the kernel decides to suspend the current process for two seconds. It does this by executing the following code:
timeou = 2 * HZ;
set_current_state(TASK_INTERRUPTIBLE); /* or TASK_UNINTERRUPTIBLE */
remaining = schedule_timeout(timeout);
The kernel implements process time-outs by using dynamic timers. They appear in the schedule_timeout( ) function, which essentially executes the following statements:
struct timer_list timer;
expire = timeout + jiffies;
init_timer(&timer);
timer.expires = expire;
timer.data = (unsigned long) current;
timer.function = process_timeout;
add_timer(&timer);
schedule( ); /* process suspended until timer expires */
del_timer_sync(&timer);
timeout = expire - jiffies;
return (timeout < 0 ? 0 : timeout);
When schedule( ) is invoked, another process is selected for execution; when the former process resumes its execution, the function removes the dynamic timer. In the last statement, the function either returns 0, if the time-out is expired, or it returns the number of ticks left to the time-out expiration if the process was awoken for some other reason.
When the time-out expires, the kernel executes the following function:
void process_timeout(unsigned long data)
{
struct task_struct * p = (struct task_struct *) data;
wake_up_process(p);
}
The run_timer_list( ) function invokes process_timeout( ), passing as its parameter the process descriptor pointer stored in the data field of the timer object. As a result, the suspended process is woken up.