2.3 Segmentation in Linux

Segmentation was included in 80x86 microprocessors to encourage programmers to split their applications into logically related entities, such as subroutines or global and local data areas. However, Linux uses segmentation in a very limited way. In fact, segmentation and paging are somewhat redundant, since both can be used to separate the physical address spaces of processes: segmentation can assign a different linear address space to each process, while paging can map the same linear address space into different physical address spaces. Linux prefers paging to segmentation for the following reasons:

·         Memory management is simpler when all processes use the same segment register values — that is, when they share the same set of linear addresses.

·         One of the design objectives of Linux is portability to a wide range of architectures; RISC architectures in particular have limited support for segmentation.

The 2.4 version of Linux uses segmentation only when required by the 80x86 architecture. In particular, all processes use the same logical addresses, so the total number of segments to be defined is quite limited, and it is possible to store all Segment Descriptors in the Global Descriptor Table (GDT). This table is implemented by the array gdt_table referred to by the gdt variable.

Local Descriptor Tables are not used by the kernel, although a system call called modify_ldt( ) exists that allows processes to create their own LDTs. This turns out to be useful to applications (such as Wine) that execute segment-oriented Microsoft Windows applications.

Here are the segments used by Linux:

·         A kernel code segment. The fields of the corresponding Segment Descriptor in the GDT have the following values:

o        Base = 0x00000000

o        Limit = 0xfffff

o        G (granularity flag) = 1, for segment size expressed in pages

o        S (system flag) = 1, for normal code or data segment

o        Type = 0xa, for code segment that can be read and executed

o        DPL (Descriptor Privilege Level) = 0, for Kernel Mode

o        D/B (32-bit address flag) = 1, for 32-bit offset addresses

Thus, the linear addresses associated with that segment start at 0 and reach the addressing limit of 2^32 - 1. The S and Type fields specify that the segment is a code segment that can be read and executed. Its DPL value is 0, so it can be accessed only in Kernel Mode. The corresponding Segment Selector is defined by the __KERNEL_CS macro. To address the segment, the kernel just loads the value yielded by the macro into the cs register.

·         A kernel data segment. The fields of the corresponding Segment Descriptor in the GDT have the following values:

o        Base = 0x00000000

o        Limit = 0xfffff

o        G (granularity flag) = 1, for segment size expressed in pages

o        S (system flag) = 1, for normal code or data segment

o        Type = 2, for data segment that can be read and written

o        DPL (Descriptor Privilege Level) = 0, for Kernel Mode

o        D/B (32-bit address flag) = 1, for 32-bit offset addresses

This segment is identical to the previous one (in fact, they overlap in the linear address space), except for the value of the Type field, which specifies that it is a data segment that can be read and written. The corresponding Segment Selector is defined by the __KERNEL_DS macro.

·         A user code segment shared by all processes in User Mode. The fields of the corresponding Segment Descriptor in the GDT have the following values:

o        Base = 0x00000000

o        Limit = 0xfffff

o        G (granularity flag) = 1, for segment size expressed in pages

o        S (system flag) = 1, for normal code or data segment

o        Type = 0xa, for code segment that can be read and executed

o        DPL (Descriptor Privilege Level) = 3, for User Mode

o        D/B (32-bit address flag) = 1, for 32-bit offset addresses

The S and DPL fields specify that the segment is not a system segment and its privilege level is equal to 3; it can thus be accessed both in Kernel Mode and in User Mode. The corresponding Segment Selector is defined by the __USER_CS macro.

·         A user data segment shared by all processes in User Mode. The fields of the corresponding Segment Descriptor in the GDT have the following values:

o        Base = 0x00000000

o        Limit = 0xfffff

o        G (granularity flag) = 1, for segment size expressed in pages

o        S (system flag) = 1, for normal code or data segment

o        Type = 2, for data segment that can be read and written

o        DPL (Descriptor Privilege Level) = 3, for User Mode

o        D/B (32-bit address flag) = 1, for 32-bit offset addresses

This segment overlaps the previous one: they are identical, except for the value of Type. The corresponding Segment Selector is defined by the __USER_DS macro.

·         A Task State Segment (TSS) for each processor. The linear address space corresponding to each TSS is a small subset of the linear address space corresponding to the kernel data segment. All the Task State Segments are sequentially stored in the init_tss array; in particular, the Base field of the TSS descriptor for the nth CPU points to the nth component of the init_tss array. The G (granularity) flag is cleared, while the Limit field is set to 0xeb, since the TSS is 236 bytes long. The Type field is set to 9 (available 32-bit TSS) or 11 (busy 32-bit TSS), and the DPL is set to 0, since processes in User Mode are not allowed to access a TSS. You will find details on how Linux uses TSSs in Section 3.3.2.

·         A default Local Descriptor Table (LDT) that is usually shared by all processes. This segment is stored in the default_ldt variable. The default LDT includes a single entry consisting of a null Segment Descriptor. Each processor has its own LDT Segment Descriptor, which usually points to the common default LDT segment; its Base field is set to the address of default_ldt and its Limit field is set to 7, since a single Segment Descriptor is 8 bytes long. When a process requiring a nonempty LDT is running, the LDT descriptor in the GDT corresponding to the executing CPU is replaced by the descriptor associated with the LDT that was built by the process. You will find more details of this mechanism in Chapter 3.

·         Four segments related to the Advanced Power Management (APM) support. APM consists of a set of BIOS routines devoted to the management of the power states of the system. If the kernel supports APM, four entries in the GDT store the descriptors of two data segments and two code segments containing APM-related kernel functions.
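The field values listed above for the code and data segments can be packed into the 8-byte descriptor format that the 80x86 hardware reads from the GDT. The following is a minimal sketch, not kernel code: the struct seg_fields type and the functions pack_descriptor( ) and segment_size( ) are hypothetical names introduced for illustration, and the Present flag (bit 47), which the lists above do not mention, is assumed to be set.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor fields, named as in the lists above (hypothetical type). */
struct seg_fields {
    uint32_t base;   /* Base: 32-bit linear address of the first byte */
    uint32_t limit;  /* Limit: 20-bit offset of the last unit         */
    unsigned type;   /* Type: 4 bits                                  */
    unsigned s;      /* S: 1 for a normal code or data segment        */
    unsigned dpl;    /* DPL: 0 (Kernel Mode) to 3 (User Mode)         */
    unsigned db;     /* D/B: 1 for 32-bit offset addresses            */
    unsigned g;      /* G: 1 if Limit is expressed in 4 KB pages      */
};

/* Pack the fields into the 64-bit hardware layout of a descriptor.
 * Assumption: the Present flag (bit 47) is always set. */
uint64_t pack_descriptor(struct seg_fields f)
{
    uint64_t d = 0;

    assert(f.limit <= 0xfffff && f.type <= 0xf && f.dpl <= 3);

    d |= (uint64_t)(f.limit & 0xffff);             /* Limit bits 15..0  */
    d |= (uint64_t)(f.base & 0xffffff) << 16;      /* Base bits 23..0   */
    d |= (uint64_t)(f.type & 0xf) << 40;           /* Type              */
    d |= (uint64_t)(f.s & 1) << 44;                /* S                 */
    d |= (uint64_t)(f.dpl & 3) << 45;              /* DPL               */
    d |= (uint64_t)1 << 47;                        /* Present (assumed) */
    d |= (uint64_t)((f.limit >> 16) & 0xf) << 48;  /* Limit bits 19..16 */
    d |= (uint64_t)(f.db & 1) << 54;               /* D/B               */
    d |= (uint64_t)(f.g & 1) << 55;                /* G                 */
    d |= (uint64_t)((f.base >> 24) & 0xff) << 56;  /* Base bits 31..24  */
    return d;
}

/* Segment size in bytes: Limit is the last valid offset, counted in
 * bytes when G is cleared and in 4 KB pages when G is set. */
uint64_t segment_size(struct seg_fields f)
{
    uint64_t last = f.g ? (((uint64_t)f.limit << 12) | 0xfff) : f.limit;
    return last + 1;
}

/* The four flat segments: base, limit, type, s, dpl, db, g. */
const struct seg_fields kernel_code = { 0, 0xfffff, 0xa, 1, 0, 1, 1 };
const struct seg_fields kernel_data = { 0, 0xfffff, 0x2, 1, 0, 1, 1 };
const struct seg_fields user_code   = { 0, 0xfffff, 0xa, 1, 3, 1, 1 };
const struct seg_fields user_data   = { 0, 0xfffff, 0x2, 1, 3, 1, 1 };
```

Under these assumptions, pack_descriptor(kernel_code) yields 0x00cf9a000000ffff and pack_descriptor(user_data) yields 0x00cff2000000ffff, while segment_size( ) returns 2^32 for each of the four segments. The same size rule gives 236 bytes for a TSS (Limit 0xeb, G cleared) and 8 bytes for the default LDT (Limit 7).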

Figure 2-5. The Global Descriptor Table

In conclusion, as shown in Figure 2-5, the GDT includes a set of common descriptors plus a pair of segment descriptors for each existing CPU: one for the TSS and one for the LDT. For efficiency, some entries in the GDT are left unused, so that segment descriptors usually accessed together are kept in the same 32-byte line of the hardware cache (see Section 2.4.7 later in this chapter).
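The effect of the unused entries on cache-line placement can be illustrated with a little index arithmetic. The sketch below rests on assumptions that the text does not state: that the first per-CPU TSS descriptor sits at GDT index 12 with its LDT descriptor right after it, that each CPU is allotted four consecutive entries, that the kernel and user code and data descriptors occupy indices 2 through 5, and that the table starts on a 32-byte boundary. These values match the 2.4 i386 headers as far as we can tell, but treat them as illustrative; all function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout constants (not given in the text above). */
enum {
    DESCRIPTOR_SIZE  = 8,   /* bytes per Segment Descriptor         */
    CACHE_LINE_SIZE  = 32,  /* bytes per hardware cache line        */
    FIRST_TSS_ENTRY  = 12,  /* assumed index of CPU 0's TSS entry   */
    FIRST_LDT_ENTRY  = 13,  /* assumed index of CPU 0's LDT entry   */
    ENTRIES_PER_CPU  = 4    /* assumed stride between CPUs          */
};

/* GDT index of the TSS descriptor of the given CPU. */
unsigned tss_entry(unsigned cpu)
{
    return cpu * ENTRIES_PER_CPU + FIRST_TSS_ENTRY;
}

/* GDT index of the LDT descriptor of the given CPU. */
unsigned ldt_entry(unsigned cpu)
{
    return cpu * ENTRIES_PER_CPU + FIRST_LDT_ENTRY;
}

/* Cache line (counted from the start of the table) holding an entry,
 * assuming the table itself begins on a cache-line boundary. */
unsigned cache_line_of(unsigned index)
{
    return index * DESCRIPTOR_SIZE / CACHE_LINE_SIZE;
}
```

With this layout, four 8-byte entries fill exactly one 32-byte line, so the TSS and LDT descriptors of any given CPU share a line and never straddle the line used by another CPU. Likewise, the two kernel descriptors (indices 2 and 3) fall in one line and the two user descriptors (indices 4 and 5) in the next.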

As stated earlier, the Current Privilege Level of the CPU indicates whether the processor is in User or Kernel Mode and is specified by the RPL field of the Segment Selector stored in the cs register. Whenever the CPL is changed, some segmentation registers must be correspondingly updated. For instance, when the CPL is equal to 3 (User Mode), the ds register must contain the Segment Selector of the user data segment, but when the CPL is equal to 0, the ds register must contain the Segment Selector of the kernel data segment.

A similar situation occurs for the ss register. It must refer to a User Mode stack inside the user data segment when the CPL is 3, and it must refer to a Kernel Mode stack inside the kernel data segment when the CPL is 0. When switching from User Mode to Kernel Mode, Linux always makes sure that the ss register contains the Segment Selector of the kernel data segment.
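The relation between the CPL and the selectors loaded in cs, ds, and ss can be made concrete by decoding a Segment Selector into its three fields: a 13-bit index, the TI flag, and the 2-bit RPL. The following is a minimal sketch; the numeric selector values are assumptions taken from the 2.4 i386 headers (they are not given in the text above), and the SEL_ names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of the __KERNEL_CS, __KERNEL_DS, __USER_CS, and
 * __USER_DS macros in Linux 2.4 for the 80x86 architecture. */
#define SEL_KERNEL_CS 0x10
#define SEL_KERNEL_DS 0x18
#define SEL_USER_CS   0x23
#define SEL_USER_DS   0x2b

/* Index of the descriptor inside the GDT or LDT (bits 15..3). */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Table Indicator (bit 2): 0 selects the GDT, 1 selects the LDT. */
unsigned selector_ti(uint16_t sel) { return (sel >> 2) & 1; }

/* Requested Privilege Level (bits 1..0); for the selector in cs,
 * this is the Current Privilege Level of the CPU. */
unsigned selector_rpl(uint16_t sel) { return sel & 3; }

/* Build a selector from its three fields. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Data (and stack) segment selector that must be loaded in ds and ss
 * for a given CPL: user data segment for 3, kernel data segment for 0. */
uint16_t data_selector_for_cpl(unsigned cpl)
{
    assert(cpl == 0 || cpl == 3);   /* Linux uses only these two levels */
    return cpl == 3 ? SEL_USER_DS : SEL_KERNEL_DS;
}
```

Under these assumptions, selector_rpl(SEL_USER_CS) is 3 and selector_rpl(SEL_KERNEL_CS) is 0, and all four selectors have the TI flag cleared, consistent with every one of these descriptors living in the GDT.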