All information related to the process address space is included in a data structure called a memory descriptor. This structure of type mm_struct is referenced by the mm field of the process descriptor. The fields of a memory descriptor are listed in Table 8-2.
Table 8-2. The fields of the memory descriptor

| Type | Field | Description |
|---|---|---|
| struct vm_area_struct * | mmap | Pointer to the head of the list of memory region objects |
| rb_root_t | mm_rb | Pointer to the root of the red-black tree of memory region objects |
| struct vm_area_struct * | mmap_cache | Pointer to the last referenced memory region object |
| pgd_t * | pgd | Pointer to the Page Global Directory |
| atomic_t | mm_users | Secondary usage counter |
| atomic_t | mm_count | Main usage counter |
| int | map_count | Number of memory regions |
| struct rw_semaphore | mmap_sem | Memory regions' read/write semaphore |
| spinlock_t | page_table_lock | Memory regions' and Page Tables' spin lock |
| struct list_head | mmlist | Pointers to adjacent elements in the list of memory descriptors |
| unsigned long | start_code | Initial address of executable code |
| unsigned long | end_code | Final address of executable code |
| unsigned long | start_data | Initial address of initialized data |
| unsigned long | end_data | Final address of initialized data |
| unsigned long | start_brk | Initial address of the heap |
| unsigned long | brk | Current final address of the heap |
| unsigned long | start_stack | Initial address of User Mode stack |
| unsigned long | arg_start | Initial address of command-line arguments |
| unsigned long | arg_end | Final address of command-line arguments |
| unsigned long | env_start | Initial address of environment variables |
| unsigned long | env_end | Final address of environment variables |
| unsigned long | rss | Number of page frames allocated to the process |
| unsigned long | total_vm | Size of the process address space (number of pages) |
| unsigned long | locked_vm | Number of "locked" pages that cannot be swapped out (see Chapter 16) |
| unsigned long | def_flags | Default access flags of the memory regions |
| unsigned long | cpu_vm_mask | Bit mask for lazy TLB switches (see Chapter 2) |
| unsigned long | swap_address | Last scanned linear address for swapping (see Chapter 16) |
| unsigned int | dumpable | Flag that specifies whether the process can produce a core dump of the memory |
| mm_context_t | context | Pointer to table for architecture-specific information (e.g., LDT's address in 80x86 platforms) |

All memory descriptors are stored in a doubly linked list. Each descriptor stores the address of the adjacent list items in the mmlist field. The first element of the list is the mmlist field of init_mm, the memory descriptor used by process 0 in the initialization phase. The list is protected against concurrent accesses in multiprocessor systems by the mmlist_lock spin lock. The number of memory descriptors in the list is stored in the mmlist_nr variable.
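The list organization just described can be modeled in a few lines of user-space C. This is only a sketch, not kernel code: the list_head structure is a simplified stand-in for the kernel's, the helpers mmlist_add( ), mmlist_del( ), and mmlist_count( ) are hypothetical names introduced for illustration, and we assume that mmlist_nr counts the descriptors linked after init_mm. The mmlist_lock spin lock is mentioned only in comments because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Toy memory descriptor: only the list linkage is modeled. */
struct mm_struct {
    struct list_head mmlist;
};

/* init_mm heads the list; an empty circular list points to itself. */
static struct mm_struct init_mm = {
    .mmlist = { &init_mm.mmlist, &init_mm.mmlist }
};

/* Assumption: counts the descriptors linked in after init_mm. */
static int mmlist_nr = 0;

/* Link mm right after the head (the kernel would hold mmlist_lock here). */
static void mmlist_add(struct mm_struct *mm)
{
    struct list_head *head = &init_mm.mmlist;

    mm->mmlist.next = head->next;
    mm->mmlist.prev = head;
    head->next->prev = &mm->mmlist;
    head->next = &mm->mmlist;
    mmlist_nr++;
}

/* Unlink mm from its neighbors (again, under mmlist_lock in the kernel). */
static void mmlist_del(struct mm_struct *mm)
{
    assert(mmlist_nr > 0);
    mm->mmlist.prev->next = mm->mmlist.next;
    mm->mmlist.next->prev = mm->mmlist.prev;
    mm->mmlist.next = mm->mmlist.prev = &mm->mmlist;
    mmlist_nr--;
}

/* Walk the circular list and count every element except the head. */
static int mmlist_count(void)
{
    int n = 0;

    for (struct list_head *p = init_mm.mmlist.next;
         p != &init_mm.mmlist; p = p->next)
        n++;
    return n;
}
```

Because the list is circular and rooted at init_mm, a full scan simply starts at init_mm.mmlist.next and stops when it returns to the head, so no NULL checks are needed.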
The mm_users field stores the number of lightweight processes that share the mm_struct data structure (see Section 3.4.1). The mm_count field is the main usage counter of the memory descriptor; all "users" in mm_users count as one unit in mm_count. Every time the mm_count field is decremented, the kernel checks whether it becomes zero; if so, the memory descriptor is deallocated because it is no longer in use.
An example helps to explain the difference between mm_users and mm_count. Consider a memory descriptor shared by two lightweight processes. Normally, its mm_users field stores the value 2, while its mm_count field stores the value 1 (both owner processes count as one).
If the memory descriptor is temporarily lent to a kernel thread (see the next section), the kernel increments the mm_count field. In this way, even if both lightweight processes die and the mm_users field becomes zero, the memory descriptor is not released until the kernel thread finishes using it because the mm_count field remains greater than zero.
If the kernel wants to be sure that the memory descriptor is not released in the middle of a lengthy operation, it might increment the mm_users field instead of mm_count (this is what the swap_out( ) function does; see Section 16.5). The final result is the same because the increment of mm_users ensures that mm_count does not become zero even if all lightweight processes that own the memory descriptor die.
The mm_alloc( ) function is invoked to get a new memory descriptor. Since these descriptors are stored in a slab allocator cache, mm_alloc( ) calls kmem_cache_alloc( ), initializes the new memory descriptor, and sets both the mm_count and mm_users fields to 1.
Conversely, the mmput( ) function decrements the mm_users field of a memory descriptor. If that field becomes 0, the function releases the Local Descriptor Table, the memory region descriptors (see later in this chapter), and the Page Tables referenced by the memory descriptor, and then invokes mmdrop( ). The latter function decrements mm_count and, if it becomes zero, releases the mm_struct data structure.
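The interplay of the two counters is easier to follow in a small user-space model. The sketch below is not the kernel's code: the _sketch suffix marks our own hypothetical helpers, plain int counters replace atomic_t, malloc( ) and free( ) stand in for the slab allocator, and the mm_freed variable is bookkeeping we add so that a caller can observe the deallocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy memory descriptor: only the two usage counters are modeled. */
struct mm_struct {
    int mm_users;   /* lightweight processes sharing the address space */
    int mm_count;   /* main counter: all users count as one, plus borrowers */
};

/* Hypothetical bookkeeping: number of descriptors released so far. */
static int mm_freed = 0;

/* Counterpart of mm_alloc( ): both counters start at 1. */
static struct mm_struct *mm_alloc_sketch(void)
{
    struct mm_struct *mm = malloc(sizeof(*mm));   /* kernel: kmem_cache_alloc( ) */

    if (mm) {
        mm->mm_users = 1;
        mm->mm_count = 1;
    }
    return mm;
}

/* Counterpart of mmdrop( ): release the descriptor when mm_count hits zero. */
static void mmdrop_sketch(struct mm_struct *mm)
{
    assert(mm->mm_count > 0);
    if (--mm->mm_count == 0) {
        free(mm);
        mm_freed++;
    }
}

/* Counterpart of mmput( ): the last user tears down the address space,
 * then gives up the single unit of mm_count that all users shared. */
static void mmput_sketch(struct mm_struct *mm)
{
    assert(mm->mm_users > 0);
    if (--mm->mm_users == 0) {
        /* kernel: release the LDT, memory regions, and Page Tables here */
        mmdrop_sketch(mm);
    }
}
```

Replaying the earlier example with this model: after a second lightweight process is added (mm_users is 2, mm_count is 1) and the descriptor is lent to a kernel thread (mm_count is 2), two calls to mmput_sketch( ) bring mm_users to zero but leave mm_count at 1. The descriptor is released only by the kernel thread's final mmdrop_sketch( ).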
The mmap, mm_rb, mmlist, and mmap_cache fields are discussed in the next section.
Kernel threads run only in Kernel Mode, so they never access linear addresses below TASK_SIZE (same as PAGE_OFFSET, usually 0xc0000000). Unlike regular processes, kernel threads do not use memory regions, so most of the fields of a memory descriptor are meaningless for them.
Since the Page Table entries that refer to linear addresses above TASK_SIZE should always be identical, it does not really matter what set of Page Tables a kernel thread uses. To avoid useless TLB and cache flushes, kernel threads in Linux 2.4 use the Page Tables of a regular process. To that end, two kinds of memory descriptor pointers are included in every process descriptor: mm and active_mm.
The mm field in the process descriptor points to the memory descriptor owned by the process, while the active_mm field points to the memory descriptor used by the process while it is executing. For regular processes, the two fields store the same pointer. Kernel threads, however, do not own any memory descriptor, so their mm field is always NULL. When a kernel thread is selected for execution, its active_mm field is initialized to the value of the active_mm of the previously running process (see Section 11.2.2.3).
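The rule for setting active_mm can be sketched as the address-space part of a process switch. The code below is a user-space model under stated assumptions: struct task and switch_mm_sketch( ) are hypothetical names, mm_count is a plain int, and the step that returns the borrowed descriptor when a kernel thread is switched out reflects our assumption about how the loan ends; the real scheduler code is covered in Section 11.2.2.3.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory descriptor: only the main usage counter is modeled. */
struct mm_struct {
    int mm_count;
};

/* Toy process descriptor with the two memory descriptor pointers. */
struct task {
    struct mm_struct *mm;          /* owned descriptor; NULL for a kernel thread */
    struct mm_struct *active_mm;   /* descriptor in use while running */
};

/* Model of what happens to mm and active_mm when prev is replaced by next. */
static void switch_mm_sketch(struct task *prev, struct task *next)
{
    struct mm_struct *oldmm = prev->active_mm;

    assert(oldmm != NULL);   /* a running task always has an active_mm */

    if (next->mm == NULL) {
        /* Kernel thread: borrow the previous descriptor and pin it. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* Regular process: both fields refer to its own descriptor. */
        next->active_mm = next->mm;
    }

    if (prev->mm == NULL) {
        /* Assumption: a kernel thread being switched out ends the loan. */
        prev->active_mm = NULL;
        oldmm->mm_count--;   /* kernel: mmdrop( ), which may free the descriptor */
    }
}
```

In this model, switching from a regular process to a kernel thread raises the mm_count of the lender by one, and the later switch away from the kernel thread lowers it again, which matches the lending scheme described in the previous section.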
There is, however, a small complication. Whenever a process in Kernel Mode modifies a Page Table entry for a "high" linear address (above TASK_SIZE), it should also update the corresponding entry in the sets of Page Tables of all processes in the system. In fact, once set by a process in Kernel Mode, the mapping should be effective for all other processes in Kernel Mode as well. Touching the sets of Page Tables of all processes is a costly operation; therefore, Linux adopts a deferred approach.
We already mentioned this deferred approach in Section 7.3: every time a high linear address has to be remapped (typically by vmalloc( ) or vfree( )), the kernel updates a canonical set of Page Tables rooted at the swapper_pg_dir master kernel Page Global Directory (see Section 2.5.5). This Page Global Directory is pointed to by the pgd field of a master memory descriptor, which is stored in the init_mm variable.[1]
[1] We mentioned in Section 3.4.2 that the swapper kernel thread uses init_mm during the initialization phase. However, swapper never uses this memory descriptor once the initialization phase completes.
Later, in Section 8.4.5, we'll describe how the Page Fault handler propagates the information stored in the canonical Page Tables to a process's own Page Tables when it is actually needed.
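Until then, the following user-space sketch conveys the general shape of the deferred approach. It is only an illustration under several assumptions: Page Global Directories are modeled as arrays of unsigned long in which 0 means "not present," the constants correspond to an 80x86 system without PAE, and map_kernel_address( ) and sync_on_fault( ) are hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_PGD  1024            /* assumption: 80x86 without PAE */
#define PGDIR_SHIFT   22
#define TASK_SIZE     0xc0000000UL

typedef unsigned long pgd_t;          /* toy entry; 0 means "not present" */

/* Model of the master kernel Page Global Directory. */
static pgd_t swapper_pg_dir[PTRS_PER_PGD];

/* Index of the Page Global Directory entry that covers address. */
static size_t pgd_index(unsigned long address)
{
    return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* vmalloc( )-style update: only the master directory is touched. */
static void map_kernel_address(unsigned long address, pgd_t entry)
{
    assert(address >= TASK_SIZE);
    swapper_pg_dir[pgd_index(address)] = entry;
}

/* Fault-time fix-up: copy the master entry into the faulting process's
 * directory. Returns 0 on success, or -1 if the address is below
 * TASK_SIZE or is not mapped in the master directory either. */
static int sync_on_fault(pgd_t *pgd, unsigned long address)
{
    size_t i = pgd_index(address);

    if (address < TASK_SIZE || swapper_pg_dir[i] == 0)
        return -1;
    pgd[i] = swapper_pg_dir[i];
    return 0;
}
```

The point of the design is that map_kernel_address( ) costs the same regardless of how many processes exist; each process pays for the copy only if and when it actually touches the remapped address.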