Of the six typical cases mentioned earlier in Section 8.1, in which a process gets new memory regions, the first one—issuing a fork( ) system call—requires the creation of a whole new address space for the child process. Conversely, when a process terminates, the kernel destroys its address space. In this section, we discuss how these two activities are performed by Linux.
In Section 3.4.1, we mentioned that the kernel invokes the copy_mm( ) function while creating a new process. This function creates the process address space by setting up all Page Tables and memory descriptors of the new process.
Each process usually has its own address space, but lightweight processes can be created by calling clone( ) with the CLONE_VM flag set. These processes share the same address space; that is, they are allowed to address the same set of pages.
Following the COW approach described earlier, traditional processes inherit the address space of their parent: pages stay shared as long as they are only read. When one of the processes attempts to write to one of them, however, the page is duplicated; after some time, a forked process usually gets its own address space that is different from that of the parent process. Lightweight processes, on the other hand, use the address space of their parent process. Linux implements them simply by not duplicating the address space. Lightweight processes can be created considerably faster than normal processes, and the sharing of pages can also be considered a benefit so long as the parent and children coordinate their accesses carefully.
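The visible effect of this scheme can be checked from User Mode. The following is a minimal sketch, assuming a POSIX system; the helper name fork_private_copy is hypothetical and not part of any kernel or library interface. It shows only the outcome (a write performed by the child never reaches the parent), not the page duplication itself, which happens inside the kernel.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks, lets the child overwrite a variable, and
 * reports the value each process sees afterward.
 * Returns 0 on success, -1 if fork() or waitpid() fails.
 */
int fork_private_copy(int *parent_sees, int *child_sees)
{
    volatile int value = 1;     /* lives in a private, writable page */
    int status;
    pid_t pid;

    assert(parent_sees != NULL && child_sees != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write lands in the child's own copy of the page. */
        value = 2;
        _exit(value);           /* report the child's view via its exit status */
    }

    /* Parent: wait for the child, then read its own copy. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    *child_sees = WEXITSTATUS(status);
    *parent_sees = value;       /* unchanged by the child's write */
    return 0;
}
```

The child passes its value back through the exit status because, without CLONE_VM, there is no shared memory through which the parent could observe it.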
If the new process has been created by means of the clone( ) system call and if the CLONE_VM flag of the flag parameter is set, copy_mm( ) gives the clone (tsk) the address space of its parent (current):
if (clone_flags & CLONE_VM) {
    atomic_inc(&current->mm->mm_users);
    tsk->mm = current->mm;
    tsk->active_mm = current->mm;
    return 0;
}
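The sharing step can be modeled outside the kernel to make the bookkeeping explicit. This is a sketch under stated assumptions: the model_ types and the model_share_mm function are hypothetical stand-ins, a plain int replaces the atomic counter, and only the two task fields used above are represented.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CLONE_VM 0x00000100UL   /* same bit Linux uses for CLONE_VM */

struct model_mm {
    int mm_users;                     /* stand-in for the atomic counter */
};

struct model_task {
    struct model_mm *mm;
    struct model_mm *active_mm;
};

/*
 * Returns 1 if the child now shares the parent's descriptor,
 * 0 if the caller still has to build a separate address space.
 */
int model_share_mm(unsigned long clone_flags,
                   struct model_task *parent, struct model_task *child)
{
    assert(parent != NULL && parent->mm != NULL && child != NULL);

    if (!(clone_flags & MODEL_CLONE_VM))
        return 0;

    parent->mm->mm_users++;           /* one more user of the same descriptor */
    child->mm = parent->mm;
    child->active_mm = parent->mm;
    return 1;
}
```

No descriptor is allocated and no Page Table is touched in the shared case, which is why creating a lightweight process is so much cheaper.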
If the CLONE_VM flag is not set, copy_mm( ) must create a new address space (even though no memory is allocated within that address space until the process requests an address). The function allocates a new memory descriptor, stores its address in the mm field of the new process descriptor tsk, and then initializes its fields:
tsk->mm = kmem_cache_alloc(mm_cachep, SLAB_KERNEL);
tsk->active_mm = tsk->mm;
memcpy(tsk->mm, current->mm, sizeof(*tsk->mm));
atomic_set(&tsk->mm->mm_users, 1);
atomic_set(&tsk->mm->mm_count, 1);
init_rwsem(&tsk->mm->mmap_sem);
tsk->mm->page_table_lock = SPIN_LOCK_UNLOCKED;
tsk->mm->pgd = pgd_alloc(tsk->mm);
Remember that the pgd_alloc( ) macro allocates a Page Global Directory for the new process.
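The copy-then-reset pattern used above can be illustrated with a small user-space model. It is only a sketch: struct model_mm, model_dup_mm, and model_free_mm are hypothetical names, malloc( ) stands in for the slab allocator, and a zeroed 4 KB buffer stands in for the Page Global Directory. The point it makes is that memcpy( ) also copies the parent's counters and its pgd pointer, so those fields must be overwritten before the child's descriptor is usable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_mm {
    int mm_users;
    int mm_count;
    unsigned long start_code, end_code;   /* sample fields worth inheriting */
    void *pgd;                            /* stand-in for the Page Global Directory */
};

/* Returns a new descriptor derived from the parent's, or NULL on failure. */
struct model_mm *model_dup_mm(const struct model_mm *parent)
{
    struct model_mm *mm;

    assert(parent != NULL);

    mm = malloc(sizeof(*mm));
    if (mm == NULL)
        return NULL;

    memcpy(mm, parent, sizeof(*mm));      /* inherit every field first */

    mm->mm_users = 1;                     /* then reset what must not be shared */
    mm->mm_count = 1;
    mm->pgd = calloc(1, 4096);            /* fresh, empty top-level table */
    if (mm->pgd == NULL) {
        free(mm);
        return NULL;
    }
    return mm;
}

void model_free_mm(struct model_mm *mm)
{
    if (mm != NULL) {
        free(mm->pgd);
        free(mm);
    }
}
```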
The dup_mmap( ) function is then invoked to duplicate both the memory regions and the Page Tables of the parent process:
down_write(&current->mm->mmap_sem);
dup_mmap(tsk->mm);
up_write(&current->mm->mmap_sem);
copy_segments(tsk, tsk->mm);
The dup_mmap( ) function inserts the new memory descriptor tsk->mm in the global list of memory descriptors. Then it scans the list of regions owned by the parent process, starting from the one pointed to by current->mm->mmap. It duplicates each vm_area_struct memory region descriptor encountered and inserts the copy in the list of regions owned by the child process.
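The list walk can be sketched in user space as follows. The model_vma structure keeps only a few fields of vm_area_struct, and the function names are hypothetical; the real function also has to deal with file mappings, locking, and the red-black tree, none of which is modeled here.

```c
#include <assert.h>
#include <stdlib.h>

struct model_vma {
    unsigned long vm_start, vm_end;
    unsigned long vm_flags;
    struct model_vma *vm_next;
};

void model_free_regions(struct model_vma *head)
{
    while (head != NULL) {
        struct model_vma *next = head->vm_next;
        free(head);
        head = next;
    }
}

/*
 * Copies every descriptor of the parent's list, preserving order.
 * Returns 0 and stores the new list head in *child, or returns -1
 * (leaving *child untouched) if an allocation fails.
 */
int model_dup_regions(const struct model_vma *parent, struct model_vma **child)
{
    struct model_vma *head = NULL;
    struct model_vma **link = &head;      /* where the next copy gets attached */
    const struct model_vma *src;

    assert(child != NULL);

    for (src = parent; src != NULL; src = src->vm_next) {
        struct model_vma *copy = malloc(sizeof(*copy));
        if (copy == NULL) {
            model_free_regions(head);
            return -1;
        }
        *copy = *src;                     /* same boundaries and flags */
        copy->vm_next = NULL;
        *link = copy;
        link = &copy->vm_next;
    }

    *child = head;
    return 0;
}
```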
Right after inserting a new memory region descriptor, dup_mmap( ) invokes copy_page_range( ) to create, if necessary, the Page Tables needed to map the group of pages included in the memory region and to initialize the new Page Table entries. In particular, any page frame corresponding to a private, writable page (VM_SHARED flag off and VM_MAYWRITE flag on) is marked as read-only for both the parent and the child, so that it will be handled with the Copy On Write mechanism. Before terminating, dup_mmap( ) also creates the red-black tree of memory regions of the child process by invoking the build_mmap_rb( ) function.
Finally, copy_mm( ) invokes copy_segments( ), which initializes the architecture-dependent portion of the child's memory descriptor. Essentially, if the parent has a custom LDT, a copy of it is also assigned to the child.
When a process terminates, the kernel invokes the exit_mm( ) function to release the address space owned by that process:
mm_release();
if (tsk->mm) {
    atomic_inc(&tsk->mm->mm_count);
    mm = tsk->mm;
    tsk->mm = NULL;
    enter_lazy_tlb(mm, current, smp_processor_id());
    mmput(mm);
}
The mm_release( ) function wakes up any process sleeping in the tsk->vfork_done completion (see Section 5.3.8). Typically, the corresponding wait queue is nonempty only if the exiting process was created by means of the vfork( ) system call (see Section 3.4.1). The exit_mm( ) function then clears tsk->mm and, by invoking enter_lazy_tlb( ), puts the processor in lazy TLB mode (see Chapter 2).
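The interplay of the two counters in exit_mm( ) can be followed in a small user-space model. It rests on assumptions not spelled out in this section: that mmput( ) decrements mm_users and releases the regions when it reaches zero, and that a separate drop of mm_count eventually frees the descriptor. All model_ names are hypothetical, plain integers replace the atomic counters, and two flags record what would have been released.

```c
#include <assert.h>
#include <stddef.h>

struct model_mm {
    int mm_users;           /* processes sharing the address space */
    int mm_count;           /* references to the descriptor itself */
    int regions_released;   /* set when regions and Page Tables would go away */
    int descriptor_freed;   /* set when the descriptor would be freed */
};

struct model_task {
    struct model_mm *mm;
    struct model_mm *active_mm;
};

/* Assumed behavior: the last reference frees the descriptor. */
void model_mmdrop(struct model_mm *mm)
{
    assert(mm != NULL && mm->mm_count > 0);
    if (--mm->mm_count == 0)
        mm->descriptor_freed = 1;
}

/* Assumed behavior: the last user tears down the address space. */
void model_mmput(struct model_mm *mm)
{
    assert(mm != NULL && mm->mm_users > 0);
    if (--mm->mm_users == 0) {
        mm->regions_released = 1;
        model_mmdrop(mm);
    }
}

/* Mirrors the order of operations in the fragment above. */
void model_exit_mm(struct model_task *tsk)
{
    struct model_mm *mm;

    assert(tsk != NULL);
    mm = tsk->mm;
    if (mm == NULL)
        return;

    mm->mm_count++;         /* keep the descriptor alive for lazy TLB use */
    tsk->mm = NULL;
    model_mmput(mm);
}
```

In this model the extra mm_count reference is what keeps the descriptor valid after the last user is gone; it is released by a later model_mmdrop( ) call, which here stands in for whatever the kernel does once the exiting process is switched out.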