16.6 Swapping in Pages

Swap in must take place when a process attempts to address a page within its address space that has been swapped out to disk. The Page Fault exception handler triggers a swap-in operation when the following conditions occur (see Section 8.4.2):

·         The page including the address that caused the exception is a valid one—that is, it belongs to a memory region of the current process.

·         The page is not present in memory—that is, the Present flag in the Page Table entry is cleared.

·         The Page Table entry associated with the page is not null, which means it contains a swapped-out page identifier.

As described in Section 8.4.3, the handle_pte_fault( ) function, invoked by the do_page_fault( ) exception handler, checks whether the Page Table entry is non-null. If so, it invokes the do_swap_page( ) function to swap in the required page.
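The three conditions listed above can be sketched as a small user-space model in C. This is only an illustration, not kernel code: the model_pte_t type, the MODEL_PRESENT bit position, and the needs_swap_in( ) helper are hypothetical names introduced here, and the real Page Table entry layout is architecture-dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entry layout: bit 0 plays the role of the
 * Present flag. Real layouts depend on the architecture. */
#define MODEL_PRESENT 0x1u

typedef uint32_t model_pte_t;

/* Returns true only when all three swap-in conditions hold. */
static bool needs_swap_in(bool address_in_region, model_pte_t pte)
{
    if (!address_in_region)
        return false;   /* address is outside every memory region */
    if (pte & MODEL_PRESENT)
        return false;   /* page is already present in memory */
    return pte != 0;    /* non-null entry: holds a swapped-out page identifier */
}

/* Usage example covering the cases discussed in the text. */
static void swap_in_examples(void)
{
    assert(needs_swap_in(true, 0x2468u));    /* valid, not present, non-null */
    assert(!needs_swap_in(true, 0x2469u));   /* Present flag is set */
    assert(!needs_swap_in(true, 0));         /* null entry: nothing to swap in */
    assert(!needs_swap_in(false, 0x2468u));  /* address not in any region */
}
```

The order of the checks mirrors the order of the conditions: validity of the address first, then the Present flag, and finally the null test on the entry.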

16.6.1 The do_swap_page( ) Function

The do_swap_page( ) function acts on the following parameters:

mm

Memory descriptor address of the process that caused the Page Fault exception

vma

Memory region descriptor address of the region that includes address

address

Linear address that causes the exception

page_table

Address of the Page Table entry that maps address

orig_pte

Content of the Page Table entry that maps address

write_access

Flag denoting whether the attempted access was a read or a write

Contrary to other functions, do_swap_page( ) never returns 0. It returns 1 if the page is already in the swap cache (minor fault), 2 if the page was read from the swap area (major fault), and -1 if an error occurred while performing the swap in. It essentially executes the following steps:

1.       Releases the page_table_lock spin lock of the memory descriptor (it was acquired by the caller function handle_pte_fault( )).

2.       Gets the swapped-out page identifier from orig_pte.

3.       Invokes lookup_swap_cache( ) to check whether the swap cache already contains a page corresponding to the swapped-out page identifier; if the page is already in the swap cache, it jumps to Step 6.

4.       Invokes the swapin_readahead( ) function to read from the swap area a group of at most 2^n pages, including the requested one. The value n is stored in the page_cluster variable, and is usually equal to 3 (that is, groups of at most 8 pages).[7] Each page is read by invoking the read_swap_cache_async( ) function.

[7] The system administrator may tune this value by writing into the /proc/sys/vm/page-cluster file. Swap-in read-ahead can be disabled by setting page_cluster to 0.

5.       Invokes read_swap_cache_async( ) once more to swap in precisely the page accessed by the process that caused the Page Fault. This step might appear redundant, but it is not. The swapin_readahead( ) function might fail in reading the requested page, for instance because page_cluster is set to 0 or because the function tried to read a group of pages that includes a defective page slot (SWAP_MAP_BAD). On the other hand, if swapin_readahead( ) succeeded, this invocation of read_swap_cache_async( ) terminates quickly because it finds the page in the swap cache.

6.       If, despite all efforts, the requested page was not added to the swap cache, another kernel control path might have already swapped in the requested page on behalf of a clone of this process. This case is checked by temporarily acquiring the page_table_lock spin lock and comparing the entry to which page_table points with orig_pte. If they differ, the page has already been swapped in by another kernel control path, so the function returns 1 (minor fault); otherwise, it returns -1 (failure).

7.       At this point, we know that the page is in the swap cache. Invokes mark_page_accessed( ) (see the later Section 16.7.2) and locks the page.

8.       Acquires the page_table_lock spin lock.

9.       Checks whether another kernel control path has swapped in the requested page on behalf of a clone of this process. In this case, releases the page_table_lock spin lock, unlocks the page, and returns 1 (minor fault).

10.   Invokes swap_free( ) to decrement the usage counter of the page slot corresponding to the swapped-out page identifier obtained in Step 2.

11.   Checks whether the swap areas are at least 50 percent full (nr_swap_pages, the number of free page slots, is smaller than half of total_swap_pages). If so, checks whether the page is owned only by the process that caused the fault (or one of its clones); if this is the case, removes the page from the swap cache.

12.   Increments the rss field of the process's memory descriptor.

13.   Unlocks the page.

14.   Updates the Page Table entry so the process can find the page. The function accomplishes this by writing the physical address of the requested page and the protection bits found in the vm_page_prot field of the memory region into the Page Table entry addressed by page_table. Moreover, if the access that caused the fault was a write and the faulting process is the unique owner of the page, the function also sets the Dirty flag and the Read/Write flag to prevent a useless Copy on Write fault.

15.   Releases the mm->page_table_lock spin lock and returns 1 (minor fault) or 2 (major fault).
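The way the return value follows from the steps above can be sketched as a user-space model in C. This is a simplified illustration under stated assumptions, not the kernel's implementation: the swapin_model structure, its fields, and the helper names are hypothetical and merely summarize what the real function discovers while it runs. The two small helpers show the arithmetic behind Step 4 (a group of at most 2^n pages) and Step 11 (nr_swap_pages smaller than half of total_swap_pages).

```c
#include <assert.h>
#include <stdbool.h>

/* Return codes described in the text. */
enum { SWAPIN_ERROR = -1, SWAPIN_MINOR = 1, SWAPIN_MAJOR = 2 };

/* Hypothetical summary of the outcomes observed along the way. */
struct swapin_model {
    bool in_swap_cache;      /* Step 3: lookup_swap_cache( ) found the page */
    bool read_ok;            /* Steps 4-5: the page was read from the swap area */
    bool pte_changed_early;  /* Step 6: entry differs from orig_pte after a failed read */
    bool pte_changed_late;   /* Step 9: entry differs from orig_pte after locking the page */
};

/* Step 4: upper bound on the read-ahead group, 2^page_cluster pages. */
static unsigned long readahead_group_size(unsigned int page_cluster)
{
    assert(page_cluster < 8 * sizeof(unsigned long));
    return 1UL << page_cluster;
}

/* Step 11: true when nr_swap_pages is smaller than half of total_swap_pages. */
static bool swap_half_full(unsigned long nr_swap_pages,
                           unsigned long total_swap_pages)
{
    return nr_swap_pages * 2 < total_swap_pages;
}

/* Models only the choice of return value, not the actual swap-in work. */
static int model_do_swap_page(const struct swapin_model *s)
{
    int ret = SWAPIN_MINOR;          /* page found in the swap cache */

    if (!s->in_swap_cache) {
        if (!s->read_ok)             /* Step 6: page could not be read */
            return s->pte_changed_early ? SWAPIN_MINOR : SWAPIN_ERROR;
        ret = SWAPIN_MAJOR;          /* page had to be read from the swap area */
    }
    if (s->pte_changed_late)         /* Step 9: someone else swapped it in */
        return SWAPIN_MINOR;
    return ret;                      /* Step 15 */
}
```

For example, a model with in_swap_cache cleared and read_ok set yields 2 (major fault), while the same model with pte_changed_late also set yields 1 (minor fault), because another kernel control path completed the swap-in first. With page_cluster equal to 3, readahead_group_size( ) gives 8.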