Transferring swap pages wouldn't be so complicated if there weren't so many race conditions and other potential hazards to guard against. Here are some of the things that have to be checked regularly:
· The process that owns a page may terminate while the page is being swapped in or out.
· Another process may be in the middle of swapping in a page that the current one is trying to swap out (or vice versa).
Like any other kind of disk access, I/O data transfers for swap pages are blocking operations. Therefore, the kernel must take care to avoid simultaneous transfers involving the same page frame, the same page slot, or both.
Race conditions can be avoided on the page frame through the mechanisms discussed in Chapter 13. Specifically, before starting an I/O operation on the page frame, the kernel waits until its PG_locked flag is off and then sets it. Once this step completes, the page frame lock has been acquired, and therefore no other kernel control path can access the page frame's contents during the I/O operation.
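As a rough illustration of this page frame lock, the following C sketch models PG_locked as one bit in an atomic flags word. Acquiring the lock means waiting until the bit is clear and then setting it in a single atomic step. The simplified struct page, PG_LOCKED, try_lock_page_model( ), lock_page_model( ), and unlock_page_model( ) are hypothetical names. The real kernel puts a waiting kernel control path to sleep on a wait queue rather than spinning as this sketch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified page descriptor: only the flags word matters. */
#define PG_LOCKED_BIT 0u
#define PG_LOCKED (1ul << PG_LOCKED_BIT)

struct page {
    atomic_ulong flags;
};

/* Atomically set PG_locked. Return true if it was previously clear,
 * meaning that the caller has just acquired the page frame lock. */
static bool try_lock_page_model(struct page *page)
{
    unsigned long old = atomic_fetch_or(&page->flags, PG_LOCKED);
    return (old & PG_LOCKED) == 0;
}

/* Wait until PG_locked is off, then take it. This busy-wait only
 * illustrates the idea; the kernel sleeps instead of spinning. */
static void lock_page_model(struct page *page)
{
    while (!try_lock_page_model(page))
        ; /* spin until the current holder clears the bit */
}

/* Clear PG_locked; unlocking a page that is not locked is a bug. */
static void unlock_page_model(struct page *page)
{
    unsigned long old = atomic_fetch_and(&page->flags, ~PG_LOCKED);
    assert(old & PG_LOCKED);
}
```

Because atomic_fetch_or( ) returns the previous value of the word, exactly one caller can observe the bit going from 0 to 1. That property is what makes a single flag usable as a lock.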
But the state of the page slot must also be tracked. The PG_locked flag of the page descriptor is used once again to ensure exclusive access to the page slot involved in the I/O data transfer. Before starting an I/O operation on a swap page, the kernel checks that the page frame involved is included in the swap cache; if not, it adds the page frame to the swap cache. Now suppose some process tries to swap in a page while the same page is being transferred. Before doing any work related to the swap-in, the kernel looks in the swap cache for a page frame associated with the given swapped-out page identifier. Since the page frame is found, the kernel knows that it must not allocate a new page frame, but must simply use the cached page frame. Moreover, since the PG_locked flag is set, the kernel suspends the kernel control path until the bit becomes 0, so that both the page frame's contents and the page slot in the swap area are preserved until the I/O operation terminates.
In short, thanks to the swap cache, the PG_locked flag of the page frame also acts as a lock for the page slot in the swap area.
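The following C sketch is a simplified model, not kernel code. It shows why the swap cache turns PG_locked into a lock on the page slot as well. Each swapped-out page identifier maps to at most one cached page frame, so a swap-in path that finds a locked frame knows that a transfer involving that slot is in progress and must wait. The swap_id_t alias, the fixed-size table, and the helper names are all hypothetical. A plain bool stands in for the PG_locked bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a swapped-out page identifier is an opaque number,
 * and the swap cache is a small fixed table of page descriptors. */
typedef unsigned long swap_id_t;

struct page {
    bool locked;      /* stands in for PG_locked */
    swap_id_t index;  /* swapped-out page identifier */
};

#define CACHE_SIZE 8
static struct page *swap_cache[CACHE_SIZE];

static struct page *lookup_swap_cache_model(swap_id_t id)
{
    for (size_t i = 0; i < CACHE_SIZE; i++)
        if (swap_cache[i] && swap_cache[i]->index == id)
            return swap_cache[i];
    return NULL;
}

/* Insert a page; an identifier may appear at most once, which is what
 * lets the cached frame act as the unique lock for its page slot. */
static void insert_swap_cache_model(struct page *page)
{
    assert(lookup_swap_cache_model(page->index) == NULL);
    for (size_t i = 0; i < CACHE_SIZE; i++)
        if (swap_cache[i] == NULL) {
            swap_cache[i] = page;
            return;
        }
    assert(!"model swap cache is full");
}

enum swapin_action { SWAPIN_NEW_FRAME, SWAPIN_USE_CACHED, SWAPIN_WAIT };

/* Decide what a swap-in of 'id' must do: allocate a new frame only if
 * none is cached, and wait if the cached frame is still locked. */
static enum swapin_action classify_swapin(swap_id_t id)
{
    struct page *page = lookup_swap_cache_model(id);
    if (page == NULL)
        return SWAPIN_NEW_FRAME;
    return page->locked ? SWAPIN_WAIT : SWAPIN_USE_CACHED;
}
```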
The rw_swap_page( ) function is used to swap in or swap out a page. It receives the following parameters:
rw
A flag specifying the direction of data transfer: READ for swapping in, WRITE for swapping out.
page
The address of a descriptor of a page in the swap cache.
Before invoking the function, the caller must ensure that the page is included in the swap cache and must lock the page to prevent race conditions due to concurrent accesses to the page frame or to the page slot in the swap area, as described in the previous section. To be on the safe side, the rw_swap_page( ) function checks that these two conditions actually hold. It then gets the swapped-out page identifier from page->index and invokes the rw_swap_page_base( ) function, passing to it the page identifier, the page descriptor address page, and the direction flag rw.
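A minimal C model of the checks performed by rw_swap_page( ) might look like the sketch below. Here assert( ) stands in for the kernel's own consistency checks. rw_swap_page_base_model( ) is a hypothetical stub that only records its arguments, and the page descriptor is reduced to the three fields the function inspects.

```c
#include <assert.h>
#include <stdbool.h>

enum { READ = 0, WRITE = 1 };

/* Hypothetical, simplified page descriptor. */
struct page {
    bool locked;          /* PG_locked */
    bool in_swap_cache;   /* page belongs to the swap cache */
    unsigned long index;  /* swapped-out page identifier */
};

/* Stub for rw_swap_page_base(): records the request and reports success. */
static unsigned long last_entry;
static int last_rw = -1;

static int rw_swap_page_base_model(int rw, unsigned long entry, struct page *page)
{
    (void)page;
    last_rw = rw;
    last_entry = entry;
    return 1; /* success */
}

/* The caller must already have put the page in the swap cache and
 * locked it; the function verifies both before starting the transfer. */
static int rw_swap_page_model(int rw, struct page *page)
{
    assert(rw == READ || rw == WRITE);
    assert(page->in_swap_cache);
    assert(page->locked);
    return rw_swap_page_base_model(rw, page->index, page);
}
```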
The rw_swap_page_base( ) function is the core of the swapping algorithm; it performs the following steps:
1. If the data transfer is for a swap-in operation (rw set to READ), it clears the PG_uptodate flag of the page frame. The flag is set again only if the swap-in operation terminates successfully.
2. Gets the proper swap area descriptor and the slot index from the swapped-out page identifier.
3. If the swap area is a disk partition, gets the corresponding block device number from the swap_device field of the swap area descriptor. In this case, the slot index also represents the logical block number of the requested data because the block size of any swap disk partition is always equal to the page size (PAGE_SIZE).
4. Otherwise, if the swap area is a regular file, it executes the following substeps:
a. Gets the number of the block device that stores the file from the i_dev field of its inode object (the swap_file->d_inode field in the swap area descriptor).
b. Gets the block size of the device (the i_sb->s_blocksize field of the inode).
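c. Computes the file block number corresponding to the given slot index, that is, the slot index multiplied by the number of blocks per page (PAGE_SIZE divided by the block size).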
d. Fills a local array with the logical block numbers of the blocks in the page slot; every logical block number is obtained by invoking the bmap method of the address_space object whose address is stored in the i_mapping field of the inode. If the bmap method fails, rw_swap_page_base( ) returns 0 (failure).
5. Invokes the brw_page( ) function to start a page I/O operation on the block (or blocks) identified in the previous steps and returns 1 (success).
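The block numbers computed in Steps 3 and 4 can be sketched in C as follows, assuming a 4,096-byte PAGE_SIZE. The sketch is hypothetical. swap_slot_blocks( ) and the two sample bmap functions are invented for illustration, and bmap is modeled as a plain function pointer that returns 0 on failure. Decoding the swapped-out page identifier (Step 2) is left out because its encoding depends on the architecture.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096ul  /* assumption for this sketch */

/* Hypothetical bmap: file block number -> logical device block, 0 on failure. */
typedef unsigned long (*bmap_fn)(unsigned long file_block);

/* Sample bmap functions: a file laid out contiguously from device block
 * 1000, and the same file with an unmapped file block 9. */
static unsigned long contiguous_bmap(unsigned long b) { return 1000 + b; }
static unsigned long holey_bmap(unsigned long b) { return b == 9 ? 0 : 1000 + b; }

/* Fill 'blocks' with the logical block numbers of page slot 'offset'.
 * Returns the number of blocks, or 0 if a bmap lookup fails. */
static size_t swap_slot_blocks(int is_partition, unsigned long offset,
                               unsigned long blocksize, bmap_fn bmap,
                               unsigned long *blocks, size_t max_blocks)
{
    if (is_partition) {
        /* Block size equals PAGE_SIZE, so the slot index is the block number. */
        blocks[0] = offset;
        return 1;
    }
    size_t per_page = PAGE_SIZE / blocksize;
    assert(per_page >= 1 && per_page <= max_blocks);
    unsigned long first = offset * per_page;  /* first file block of the slot */
    for (size_t i = 0; i < per_page; i++) {
        blocks[i] = bmap(first + i);
        if (blocks[i] == 0)
            return 0;  /* bmap failed */
    }
    return per_page;
}
```

For instance, with 1,024-byte blocks, slot 2 of a swap file covers file blocks 8 through 11, each of which must then be translated by bmap.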
Since the page I/O operation activated by brw_page( ) is asynchronous, the rw_swap_page( ) function might terminate before the actual I/O data transfer completes. However, as described in Section 13.4.8.2, the kernel eventually executes the end_buffer_io_async( ) function, which verifies that all data transfers completed successfully, sets the page's PG_uptodate flag, and unlocks the page.
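The completion side can be modeled as a counter of outstanding block transfers. The last transfer to finish decides whether to set PG_uptodate and then releases PG_locked. This C sketch is a simplified, hypothetical model of the swap-in case, not the actual end_buffer_io_async( ) code. Plain bool fields stand in for the page flags, and the per-page counter and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page whose I/O is split into several block transfers
 * that complete asynchronously, in any order. */
struct page {
    bool locked;    /* PG_locked */
    bool uptodate;  /* PG_uptodate */
    bool error;     /* some block transfer failed */
    int pending;    /* block transfers not yet completed */
};

/* Start a swap-in: PG_uptodate is cleared until the read succeeds. */
static void begin_swap_in_model(struct page *page, int nr_blocks)
{
    assert(page->locked);  /* the caller locked the page beforehand */
    page->uptodate = false;
    page->error = false;
    page->pending = nr_blocks;
}

/* Called once per completed block transfer. */
static void block_io_done_model(struct page *page, bool ok)
{
    assert(page->pending > 0);
    if (!ok)
        page->error = true;
    if (--page->pending > 0)
        return;             /* other transfers are still in flight */
    if (!page->error)
        page->uptodate = true;
    page->locked = false;   /* in the kernel, this also wakes up waiters */
}
```

The model unlocks the page even when a transfer fails, so that waiting paths are not blocked forever. Only the missing PG_uptodate flag signals the failure.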
The read_swap_cache_async( ) function, which receives as a parameter a swapped-out page identifier, is invoked whenever the kernel must swap in a page. As we know, before accessing the swap partition, the function must check whether the swap cache already includes the desired page frame. Therefore, the function essentially executes the following operations:
1. Invokes find_get_page( ) to search for the page in the swap cache. If the page is found, it returns the address of its descriptor.
2. If the page is not in the swap cache, invokes alloc_page( ) to allocate a new page frame. If no free page frame is available, it returns 0 (indicating that the system is out of memory).
3. Invokes add_to_swap_cache( ) to insert the new page frame into the swap cache. As mentioned in Section 16.3.1, this function also locks the page.
4. The previous step might fail if add_to_swap_cache( ) finds a duplicate of the page in the swap cache. For instance, the process could block in Step 2, thus allowing another process to start a swap-in operation on the same page slot. In this case, the function releases the page frame allocated in Step 2 and restarts from Step 1.
5. Otherwise, the new page frame is inserted into the swap cache. Invokes rw_swap_page( ) to read the page's contents from the swap area, passing the READ parameter and the page descriptor to that function.
6. Returns the address of the page descriptor.
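The retry loop of read_swap_cache_async( ) can be sketched in C as follows. Everything here is a hypothetical simplification. The swap cache is a small table, alloc_page_model( ) uses calloc( ), and add_to_cache( ) reports a duplicate by returning 0. The race_entry hook lets a test simulate another process inserting the same page while the caller is blocked in Step 2, and the actual read (Step 5) is represented only by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified page descriptor and swap cache. */
struct page {
    unsigned long index;  /* swapped-out page identifier */
    bool locked;          /* PG_locked */
};

#define CACHE_SIZE 16
static struct page *swap_cache[CACHE_SIZE];

static struct page *find_in_cache(unsigned long entry)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        if (swap_cache[i] && swap_cache[i]->index == entry)
            return swap_cache[i];
    return NULL;
}

/* Returns 1 on success, 0 if 'entry' is already cached (a duplicate),
 * or -1 if the model cache has no free slot. Inserting locks the page. */
static int add_to_cache(struct page *page, unsigned long entry)
{
    if (find_in_cache(entry))
        return 0;
    for (int i = 0; i < CACHE_SIZE; i++)
        if (!swap_cache[i]) {
            page->index = entry;
            page->locked = true;
            swap_cache[i] = page;
            return 1;
        }
    return -1;
}

/* Test hook: if nonzero, the next allocation "blocks" and, meanwhile,
 * another process caches a frame for this identifier. */
static unsigned long race_entry;
static int reads_started;

static struct page *alloc_page_model(void)
{
    if (race_entry) {
        struct page *other = calloc(1, sizeof *other);
        if (other && add_to_cache(other, race_entry) != 1)
            free(other);
        race_entry = 0;
    }
    return calloc(1, sizeof(struct page));
}

static struct page *read_swap_cache_async_model(unsigned long entry)
{
    for (;;) {
        struct page *found = find_in_cache(entry);   /* Step 1 */
        if (found)
            return found;
        struct page *page = alloc_page_model();      /* Step 2 */
        if (!page)
            return NULL;                             /* out of memory */
        int ret = add_to_cache(page, entry);         /* Step 3 */
        if (ret == 0) {                              /* Step 4: duplicate */
            free(page);
            continue;                                /* restart from Step 1 */
        }
        if (ret < 0) {
            free(page);
            return NULL;
        }
        reads_started++;  /* Step 5: rw_swap_page(READ, page) would run here */
        return page;                                 /* Step 6 */
    }
}
```

When the simulated race occurs, the freshly allocated frame is freed and the second pass through Step 1 returns the frame cached by the other process, so this caller never starts a second read on the same page slot.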
There is just one case in which the kernel wants to read a page from a swap area without putting it in the swap cache. This happens when servicing the swapon( ) system call: the kernel reads the first page of a swap area, which contains the swap_header union, and then immediately discards the page frame. Since the kernel is activating the swap area, no process can swap in or swap out a page on it, so there is no need to protect the access to the page slot.
The rw_swap_page_nolock( ) function receives as parameters the type of I/O operation (READ or WRITE), a swapped-out page identifier, and the address of a page frame (already locked). It performs the following operations:
1. Gets the page descriptor of the page frame passed as a parameter.
2. Initializes the mapping field of the page descriptor with the address of the swapper_space object. This is done because wait_on_page( ), invoked in Step 4, may execute the sync_page method of the address_space object referenced by that field.
3. Invokes rw_swap_page_base( ) to start the swap I/O operation.
4. Waits until the I/O data transfer completes by invoking wait_on_page( ).
5. Unlocks the page.
6. Sets the mapping field of the page descriptor to NULL and returns.
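The ordering constraints of rw_swap_page_nolock( ) can be modeled in C as shown below. The sketch is hypothetical. The I/O is simulated by a pending flag, wait_on_page_model( ) merely counts a call in place of invoking sync_page, and the rw and swapped-out identifier parameters are omitted. Its main point is that the mapping field must refer to swapper_space_model while waiting and is reset to NULL only after the wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the address_space object and page descriptor. */
struct address_space { int sync_calls; };
static struct address_space swapper_space_model;

struct page {
    struct address_space *mapping;
    bool locked;      /* PG_locked */
    bool io_pending;  /* true while the model "transfer" is running */
};

/* Model of starting the transfer (rw_swap_page_base): marks I/O pending. */
static int start_io_model(struct page *page)
{
    page->io_pending = true;
    return 1;
}

/* Model of wait_on_page(): the real wait may run the sync_page method
 * of page->mapping, so a NULL mapping here would be a bug. */
static void wait_on_page_model(struct page *page)
{
    assert(page->mapping != NULL);
    page->mapping->sync_calls++;  /* stands in for invoking sync_page */
    page->io_pending = false;     /* model: the transfer has finished */
}

static void rw_swap_page_nolock_model(struct page *page)
{
    assert(page->locked);                  /* page frame already locked */
    page->mapping = &swapper_space_model;  /* Step 2 */
    start_io_model(page);                  /* Step 3 */
    wait_on_page_model(page);              /* Step 4 */
    page->locked = false;                  /* Step 5 */
    page->mapping = NULL;                  /* Step 6 */
}
```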