17.3 Ext2 Memory Data Structures

For the sake of efficiency, most information stored in the disk data structures of an Ext2 partition are copied into RAM when the filesystem is mounted, thus allowing the kernel to avoid many subsequent disk read operations. To get an idea of how often some data structures change, consider some fundamental operations:

·         When a new file is created, the values of the s_free_inodes_count field in the Ext2 superblock and of the bg_free_inodes_count field in the proper group descriptor must be decremented.

·         If the kernel appends some data to an existing file so that the number of data blocks allocated for it increases, the values of the s_free_blocks_count field in the Ext2 superblock and of the bg_free_blocks_count field in the group descriptor must be modified.

·         Even just rewriting a portion of an existing file involves an update of the s_wtime field of the Ext2 superblock.

Since all Ext2 disk data structures are stored in blocks of the Ext2 partition, the kernel uses the buffer cache and the page cache to keep them up to date (see Section 14.2.4).

Table 17-6 specifies, for each type of data related to Ext2 filesystems and files, the data structure used on the disk to represent its data, the data structure used by the kernel in memory, and a rule of thumb used to determine how much caching is used. Data that is updated very frequently is always cached; that is, the data is permanently stored in memory and included in the buffer cache or in the page cache until the corresponding Ext2 partition is unmounted. The kernel gets this result by keeping the buffer's usage counter greater than 0 at all times.

Table 17-6. VFS images of Ext2 data structures

Type

Disk data structure

Memory data structure

Caching mode

Superblock

ext2_super_block

ext2_sb_info

Always cached

Group descriptor

ext2_group_desc

ext2_group_desc

Always cached

Block bitmap

Bit array in block

Bit array in buffer

Fixed limit

Inode bitmap

Bit array in block

Bit array in buffer

Fixed limit

Inode

ext2_inode

ext2_inode_info

Dynamic

Data block

Unspecified

Buffer page

Dynamic

Free inode

ext2_inode

None

Never

Free block

Unspecified

None

Never

The never-cached data is not kept in any cache since it does not represent meaningful information.

In between these extremes lie two other modes: fixed-limit and dynamic. In the fixed-limit mode, a specific number of data structures can be kept in the buffer cache; older ones are flushed to disk when the number is exceeded. In the dynamic mode, the data is kept in a cache as long as the associated object (an inode or data block) is in use; when the file is closed or the data block is deleted, the shrink_mmap( ) function may remove the associated data from the cache and write it back to disk.

17.3.1 The ext2_sb_info and ext2_inode_info Structures

When an Ext2 filesystem is mounted, the u field of the VFS superblock, which contains filesystem-specific data, is loaded with a structure of type ext2_sb_info so that the kernel can find out things related to the filesystem as a whole. This structure includes the following information:

·         Most of the disk superblock fields

·         The block bitmap cache, tracked by the s_block_bitmap and s_block_bitmap_number arrays (see the next section)

·         The inode bitmap cache, tracked by the s_inode_bitmap and s_inode_bitmap_number arrays (see the next section)

·         An s_sbh pointer to the buffer head of the buffer containing the disk superblock

·         An s_es pointer to the buffer containing the disk superblock

·         The number of group descriptors, s_desc_ per_block, that can be packed in a block

·         An s_group_desc pointer to an array of buffer heads of buffers containing the group descriptors (usually, a single entry is sufficient)

·         Other data related to mount state, mount options, and so on

Similarly, when an inode object pertaining to an Ext2 file is initialized, the u field is loaded with a structure of type ext2_inode_info, which includes this information:

·         Most of the fields found in the disk's inode structure that are not kept in the generic VFS inode object (see Table 12-3 in Chapter 12)

·         The fragment size and the fragment number (not yet used)

·         The block_group block group index at which the inode belongs (see Section 17.2 earlier in this chapter)

·         The i_prealloc_block and i_prealloc_count fields, which are used for data block preallocation (see Section 17.6.5 later in this chapter)

·         The i_osync field, which is a flag specifying whether the disk inode should be synchronously updated

17.3.2 Bitmap Caches

When the kernel mounts an Ext2 filesystem, it allocates a buffer for the Ext2 disk superblock and reads its contents from disk. The buffer is released only when the Ext2 filesystem is unmounted. When the kernel must modify a field in the Ext2 superblock, it simply writes the new value in the proper position of the corresponding buffer and then marks the buffer as dirty.

Unfortunately, this approach cannot be adopted for all Ext2 disk data structures. The tenfold increase in disk capacity reached in recent years has induced a tenfold increase in the size of inode and data block bitmaps, so we have reached the point at which it is no longer convenient to keep all the bitmaps in RAM at the same time.

For instance, consider a 4-GB disk with a 1-KB block size. Since each bitmap fills all the bits of a single block, each of them describes the status of 8,192 blocks — that is, of 8 MB of disk storage. The number of block groups is 4,096 MB/8 MB=512. Since each block group requires both an inode bitmap and a data block bitmap, 1 MB of RAM would be required to store all 1,024 bitmaps in memory.

The solution adopted to limit the memory requirements of the Ext2 descriptors is to use, for any mounted Ext2 filesystem, two caches of size EXT2_MAX_GROUP_LOADED (usually 8). One cache stores the most recently accessed inode bitmaps, while the other cache stores the most recently accessed block bitmaps. Buffers that contain bitmaps included in a cache have a usage counter greater than 0, therefore they are never freed by shrink_mmap( ) (see Section 16.7). Conversely, buffers that contain bitmaps not included in a bitmap cache have a null usage counter, so they can be freed if free memory becomes scarce.

Each cache is implemented by means of two arrays of EXT2_MAX_GROUP_LOADED elements. One array contains the indexes of the block groups whose bitmaps are currently in the cache, while the other array contains pointers to the buffer heads that refer to those bitmaps.

The ext2_sb_info structure stores the arrays pertaining to the inode bitmap cache; indexes of block groups are found in the s_inode_bitmap field and pointers to buffer heads are found in the s_inode_bitmap_number field. The corresponding arrays for the block bitmap cache are stored in the s_block_bitmap and s_block_bitmapnumber fields.

The load_inode_bitmap( ) function loads the inode bitmap of a specified block group and returns the cache position in which the bitmap can be found.

If the bitmap is not already in the bitmap cache, load_inode_bitmap( ) invokes read_inode_bitmap( ). The latter function gets the number of the block containing the bitmap from the bg_inode_bitmap field of the group descriptor, and then invokes bread( ) to allocate a new buffer and read the block from disk if it is not already included in the buffer cache.

If the number of block groups in the Ext2 partition is less than or equal to EXT2_MAX_GROUP_LOADED, the index of the cache array position in which the bitmap is inserted always matches the block group index passed as the parameter to the load_inode_bitmap( ) function.

Otherwise, if there are more block groups than cache positions, a bitmap is removed from the cache, if necessary, by using a Least Recently Used (LRU) policy, and the requested bitmap is inserted in the first cache position. Figure 17-3 illustrates the three possible cases in which the bitmap in block group 5 is referenced: where the requested bitmap is already in cache, where the bitmap is not in cache but there is a free position, and where the bitmap is not in cache and there is no free position.

Figure 17-3. Adding a bitmap to the cache

figs/ULK2_1703.gif

The load_block_bitmap( ) and read_block_bitmap( ) functions are very similar to load_inode_bitmap( ) and read_inode_bitmap( ), but they refer to the block bitmap cache of an Ext2 partition.

Figure 17-4 illustrates the memory data structures of a mounted Ext2 filesystem. In our example, there are three block groups whose descriptors are stored in three blocks on disk; therefore, the s_group_desc field of the ext2_sb_info points to an array of three buffer heads. We have shown just one inode bitmap having index 2 and one block bitmap having index 4, although the kernel may keep 2 x EXT2_MAX_GROUP_LOADED bitmaps in the bitmap caches, and even more may be stored in the buffer cache.

Figure 17-4. Ext2 memory data structures

figs/ULK2_1704.gif