As mentioned in Chapter 1, Unix-like operating systems are based on the notion of a file, which is just an information container structured as a sequence of characters. According to this approach, I/O devices are treated as files; thus, the same system calls used to interact with regular files on disk can be used to directly interact with I/O devices. For example, the same write( ) system call may be used to write data into a regular file or to send it to a printer by writing to the /dev/lp0 device file.
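The point about a single write( ) call serving both kinds of files can be illustrated with a minimal user-space sketch. The helper name write_message and any paths passed to it are assumptions made for this example; the helper never inspects what kind of file it is given.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a NUL-terminated message to `path` using open()/write()/close().
 * The code is identical whether `path` names a regular file or a device
 * file. Returns the number of bytes written, or -1 on error.
 * write_message is a hypothetical helper name used only in this sketch. */
ssize_t write_message(const char *path, const char *msg)
{
    /* O_CREAT creates a regular file if none exists; an existing
     * device file is simply opened. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, msg, strlen(msg));
    close(fd);
    return written;
}
```

Calling write_message( ) with the pathname of a regular file stores the bytes on disk; calling it with a device file such as /dev/lp0 (when a printer driver is present) hands the same bytes to the device driver instead.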
According to the characteristics of the underlying device drivers, device files can be of two types: block or character. The difference between the two classes of hardware devices is not so clear cut. At least we can assume the following:
· The data of a block device can be addressed randomly, and the time needed to transfer any data block is small and roughly the same, at least from the point of view of the human user. Typical examples of block devices are hard disks, floppy disks, CD-ROM drives, and DVD players.
· The data of a character device either cannot be addressed randomly (consider, for instance, a sound card), or they can be addressed randomly, but the time required to access a random datum largely depends on its position inside the device (consider, for instance, a magnetic tape drive).
Network cards are a remarkable exception to this schema, since they are hardware devices that are not directly associated with files; we describe them in Chapter 18.
In Linux 2.4, there are two different kinds of device files: old-style device files, which are real files stored in the system's directory tree, and devfs device files, which are virtual files like those of the /proc filesystem. Let's now discuss both types of device files in more detail.
Old-style device files have been in use since the early versions of the Unix operating system. An old-style device file is a real file stored in a filesystem. Its inode, however, doesn't address blocks of data on the disk. Instead, the inode includes an identifier of a hardware device. Besides its name and type (either character or block, as already mentioned), each device file has two main attributes:
Major number
A number ranging from 1 to 254 that identifies the device type. Usually, all device files that have the same major number and the same type share the same set of file operations, since they are handled by the same device driver.
Minor number
A number that identifies a specific device among a group of devices that share the same major number.
The mknod( ) system call is used to create old-style device files. It receives the name of the device file, its type, and the major and minor numbers as parameters. The last two parameters are merged into a 16-bit dev_t number; the eight most significant bits identify the major number, while the remaining eight identify the minor number. The MAJOR and MINOR macros extract the two values from the 16-bit number, while the MKDEV macro merges a major and a minor number into a 16-bit number. Actually, dev_t is the data type specifically used by application programs; the kernel uses the kdev_t data type. In Linux 2.4, both types reduce to an unsigned short integer, but kdev_t will become a complete device file descriptor in some future Linux version.
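The 16-bit packing just described can be sketched with a few macros. The DEMO_ names and the demo_dev_t type are assumptions chosen so as not to clash with any system header; they follow the layout stated in the text (major number in the upper eight bits, minor number in the lower eight) and are not presented as the kernel's own definitions.

```c
#include <stdint.h>

/* A 16-bit device number, as described for Linux 2.4.
 * demo_dev_t and the DEMO_* macros are illustrative names only. */
typedef uint16_t demo_dev_t;

#define DEMO_MINORBITS 8
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)

/* Extract the upper eight bits (major) and lower eight bits (minor). */
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & DEMO_MINORMASK))

/* Merge a major and a minor number into one 16-bit value. */
#define DEMO_MKDEV(ma, mi) \
    ((demo_dev_t)(((ma) << DEMO_MINORBITS) | ((mi) & DEMO_MINORMASK)))
```

For instance, major number 3 and minor number 2 pack into the value 0x0302, from which the two extraction macros recover 3 and 2.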
The major and minor numbers are stored in the i_rdev field of the inode object. The type of device file (character or block) is stored in the i_mode field.
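From User Mode, the same two pieces of information are visible through the stat( ) system call: the st_mode field carries the file type and the st_rdev field carries the device number. The following sketch assumes a Linux system with glibc, whose major( ) and minor( ) macros in <sys/sysmacros.h> decode the dev_t layout used by the running system (which may be wider than the 16 bits discussed here). The struct dev_info type and the query_device_file name are hypothetical.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch. */
struct dev_info {
    char type;              /* 'c' for character, 'b' for block */
    unsigned int major_nr;  /* major number decoded from st_rdev */
    unsigned int minor_nr;  /* minor number decoded from st_rdev */
};

/* Fill `out` with the type and device numbers of the device file at
 * `path`. Returns 0 on success, -1 if stat() fails or if the path
 * does not name a device file. */
int query_device_file(const char *path, struct dev_info *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    if (S_ISCHR(st.st_mode))
        out->type = 'c';
    else if (S_ISBLK(st.st_mode))
        out->type = 'b';
    else
        return -1;          /* regular file, directory, and so on */

    out->major_nr = major(st.st_rdev);
    out->minor_nr = minor(st.st_rdev);
    return 0;
}
```

Because the type is reported separately from the numbers, a caller can tell a block device from a character device that happens to carry the same major and minor pair.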
Device files are usually included in the /dev directory. Table 13-2 illustrates the attributes of some device files.[2] Notice that character and block devices have independent numbering, so block device (3,0) is distinct from character device (3,0).
[2] The official registry of allocated device numbers and /dev directory nodes is stored in the Documentation/devices.txt file. The major numbers of the devices supported may also be found in the include/linux/major.h file.
Usually, a device file is associated with a hardware device (like a hard disk—for instance, /dev/hda) or with some physical or logical portion of a hardware device (like a disk partition—for instance, /dev/hda2). In some cases, however, a device file is not associated with any real hardware device, but represents a fictitious logical device. For instance, /dev/null is a device file corresponding to a "black hole"; all data written into it is simply discarded, and the file always appears empty.
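The "black hole" behavior of /dev/null can be checked with a short sketch: a write reports that every byte was accepted, and a subsequent read immediately reports end of file. The function name check_dev_null is an assumption made for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if /dev/null accepts a write in full and then reads as
 * empty, -1 otherwise. check_dev_null is a hypothetical helper name. */
int check_dev_null(void)
{
    char buf[16];
    int fd = open("/dev/null", O_RDWR);

    if (fd < 0)
        return -1;

    /* The ten bytes are reported as written, then discarded. */
    ssize_t written = write(fd, "discard me", 10);

    /* A read returns 0 bytes: the file always appears empty. */
    ssize_t got = read(fd, buf, sizeof(buf));

    close(fd);
    return (written == 10 && got == 0) ? 0 : -1;
}
```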
As far as the kernel is concerned, the name of the device file is irrelevant. If you create a device file named /tmp/disk of type "block" with the major number 3 and minor number 0, it would be equivalent to the /dev/hda device file shown in the table. On the other hand, device filenames may be significant for some application programs. For example, a communication program might assume that the first serial port is associated with the /dev/ttyS0 device file. But most application programs can be configured to interact with arbitrarily named device files.
Identifying I/O devices by means of major and minor numbers has some limitations:
1. Most of the devices present in a /dev directory don't exist; the device files have been included so that the system administrator doesn't need to create a device file before installing a new I/O driver. However, a typical /dev directory, which includes over 1,800 device files, increases the time taken to look up an inode when first referenced.
2. The major and minor numbers are 8 bits long. Nowadays, this is a limiting factor for several hardware devices. For instance, it poses problems when identifying SCSI devices included in very large systems (the Linux workaround consists of allocating several major numbers to the SCSI disk driver; as a result, the kernel supports up to 128 SCSI disks).
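The figure of 128 SCSI disks can be reproduced with simple arithmetic. The sketch below relies on two figures that are not stated in the text and are given here as assumptions about the Linux 2.4 SCSI disk driver: it is allocated eight major numbers, and each disk consumes 16 minor numbers (one for the whole disk and 15 for its partitions). The function name max_disks is hypothetical.

```c
/* Number of disks that can be identified when `majors` major numbers
 * are available, each major offers 2^minor_bits minor numbers, and
 * every disk consumes `minors_per_disk` of them.
 * max_disks is a hypothetical helper name. */
unsigned int max_disks(unsigned int majors, unsigned int minor_bits,
                       unsigned int minors_per_disk)
{
    unsigned int minors_per_major = 1U << minor_bits;

    return majors * (minors_per_major / minors_per_disk);
}
```

With 8-bit minor numbers, a single major number covers only 16 such disks; eight major numbers bring the total to 8 x 16 = 128.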
The devfs device files have been introduced to solve these problems and other minor issues. However, at the time of this writing they are still not widely adopted; thus, we limit ourselves to sketching the main ideas behind them without describing the code.
The devfs virtual filesystem allows drivers to register devices by name rather than by major and minor numbers. The kernel provides a default naming scheme designed to make it easy to search for specific devices. For example, all disk devices are placed under the /dev/discs virtual directory; /dev/hda might become /dev/discs/disc0, /dev/hdb might become /dev/discs/disc1, and so on. Users can still refer to the old name scheme by properly configuring a device management daemon.
I/O drivers that use the devfs filesystem register devices by invoking devfs_register( ). This function creates a new devfs_entry structure that includes the device file name and a pointer to a table of device driver methods. A registered device file automatically appears in a devfs virtual directory. The inode object of a device file in this directory is created only when the file is accessed.[3]
[3] The devfs filesystem is a virtual filesystem, similar to the /proc filesystem. It does not manage disk space: inode objects are created in RAM when needed and do not have a corresponding disk inode.
Opening a device file is also slightly more efficient because dentry objects of devfs files include pointers to the proper file operations (see Section 13.3.4 later in this chapter).
There are, however, some problems with devfs. The most important one is that major and minor numbers are somewhat indispensable for Unix systems. First, some User Mode applications like the NFS server or the find command rely on the major and minor numbers to identify the physical disk partition containing a given file. Second, device numbers are required even by the POSIX standard. Thus, the devfs layer lets the kernel define major and minor numbers for each device driver, as with old-style device files. Currently, almost all device drivers associate the devfs device file with the same major and minor numbers as the corresponding old-style device file. For this reason, we mainly focus on old-style device files in the rest of this chapter.
Device files live in the system directory tree but are intrinsically different from regular files and directories. When a process accesses a regular file, it is accessing some data blocks in some disk partition through a filesystem; when a process accesses a device file, it is just driving a hardware device. For instance, a process might access a device file to read the room temperature from a digital thermometer connected to the computer. It is the VFS's responsibility to hide the differences between device files and regular files from application programs.
To do this, the VFS changes the default file operations of a device file when it is opened; as a result, each system call on the device file is translated to an invocation of a device-related function instead of the corresponding function of the hosting filesystem. The device-related function acts on the hardware device to perform the operation requested by the process.[4]
[4] Notice that, thanks to the name-resolving mechanism explained in Section 12.5, symbolic links to device files work just like device files.
Let's suppose that a process executes an open( ) system call on a device file (either of type block or character). The operations performed by the system call have already been described in Section 12.6.1. Essentially, the corresponding service routine resolves the pathname to the device file and sets up the corresponding inode object, dentry object, and file object.
Assuming that the device file is old-style, the inode object is initialized by reading the corresponding inode on disk through a suitable function of the filesystem (usually ext2_read_inode( ); see Chapter 17). When this function determines that the disk inode refers to a device file, it invokes init_special_inode( ), which initializes the i_rdev field of the inode object to the major and minor numbers of the device file, and sets the i_fop field of the inode object to the address of either the def_blk_fops table or the def_chr_fops table, according to the type of device file. The service routine of the open( ) system call also invokes the dentry_open( ) function, which allocates a new file object and sets its f_op field to the address stored in i_fop (that is, to the address of def_blk_fops or def_chr_fops once again). The contents of these two tables are shown later in Section 13.4.5.2 and Section 13.5; thanks to them, any system call issued on a device file will activate a device driver's function rather than a function of the underlying filesystem.[5]
[5] If the device file is a virtual file in the devfs filesystem, the mechanism is slightly different: the devfs filesystem layer does not invoke init_special_inode( ); rather, any devfs device file has a custom open method (the devfs_open( ) function), which is invoked by dentry_open( ). It is the job of the devfs_open( ) function to rewrite the f_op field of the file object so as to customize the operations triggered by the system calls.