Unix systems provide a family of functions that replace the execution context of a process with a new context described by an executable file. The names of these functions start with the prefix exec, followed by one or two letters; therefore, a generic function in the family is usually referred to as an exec function.
The exec functions are listed in Table 20-7; they differ in how the parameters are interpreted.
The first parameter of each function denotes the pathname of the file to be executed. The pathname can be absolute or relative to the process's current directory. Moreover, if the name does not include any / characters, the execlp( ) and execvp( ) functions search for the executable file in all directories specified by the PATH environment variable.
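The PATH lookup described above can be imitated in User Mode. The following is only a minimal sketch, not the C library's actual code: find_in_path( ) is a hypothetical helper, the fallback search path is an assumption, and access( ) is used as a rough executability test (it also accepts directories).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolves `name` roughly the way the text describes
 * for execlp( ) and execvp( ). Returns 0 and writes the pathname into
 * `out` on success, or -1 if no executable candidate is found.
 */
int find_in_path(const char *name, char *out, size_t outsize)
{
    /* A name that contains '/' is used as is; PATH is not consulted. */
    if (strchr(name, '/') != NULL) {
        if (access(name, X_OK) != 0)
            return -1;
        snprintf(out, outsize, "%s", name);
        return 0;
    }

    const char *path = getenv("PATH");
    if (path == NULL)
        path = "/bin:/usr/bin";   /* assumed fallback for this sketch */

    for (;;) {
        size_t len = strcspn(path, ":");
        /* An empty PATH component conventionally means the current directory. */
        int n = (len == 0)
            ? snprintf(out, outsize, "./%s", name)
            : snprintf(out, outsize, "%.*s/%s", (int)len, path, name);

        if (n > 0 && (size_t)n < outsize && access(out, X_OK) == 0)
            return 0;             /* first match wins */
        if (path[len] == '\0')
            break;                /* no more components */
        path += len + 1;          /* skip past the ':' */
    }
    return -1;
}
```

Calling find_in_path("sh", buf, sizeof buf) typically yields something like /bin/sh or /usr/bin/sh, depending on the order of the directories in PATH.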
Besides the first parameter, the execl( ), execlp( ), and execle( ) functions include a variable number of additional parameters. Each points to a string describing a command-line argument for the new program; as the "l" character in the function names suggests, the parameters are organized in a list terminated by a NULL value. Usually, the first command-line argument duplicates the executable filename. Conversely, the execv( ), execvp( ), and execve( ) functions specify the command-line arguments with a single parameter; as the "v" character in the function names suggests, the parameter is the address of a vector (that is, an array) of pointers to command-line argument strings. The last element of the array must be NULL.
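To make the difference between the list and vector forms concrete, here is a small sketch that runs the same command both ways in a child process. The helper name run_shell_exit( ), the command, and the exit code 5 are chosen purely for illustration, and /bin/sh is assumed to exist.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that executes "/bin/sh -c 'exit 5'" using either the
 * list form (use_list != 0) or the vector form (use_list == 0).
 * Returns the child's exit status, or -1 on error.
 */
int run_shell_exit(int use_list)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (use_list) {
            /* "l" form: the arguments are a NULL-terminated list. */
            execl("/bin/sh", "sh", "-c", "exit 5", (char *)NULL);
        } else {
            /* "v" form: one parameter, the address of a NULL-terminated vector. */
            char *const argv[] = { "sh", "-c", "exit 5", NULL };
            execv("/bin/sh", argv);
        }
        _exit(127);               /* reached only if the exec call failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both run_shell_exit(1) and run_shell_exit(0) should report the same exit status, since the two forms differ only in how the caller passes the arguments.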
The execle( ) and execve( ) functions receive as their last parameter the address of an array of pointers to environment strings; as usual, the last component of the array must be NULL. The other functions may access the environment for the new program from the external environ global variable, which is defined in the C library.
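The effect of the explicit environment parameter can be observed with a short sketch. The helper run_sh_with_env( ) and the variable name EXIT_CODE are hypothetical; the example assumes /bin/sh is available and simply lets the shell exit with a value taken from the environment it receives.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs `script` with /bin/sh in a child process whose environment is
 * exactly `envp` (a NULL-terminated array of "NAME=value" strings).
 * Returns the child's exit status, or -1 on error.
 */
int run_sh_with_env(const char *script, char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        /* The new program sees only the strings in envp, not our environ. */
        execve("/bin/sh", argv, envp);
        _exit(127);               /* reached only if execve( ) failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing an envp that contains only "EXIT_CODE=7" together with the script exit "$EXIT_CODE" makes the child terminate with status 7, whereas an empty envp leaves the variable undefined in the new program regardless of the caller's own environment.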
All exec functions, with the exception of execve( ), are wrapper routines defined in the C library and use execve( ), which is the only system call offered by Linux to deal with program execution.
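As an illustration of what such a wrapper has to do, the following sketch builds an execl( )-like function on top of execve( ). It is not the C library's implementation: sketch_execl( ) and the fixed limit SKETCH_MAX_ARGS are assumptions made to keep the example short.

```c
#include <errno.h>
#include <stdarg.h>
#include <unistd.h>

extern char **environ;            /* the caller's environment, kept by the C library */

#define SKETCH_MAX_ARGS 64        /* arbitrary limit for this sketch */

/*
 * Collects the NULL-terminated argument list into a vector, then hands
 * the vector and the current environment to execve( ).
 * Like execl( ), it returns only on failure (-1, with errno set).
 */
int sketch_execl(const char *path, const char *arg0, ...)
{
    char *argv[SKETCH_MAX_ARGS + 1];
    int argc = 0;
    va_list ap;

    va_start(ap, arg0);
    for (const char *a = arg0; a != NULL; a = va_arg(ap, const char *)) {
        if (argc == SKETCH_MAX_ARGS) {
            va_end(ap);
            errno = E2BIG;        /* too many arguments for the fixed vector */
            return -1;
        }
        argv[argc++] = (char *)a;
    }
    va_end(ap);
    argv[argc] = NULL;            /* the vector must be NULL-terminated */

    return execve(path, argv, environ);
}
```

A call such as sketch_execl("/bin/ls", "ls", "-l", (char *)NULL) would replace the calling program with ls; if the file does not exist, the function returns -1 and errno is set to ENOENT.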
The sys_execve( ) service routine receives the following parameters:
· The address of the executable file pathname (in the User Mode address space).
· The address of a NULL-terminated array (in the User Mode address space) of pointers to strings (again in the User Mode address space); each string represents a command-line argument.
· The address of a NULL-terminated array (in the User Mode address space) of pointers to strings (again in the User Mode address space); each string represents an environment variable in the NAME=value format.
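Because both arrays are NULL-terminated rather than accompanied by a length, whoever receives them must walk them to find out how many strings they hold. The sketch below shows the idea in plain User Mode C; count_strings( ) and its max parameter are hypothetical, and the special routines the kernel must use to read User Mode memory are deliberately left out.

```c
#include <stddef.h>

/*
 * Counts the entries of a NULL-terminated array of string pointers.
 * Returns the number of entries, or -1 if more than `max` entries are
 * found before the terminator (a guard against oversized arrays).
 */
int count_strings(char *const vec[], int max)
{
    int n = 0;

    if (vec == NULL)
        return 0;
    while (vec[n] != NULL) {
        if (n == max)
            return -1;            /* would exceed the allowed number of strings */
        n++;
    }
    return n;
}
```

For the array { "ls", "-l", "/tmp", NULL }, the function returns 3 when max is at least 3 and -1 when max is smaller.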
The function copies the executable file pathname into a newly allocated page frame. It then invokes the do_execve( ) function, passing to it the pointers to the page frame, to the two arrays of pointers, and to the location of the Kernel Mode stack where the User Mode register contents are saved. In turn, do_execve( ) performs the following operations:
1. Statically allocates a linux_binprm data structure, which will be filled with data concerning the new executable file.
2. Invokes path_init( ), path_walk( ), and dentry_open( ) to get the dentry object, the file object, and the inode object associated with the executable file. On failure, returns the proper error code.
3. Verifies that the executable file is not being written by checking the i_writecount field of the inode; stores -1 in that field to forbid further write accesses.
4. Invokes the prepare_binprm( ) function to fill the linux_binprm data structure. This function, in turn, performs the following operations:
a. Checks whether the permissions of the file allow its execution; if not, returns an error code.
b. Initializes the e_uid and e_gid fields of the linux_binprm structure, taking into account the values of the setuid and setgid flags of the executable file. These fields represent the effective user and group IDs, respectively. Also checks process capabilities (a compatibility hack explained in the earlier Section 20.1.1).
c. Fills the buf field of the linux_binprm structure with the first 128 bytes of the executable file. These bytes include the magic number of the executable format and other information suitable for recognizing the executable file.
5. Copies the file pathname, command-line arguments, and environment strings into one or more newly allocated page frames. (Eventually, they are assigned to the User Mode address space.)
6. Invokes the search_binary_handler( ) function, which scans the formats list and tries to apply the load_binary method of each element, passing to it the linux_binprm data structure. The scan of the formats list terminates as soon as a load_binary method succeeds in acknowledging the executable format of the file.
7. If the executable file format is not present in the formats list, releases all allocated page frames and returns the error code -ENOEXEC, which indicates that Linux cannot recognize the executable file format.
8. Otherwise, returns the code obtained from the load_binary method associated with the executable format of the file.
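The scan of the formats list in Steps 6 through 8 can be modeled in User Mode as follows. This is a simplified sketch rather than kernel code: the sketch_ structures, the two sample handlers, and their magic-number checks are assumptions that stand in for linux_binprm, linux_binfmt, and the real load_binary methods.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 128       /* the first 128 bytes of the file */

struct sketch_binprm {
    unsigned char buf[SKETCH_BUF_SIZE];
};

struct sketch_format {
    const char *name;
    int (*load_binary)(const struct sketch_binprm *bprm);
};

/* Each handler returns -ENOEXEC when it does not recognize the format. */
static int sketch_load_elf(const struct sketch_binprm *bprm)
{
    return memcmp(bprm->buf, "\177ELF", 4) == 0 ? 0 : -ENOEXEC;
}

static int sketch_load_script(const struct sketch_binprm *bprm)
{
    return memcmp(bprm->buf, "#!", 2) == 0 ? 0 : -ENOEXEC;
}

static const struct sketch_format sketch_formats[] = {
    { "elf",    sketch_load_elf },
    { "script", sketch_load_script },
};

/*
 * Tries each registered format in turn. The scan stops at the first
 * handler that acknowledges the file; its return code is passed back.
 * If no handler matches, returns -ENOEXEC and sets *matched to NULL.
 */
int sketch_search_handler(const struct sketch_binprm *bprm, const char **matched)
{
    size_t count = sizeof sketch_formats / sizeof sketch_formats[0];

    for (size_t i = 0; i < count; i++) {
        int ret = sketch_formats[i].load_binary(bprm);
        if (ret != -ENOEXEC) {
            *matched = sketch_formats[i].name;
            return ret;
        }
    }
    *matched = NULL;
    return -ENOEXEC;
}
```

A buffer that starts with the four bytes 0x7f 'E' 'L' 'F' is claimed by the first handler, one that starts with #! by the second, and anything else falls through to -ENOEXEC.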
The load_binary method corresponding to an executable file format performs the following operations (we assume that the executable file is stored on a filesystem that allows file memory mapping and that it requires one or more shared libraries):
1. Checks some magic numbers stored in the first 128 bytes of the file to identify the executable format. If the magic numbers don't match, returns the error code -ENOEXEC.
2. Reads the header of the executable file. This header describes the program's segments and the shared libraries requested.
3. Gets from the executable file the pathname of the program interpreter, which is used to locate the shared libraries and map them into memory.
4. Gets the dentry object (as well as the inode object and the file object) of the program interpreter.
5. Checks the execution permissions of the program interpreter.
6. Copies the first 128 bytes of the program interpreter into a buffer.
7. Performs some consistency checks on the program interpreter type.
8. Invokes the flush_old_exec( ) function to release almost all resources used by the previous computation; in turn, this function performs the following operations:
a. If the table of signal handlers is shared with other processes, allocates a new table and decrements the usage counter of the old one; this is done by invoking the make_private_signals( ) function.
b. Invokes the exec_mmap( ) function to release the memory descriptor, all memory regions, and all page frames assigned to the process and to clean up the process's Page Tables.
c. Updates the table of signal handlers by resetting each signal to its default action. This is done by invoking the release_old_signals( ) and flush_signal_handlers( ) functions.
d. Sets the comm field of the process descriptor with the executable file pathname.
e. Invokes the flush_thread( ) function to clear the values of the floating point registers and debug registers saved in the TSS segment.
f. Invokes the de_thread( ) function to detach the process from the old thread group (see Section 3.2.2).
g. Invokes the flush_old_files( ) function to close all open files whose corresponding flag is set in the files->close_on_exec field of the process descriptor (see Section 12.2.6).[6]
[6] These flags can be read and modified by means of the fcntl( ) system call.
9. Now we have reached the point of no return: the function cannot restore the previous computation if something goes wrong.
10. Sets up the new personality of the process—that is, the personality field in the process descriptor.
11. Clears the PF_FORKNOEXEC flag in the process descriptor. This flag, which is set when a process is forked and cleared when it executes a new program, is required for process accounting.
12. Invokes the setup_arg_pages( ) function to allocate a new memory region descriptor for the process's User Mode stack and to insert that memory region into the process's address space. setup_arg_pages( ) also assigns the page frames containing the command-line arguments and the environment variable strings to the new memory region.
13. Invokes the do_mmap( ) function to create a new memory region that maps the text segment (that is, the code) of the executable file. The initial linear address of the memory region depends on the executable format, since the program's executable code is usually not relocatable. Therefore, the function assumes that the text segment is loaded starting from some specific logical address offset (and thus from some specified linear address). ELF programs are loaded starting from linear address 0x08048000.
14. Invokes the do_mmap( ) function to create a new memory region that maps the data segment of the executable file. Again, the initial linear address of the memory region depends on the executable format, since the executable code expects to find its variables at specified offsets (that is, at specified linear addresses). In an ELF program, the data segment is loaded right after the text segment.
15. Allocates additional memory regions for any other specialized segments of the executable file. Usually, there are none.
16. Invokes a function that loads the program interpreter. If the program interpreter is an ELF executable, the function is named load_elf_interp( ). In general, the function performs the operations in Steps 13 through 15, but for the program interpreter instead of the file to be executed. The initial addresses of the memory regions that will include the text and data of the program interpreter are specified by the program interpreter itself; however, they are very high (usually above 0x40000000) to avoid collisions with the memory regions that map the text and data of the file to be executed (see the earlier Section 20.1.4).
17. Stores in the binfmt field of the process descriptor the address of the linux_binfmt object of the executable format.
18. Determines the new capabilities of the process.
19. Creates specific program interpreter tables and stores them on the User Mode stack between the command-line arguments and the array of pointers to environment strings (see Figure 20-1).
20. Sets the values of the start_code, end_code, end_data, start_brk, brk, and start_stack fields of the process's memory descriptor.
21. Invokes the do_brk( ) function to create a new anonymous memory region mapping the bss segment of the program. (When the process writes into a variable, it triggers demand paging, and thus the allocation of a page frame.) The size of this memory region was computed when the executable program was linked. The initial linear address of the memory region must be specified, since the program's executable code is usually not relocatable. In an ELF program, the bss segment is loaded right after the data segment.
22. Invokes the start_thread( ) macro to modify the values of the User Mode registers eip and esp saved on the Kernel Mode stack, so that they point to the entry point of the program interpreter and to the top of the new User Mode stack, respectively.
23. If the process is being traced, sends the SIGTRAP signal to it.
24. Returns the value 0 (success).
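Steps 1 through 3 (checking the magic number, reading the header, and extracting the pathname of the program interpreter) can be reproduced from User Mode for ELF files. The sketch below is an approximation under several assumptions: it relies on the <elf.h> and <link.h> headers found on Linux systems, handles only files of the machine's native ELF class, skips most validity checks, and uses the hypothetical helper name elf_interp( ). In ELF files, the interpreter pathname is stored in the segment of type PT_INTERP.

```c
#include <elf.h>
#include <link.h>                 /* ElfW( ): selects the native 32- or 64-bit types */
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and copies the interpreter pathname into `interp` if the
 * file has a PT_INTERP segment, 0 if it is an ELF file without one
 * (for instance, a statically linked program), and -1 on any error.
 */
int elf_interp(const char *path, char *interp, size_t size)
{
    ElfW(Ehdr) eh;
    ElfW(Phdr) ph;
    unsigned char native = sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32;
    int rc = -1;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Step 1: check the magic number (and, in this sketch, the ELF class). */
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == native) {
        rc = 0;
        /* Step 2: walk the program header table, which describes the segments. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            long off = (long)(eh.e_phoff + (size_t)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                rc = -1;
                break;
            }
            if (ph.p_type != PT_INTERP)
                continue;
            /* Step 3: the segment's contents are the interpreter pathname. */
            if (ph.p_filesz == 0 || ph.p_filesz > size ||
                fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(interp, 1, (size_t)ph.p_filesz, f) != (size_t)ph.p_filesz) {
                rc = -1;
                break;
            }
            interp[ph.p_filesz - 1] = '\0';   /* guarantee termination */
            rc = 1;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Applied to a dynamically linked program, the function reports the pathname of the dynamic linker installed on the system; applied to a statically linked program, it returns 0.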
When the execve( ) system call terminates and the calling process resumes its execution in User Mode, the execution context is dramatically changed: the code that invoked the system call no longer exists. In this sense, we could say that execve( ) never returns on success. Instead, a new program to be executed is mapped in the address space of the process.
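Since a successful call never returns to the caller, a parent process that wants to know whether its child really started the new program needs some other signal. One common User Mode technique, sketched here under the assumption of a POSIX system, combines this property with the close-on-exec flag set through fcntl( ): the child writes errno into a pipe only if the exec call fails, while a successful exec closes the pipe automatically. The helper name spawn_checked( ) is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that executes argv[0]. Returns 0 if the child replaced
 * itself with the new program; otherwise returns the errno value of the
 * failed call. The child is reaped before returning.
 */
int spawn_checked(char *const argv[])
{
    int fds[2];
    if (pipe(fds) < 0)
        return errno;

    /* Mark the write end close-on-exec: a successful exec closes it. */
    int flags = fcntl(fds[1], F_GETFD);
    if (flags < 0 || fcntl(fds[1], F_SETFD, flags | FD_CLOEXEC) < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    if (pid == 0) {
        close(fds[0]);
        execv(argv[0], argv);
        /* This code runs only if execv( ) failed: report why. */
        int err = errno;
        if (write(fds[1], &err, sizeof err) < 0) {
            /* nothing more can be done in the child */
        }
        _exit(127);
    }

    close(fds[1]);
    int child_err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &child_err, sizeof child_err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    /* End-of-file with no data means the exec succeeded. */
    return n == (ssize_t)sizeof child_err ? child_err : 0;
}
```

With an argv that names /bin/sh, the function returns 0; with an argv that names a nonexistent file, it returns ENOENT.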
However, the new program cannot yet be executed, since the program interpreter must still take care of loading the shared libraries.[7]
[7] Things are much simpler if the executable file is statically linked—that is, if no shared library is requested. The load_binary method just maps the text, data, bss, and stack segments of the program into the process memory regions, and then sets the User Mode eip register to the entry point of the new program.
Although the program interpreter runs in User Mode, we briefly sketch out here how it operates. Its first job is to set up a basic execution context for itself, starting from the information stored by the kernel in the User Mode stack between the array of pointers to environment strings and arg_start. Then the program interpreter must examine the program to be executed to identify which shared libraries must be loaded and which functions in each shared library are actually requested. Next, the interpreter issues several mmap( ) system calls to create memory regions mapping the pages that will hold the library functions (text and data) actually used by the program. Then the interpreter updates all references to the symbols of the shared library, according to the linear addresses of the library's memory regions. Finally, the program interpreter terminates its execution by jumping to the main entry point of the program to be executed. From now on, the process will execute the code of the executable file and of the shared libraries.
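On current Linux systems, the information that the kernel leaves on the User Mode stack for the program interpreter is known as the ELF auxiliary vector, and a running program can query it. The sketch below assumes a C library that provides getauxval( ) in <sys/auxv.h> (as glibc and musl do); the exec_info structure and the read_exec_info( ) helper are hypothetical names introduced for the example.

```c
#include <elf.h>                  /* AT_ENTRY, AT_BASE, AT_PAGESZ */
#include <sys/auxv.h>             /* getauxval( ) */

/* A few of the values the kernel passes to the new program. */
struct exec_info {
    unsigned long entry;          /* AT_ENTRY: entry point of the executable */
    unsigned long interp_base;    /* AT_BASE: base address of the program interpreter */
    unsigned long page_size;      /* AT_PAGESZ: system page size in bytes */
};

/* Reads the selected entries; getauxval( ) returns 0 for a missing entry. */
struct exec_info read_exec_info(void)
{
    struct exec_info info;

    info.entry = getauxval(AT_ENTRY);
    info.interp_base = getauxval(AT_BASE);
    info.page_size = getauxval(AT_PAGESZ);
    return info;
}
```

The entry field is the address the interpreter eventually jumps to once the shared libraries are in place, while interp_base is typically 0 in a statically linked program, which has no program interpreter.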
As you may have noticed, executing a program is a complex activity that involves many facets of kernel design, such as process abstraction, memory management, system calls, and filesystems. It is the kind of topic that makes you realize what a marvelous piece of work Linux is!