Book: LPI Linux Certification in a Nutshell
Section: Chapter 3.  GNU and Unix Commands (Topic 1.3)



3.4 Objective 4: Use Unix Streams, Pipes, and Redirects

Among the many beauties of the Linux and Unix systems is the notion that everything is a file. Things such as disk drives and their partitions, tape drives, terminals, serial ports, the mouse, and even audio are mapped into the filesystem. This mapping allows programs to interact with many different devices and files in the same way, simplifying their interfaces. Each device that uses the file metaphor is given a device file, which is a special object in the filesystem that provides an interface to the device. The kernel associates device drivers with various device files, which is how the system manages the illusion that devices can be accessed as if they were files. Using a terminal as an example, a program reading from the terminal's device file will receive characters typed at the keyboard. Writing to the terminal causes characters to appear on the screen. While it may seem odd to think of your terminal as a file, the concept provides a unifying simplicity to Linux and Linux programming.
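As a quick, hedged illustration of the idea (the paths assume a typical Linux system), device files can be listed and written to with the same commands used for ordinary files:

```shell
# Device files behave like ordinary files to the programs that use them.
# The leading 'c' in the listing marks /dev/null as a character device.
ls -l /dev/null

# Writing to a device file uses the same syntax as writing to any file;
# /dev/null simply discards whatever it receives.
echo "discarded" > /dev/null
```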

3.4.1 Standard I/O and Default File Descriptors

Standard I/O is a capability of the shell, used with all text-based Linux utilities to control and direct program input, output, and error information. When a program is launched, it is automatically provided with three file descriptors. File descriptors are regularly used in programming and serve as a "handle" of sorts to another file. Standard I/O creates the following file descriptors:

Standard input (abbreviated stdin)

This file descriptor is a text input stream. By default it is attached to your keyboard. When you type characters into an interactive text program, you are feeding them to standard input. As you've seen, some programs take one or more filenames as command-line arguments and ignore standard input. Standard input is also known as file descriptor 0.

Standard output (abbreviated stdout)

This file descriptor is a text output stream for normal program output. By default it is attached to your terminal (or terminal window). Output generated by commands is written to standard output for display. Standard output is also known as file descriptor 1.

Standard error (abbreviated stderr)

This file descriptor is also a text output stream, but it is used exclusively for errors or other information unrelated to the successful results of your command. By default standard error is attached to your terminal just like standard output. This means that standard output and standard error are commingled in your display, which can be confusing. You'll see ways to handle this later. Standard error is also known as file descriptor 2.

Standard output and standard error are separated because it is often useful to process normal program output differently than errors.

The standard I/O file descriptors are used in the same way as those created during program execution to read and write disk files. They enable you to tie commands together with files and devices, managing command input and output in exactly the way you desire. The difference is they are provided to the program by the shell by default and do not need to be explicitly created.
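As a sketch, the three descriptors can be addressed by number directly in the shell:

```shell
# fd 1 (stdout) is the default target of output; >&2 sends a line to fd 2 (stderr)
printf 'normal output\n'         # goes to standard output (fd 1)
printf 'error output\n' >&2      # goes to standard error (fd 2)

# A command given no filename argument reads fd 0 (stdin); here the shell
# attaches stdin to a file instead of the keyboard
printf 'a\nb\nc\n' > demo.txt
wc -l < demo.txt                 # prints 3
rm demo.txt
```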

3.4.2 Pipes

From a program's point of view there is no difference between reading text data from a file and reading it from your keyboard. Similarly, writing text to a file and writing text to a display are equivalent operations. As an extension of this idea, it is also possible to tie the output of one program to the input of another. This is accomplished using a pipe (|) to join two or more commands together. For example:

$ grep "01523" order* | less

This command searches through all files whose names begin with order to find lines containing the string 01523. By creating this pipe, the standard output of grep is sent to the standard input of less. The mechanics of this operation are handled by the shell and are invisible to the user. Pipes can be used in a series of many commands. When more than two commands are put together, the resulting operation is known as a pipeline or text stream, implying the flow of text from one command to the next.

As you get used to the idea, you'll find yourself building pipelines naturally to extract specific information from text data sources. For example, suppose you wish to view a sorted list of inode numbers from among the files in your current directory. There are many ways you could achieve this. One way would be to use awk in a pipeline to extract the inode number from the output of ls, then send it on to the sort command and finally to a pager for viewing:[14]

[14] Don't worry about the syntax or function of these commands at this point.

$ ls -i * | awk '{print $1}' | sort -nu | less

The pipeline concept in particular is a feature of Linux and Unix that draws on the fact that your system contains a diverse set of tools for operating on text. Combining their capabilities can yield quick and easy ways to extract otherwise hard to handle information.
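For example, a small pipeline (a sketch using sample data generated inline) can tally how often each word appears in a text stream:

```shell
# sort groups duplicate lines together, uniq -c counts each group,
# and sort -rn orders the counts from most to least frequent
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' |
    sort | uniq -c | sort -rn
```

The most frequent word (apple, with three occurrences) sorts to the top, with no single tool doing more than one simple job.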

3.4.3 Redirection

Each pipe symbol in the previous pipeline examples instructs the shell to feed output from one command into the input of another. This action is a special form of redirection, which allows you to manage the origin of input streams and the destination of output streams. In the previous example, the individual programs are unaware that their output is being handed off to, or received from, another program because the shell takes care of the redirection on their behalf.

Redirection can also occur to and from files. For example, rather than sending the output of an inode list to the pager less, it could easily be sent directly to a file with the > redirection operator:

$ ls -i * | awk '{print $1}' | sort -nu > in.txt

With this change, the shell creates an empty file (in.txt), opens it for writing, and attaches the standard output of sort to it, so the results are placed in the file instead of on the screen. Note that, in this example, anything sent to standard error is still displayed on the screen.
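A brief sketch of that behavior (the file names here are arbitrary):

```shell
touch exists.txt
# Only fd 1 is redirected, so ls's complaint about the missing file still
# appears on the terminal while the listing goes into listing.txt.
# (|| true ignores ls's nonzero exit status for the missing file.)
ls exists.txt no-such-file > listing.txt || true
cat listing.txt                  # contains only "exists.txt"
rm exists.txt listing.txt
```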

While the > redirection operator creates or overwrites files, the >> redirection operator appends to existing files. For example, you could use the following command to append a one-line footnote to in.txt:

$ echo "end of list" >> in.txt

Since in.txt already exists, the text will be appended to the bottom of the existing file. If the file didn't exist, the >> operator would create the file and insert the text "end of list" as its contents.

It is important to note that when creating files, the output redirection operators are interpreted by the shell before the commands are executed. This means that any output files created through redirection are opened, and truncated, first. For this reason, you cannot modify a file in place, like this:

$ grep "stuff" file1 > file1  # don't do it!

If file1 contains anything of importance, this command is a disaster: the shell truncates file1 before grep ever runs, so grep reads an empty file and the original contents are lost. To avoid this problem, simply use an intermediate file and then rename it:

$ grep "stuff" file1 > file2
$ mv file2 file1

Standard input can also be redirected. The input redirection operator is <. Using a source other than the keyboard for a program's input may seem odd at first, but since text programs don't care about where their standard input streams originate, you can easily redirect input. For example, the following command will send a mail message with the contents of the file in.txt to user jdean:

$ Mail -s "inode list" jdean < in.txt

Normally, the Mail program prompts the user for input at the terminal. However with standard input redirected from the file in.txt, no user input is needed and the command executes silently. Table 3-4 lists the common standard I/O redirections for the bash shell, specified in the LPI Objectives.

The redirection syntax may be significantly different if you use another shell.

Table 3-4. Standard I/O Redirections for the bash shell

Redirection Function                          Syntax for bash
--------------------------------------------  ----------------------
Send stdout to file.                          $ cmd > file
                                              $ cmd 1> file
Send stderr to file.                          $ cmd 2> file
Send both stdout and stderr to file.          $ cmd > file 2>&1
Send stdout to file1 and stderr to file2.     $ cmd > file1 2> file2
Receive stdin from file.                      $ cmd < file
Append stdout to file.                        $ cmd >> file
                                              $ cmd 1>> file
Append stderr to file.                        $ cmd 2>> file
Append both stdout and stderr to file.        $ cmd >> file 2>&1
Pipe stdout from cmd1 to cmd2.                $ cmd1 | cmd2
Pipe stdout and stderr from cmd1 to cmd2.     $ cmd1 2>&1 | cmd2
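One detail worth noting about the combined form cmd > file 2>&1: the shell processes redirections left to right, so 2>&1 must come after the file redirection in order to duplicate the already-redirected fd 1. A brief sketch:

```shell
# Correct order: point fd 1 at the file first, then make fd 2 a copy of fd 1.
# Both the (empty) listing and the error message land in both.txt.
# (|| true ignores ls's nonzero exit status for the missing file.)
ls no-such-file > both.txt 2>&1 || true
grep -c 'no-such-file' both.txt   # the error text is in the file
rm both.txt
```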

On the Exam

Be prepared to demonstrate the difference between filenames and command names in commands using redirection operators. Also, check the syntax on commands in redirection questions to be sure about which command or file is a data source and which is a destination.

3.4.4 Using the tee Command

Sometimes, you'll want to run a program and send its output to a file while at the same time viewing the output on the screen. The tee utility is helpful in this situation.

tee

Syntax

tee [options] files

Description

Read from standard input and write both to one or more files and to standard output (analogous to a tee junction in a pipe).

Option

-a

Append to files rather than overwriting them.

Example

Suppose you're running a pipeline of commands cmd1, cmd2, and cmd3:

$ cmd1 | cmd2 | cmd3 > file1

This sequence puts the ultimate output of the pipeline into file1. However, you may also be interested in the intermediate result of cmd1. To capture those results in a new file, file_cmd1, use tee:

$ cmd1 | tee file_cmd1 | cmd2 | cmd3 > file1

The results in file1 will be the same as in the original example, and the intermediate results of cmd1 will be placed in file_cmd1.
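A concrete, runnable sketch of the same idea, with real commands substituted for the placeholders:

```shell
# tee writes its stdin both to unsorted.txt and onward down the pipe
printf '3\n1\n2\n' | tee unsorted.txt | sort -n > sorted.txt
cat unsorted.txt    # the intermediate stream: 3 1 2
cat sorted.txt      # the final result: 1 2 3
rm unsorted.txt sorted.txt
```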