3.4 Objective 4: Use Unix Streams,
Pipes, and Redirects
Among the many beauties of the Linux and Unix
systems is the notion that everything
is a file. Things such as disk drives and their
partitions, tape drives, terminals, serial ports, the mouse,
and even audio are mapped into the filesystem. This mapping
allows programs to interact with many different devices and
files in the same way, simplifying their interfaces. Each
device that uses the file metaphor is given a device
file, which is a special object
in the filesystem that provides an interface to the device.
The kernel associates device drivers with various device
files, which is how the system manages the illusion that
devices can be accessed as if they were files. Using a
terminal as an example, a program reading from the terminal's
device file will receive characters typed at the keyboard.
Writing to the terminal causes characters to appear on the
screen. While it may seem odd to think of your terminal as a
file, the concept provides a unifying simplicity to Linux and
Linux programming.
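A quick illustration of the file metaphor, using two standard device files (/dev/null, which discards anything written to it, and the listing flag that marks device types):

```shell
# Device files live in the filesystem like ordinary files.
# The leading "c" in the mode column marks a character device.
ls -l /dev/null

# Writing to /dev/null works exactly like writing to a file,
# but the kernel's driver simply discards the data.
echo "this text is discarded" > /dev/null
```
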
3.4.1 Standard I/O and Default File
Descriptors
Standard I/O is a capability of the shell, used with
all text-based Linux utilities to control and direct program
input, output, and error information. When a program is
launched, it is automatically provided with three file descriptors. File descriptors are regularly used in
programming and serve as a "handle" of sorts to another file.
Standard I/O creates the following file descriptors:
- Standard input (abbreviated stdin)
  This file descriptor is a text input stream. By default it is
  attached to your keyboard. When you type characters into an
  interactive text program, you are feeding them to standard input.
  As you've seen, some programs take one or more filenames as
  command-line arguments and ignore standard input. Standard input
  is also known as file descriptor 0.
- Standard output (abbreviated stdout)
  This file descriptor is a text output stream for normal program
  output. By default it is attached to your terminal (or terminal
  window). Output generated by commands is written to standard
  output for display. Standard output is also known as file
  descriptor 1.
- Standard error (abbreviated stderr)
  This file descriptor is also a text output stream, but it is used
  exclusively for errors or other information unrelated to the
  successful results of your command. By default standard error is
  attached to your terminal just like standard output. This means
  that standard output and standard error are commingled in your
  display, which can be confusing. You'll see ways to handle this
  later. Standard error is also known as file descriptor 2.
Standard output and standard error are
separated because it is often useful to process normal program
output differently than errors.
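A minimal sketch of the distinction, using echo with the >&2 redirection to write to file descriptor 2 (the messages themselves are arbitrary):

```shell
# Write one line to each stream. By default both appear on the
# terminal, but they remain distinct streams that can be
# redirected separately.
echo "normal result"         # file descriptor 1 (stdout)
echo "something failed" >&2  # file descriptor 2 (stderr)
```
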
The standard I/O file descriptors are used in
the same way as those created during program execution to read
and write disk files. They enable you to tie commands together
with files and devices, managing command input and output in
exactly the way you desire. The difference is they are
provided to the program by the shell by default and do not
need to be explicitly created.
3.4.2 Pipes
From a program's
point of view there is no difference between reading text data
from a file and reading it from your keyboard. Similarly,
writing text to a file and writing text to a display are
equivalent operations. As an extension of this idea, it is
also possible to tie the output of one program to the input of
another. This is accomplished using a pipe (|) to join
two or more commands together. For example:

$ grep "01523" order* | less

This command searches through all files whose
names begin with order to find
lines containing the string 01523. By creating this pipe,
the standard output of grep is
sent to the standard input of less.
The mechanics of this operation are handled by the shell and
are invisible to the user. Pipes can be used in a series of
many commands. When more than two commands are put together,
the resulting operation is known as a pipeline or
text stream, implying the flow of text from one
command to the next.
As you get used to the idea, you'll find
yourself building pipelines naturally to extract specific
information from text data sources. For example, suppose you
wish to view a sorted list of inode numbers from among the
files in your current directory. There are many ways you could
achieve this. One way would be to use awk in a pipeline to extract the
inode number from the output of ls, then send it on to the sort command and finally to a pager
for viewing:
$ ls -i * | awk '{print $1}' | sort -nu | less
The pipeline concept in particular is a
feature of Linux and Unix that draws on the fact that your
system contains a diverse set of tools for operating on
text. Combining their
capabilities can yield quick and easy ways to extract
otherwise hard to handle information.
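As another sketch (the sample data here is made up and fed in with printf), a short pipeline that counts duplicate lines and ranks them:

```shell
# sort groups identical lines together, uniq -c prefixes each
# group with its count, and sort -rn orders the counts from
# highest to lowest. The most frequent line is printed first.
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' \
    | sort | uniq -c | sort -rn
```
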
3.4.3 Redirection
Each pipe
symbol in the previous pipeline examples instructs the shell
to feed output from one command into the input of another. This action is a special form of
redirection, which allows you
to manage the origin of input streams and the destination of
output streams. In the previous example, individual programs
are unaware that their output is being handed off to or from
another program because the shell takes care of the
redirection on their behalf.
Redirection can also occur to and from files.
For example, rather than sending the output of an inode list
to the pager less, it could
easily be sent directly to a file with the > redirection operator:

$ ls -i * | awk '{print $1}' | sort -nu > in.txt

With this change, the shell creates an empty file (in.txt),
opens it for writing, and the standard output of sort places the results in the file
instead of on the screen. Note that, in this example, anything
sent to standard error is still displayed on the screen.
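To keep errors out of the normal output, file descriptor 2 can be given its own destination. A minimal sketch with arbitrary filenames (out.txt and err.txt) and arbitrary messages:

```shell
# sh -c runs a tiny script that writes one line to each stream.
# > captures stdout in out.txt; 2> captures stderr in err.txt.
sh -c 'echo "good line"; echo "bad line" >&2' > out.txt 2> err.txt
cat out.txt   # contains only "good line"
cat err.txt   # contains only "bad line"
```
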
While the > redirection operator creates or overwrites files, the
>> redirection operator appends to existing files. For example,
you could use the following command to append a one-line footnote
to in.txt:

$ echo "end of list" >> in.txt
Since in.txt already exists, the new line
is appended to the bottom of the existing file. If the
file didn't exist, the >>
operator would create the file and insert the text "end of
list" as its contents.
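The difference between the two operators is easy to see with a throwaway file (demo.txt is an arbitrary name):

```shell
echo "first"  > demo.txt    # creates (or truncates) the file
echo "second" >> demo.txt   # appends; demo.txt now has two lines
echo "third"  > demo.txt    # truncates again; only "third" remains
cat demo.txt
```
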
It is important to
note that when creating files, the output redirection
operators are interpreted by the shell before the commands are executed. This means that any output files
created through redirection are opened first. For this reason, you cannot modify
a file in place, like this:

$ grep "stuff" file1 > file1     # don't do it!

If file1 contains something of
importance, this command would be a disaster because an empty
file1 would overwrite the original. The grep command would be last to
execute, resulting in a complete data loss from the original
file1 file because the file that replaced it was empty.
To avoid this problem, simply use an intermediate file and
then rename it:

$ grep "stuff" file1 > file2
$ mv file2 file1
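A slightly safer variant of the same two-step approach, sketched here with mktemp so the intermediate name cannot collide with an existing file, and with && so the original is replaced only if grep succeeds:

```shell
# mktemp creates a unique temporary file and prints its name.
# Because grep exits nonzero when nothing matches, the && chain
# leaves the original file1 untouched on an empty result.
tmp=$(mktemp) &&
grep "stuff" file1 > "$tmp" &&
mv "$tmp" file1
```
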
Standard input can also be redirected. The input
redirection operator is <. Using a source other
than the keyboard for a program's input may seem odd at first,
but since text programs don't care about where their standard
input streams originate, you can easily redirect input. For
example, the following command will send a mail message with
the contents of the file in.txt to user jdean:
$ Mail -s "inode list" jdean < in.txt
Normally, the Mail program prompts the user for
input at the terminal. However,
with standard input redirected from the file in.txt, no user
input is needed and the command executes silently. Table
3-4 lists the common standard I/O redirections for the
bash shell, specified in the
LPI Objectives.
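A self-contained sketch of the same idea (in.txt here is a throwaway sample file): wc reads the same data whether it opens the file itself or receives it on standard input, but with < it never learns the filename:

```shell
printf 'one\ntwo\nthree\n' > in.txt
wc -l in.txt     # wc opens the file itself; prints the count and the name
wc -l < in.txt   # the shell supplies the data on stdin; prints only the count
```
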
Note: The redirection syntax may be significantly different if you
use another shell.
Table 3-4. Standard I/O redirections for the bash shell

Redirection function                          Syntax for bash
Send stdout to file.                          $ cmd > file
                                              $ cmd 1> file
Send stderr to file.                          $ cmd 2> file
Send both stdout and stderr to file.          $ cmd > file 2>&1
Send stdout to file1 and stderr to file2.     $ cmd > file1 2> file2
Receive stdin from file.                      $ cmd < file
Append stdout to file.                        $ cmd >> file
                                              $ cmd 1>> file
Append stderr to file.                        $ cmd 2>> file
Append both stdout and stderr to file.        $ cmd >> file 2>&1
Pipe stdout from cmd1 to cmd2.                $ cmd1 | cmd2
Pipe stdout and stderr from cmd1 to cmd2.     $ cmd1 2>&1 | cmd2
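One subtlety worth noting (sketched with arbitrary messages and filenames): redirections are processed left to right, so 2>&1 must come after > file to capture both streams in the file:

```shell
# Correct order: stdout goes to both.txt first, then stderr is
# duplicated onto stdout's (already redirected) destination.
sh -c 'echo out; echo err >&2' > both.txt 2>&1

# Wrong order: stderr is duplicated onto the terminal's stdout
# *before* stdout is redirected, so "err" still reaches the screen.
sh -c 'echo out; echo err >&2' 2>&1 > only-out.txt
```
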
Be prepared to demonstrate the
difference between filenames and command names in
commands using redirection operators. Also, check the
syntax on commands in redirection questions to be sure
about which command or file is a data source and which
is a destination.
3.4.4 Using the tee Command
Sometimes, you'll want to run a program and
send its output to a file while at the same time viewing the
output on the screen. The tee
utility is helpful in this situation.
Syntax
tee [options] files
Description
Read from standard input and write both to
one or more files and to standard output (analogous to
a tee junction in a pipe).
Option
- -a
  Append to files rather than overwriting them.
Example
Suppose you're running a pipeline of commands
cmd1, cmd2, and cmd3:

$ cmd1 | cmd2 | cmd3 > file1
This sequence puts the ultimate output of the
pipeline into file1. However, you may also be
interested in the intermediate result of cmd1. To create a new
file_cmd1 containing those results, use tee:

$ cmd1 | tee file_cmd1 | cmd2 | cmd3 > file1
The results in file1 will be the same
as in the original example, and the intermediate results of
cmd1 will be placed in
file_cmd1.
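A self-contained sketch of tee in a pipeline (copy.txt is an arbitrary name): the data passes through tee to wc while a copy lands in the file:

```shell
# tee writes its stdin both to copy.txt and to its own stdout,
# so wc -l still receives and counts all three lines.
printf 'a\nb\nc\n' | tee copy.txt | wc -l
```
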