Book: LPI Linux Certification in a Nutshell
Section: Chapter 7.  Administrative Tasks (Topic 2.11)



7.5 Objective 5: Maintain an Effective Data Backup Strategy

Regardless of how careful we are or how robust our hardware might be, it is highly likely that sometimes data will be lost. Though fatal system problems are rare, accidentally deleted files or mistakes using mv or cp are common. Routine system backup is essential to avoid losing precious data.

There are many reasons to routinely back up your systems:

  • Protection against disk failures

  • Protection against accidental file deletion and corruption

  • Protection against disasters, such as fire, water, or vandalism

  • Retention of historical data

  • Creation of multiple copies of data, with one or more copies stored at off-site locations for redundancy

All of these reasons for creating a backup strategy could be summarized as insurance. Far too much time and effort goes into a computer system to allow random incidents to force repeated work.

7.5.1 Backup Concepts and Strategies

Most backup strategies involve copying data between at least two locations. At a prescribed time, data is transferred from a source medium (such as a hard disk) to some form of backup media. Backup media are usually removable, and include tapes, floppy disks, Zip disks, and so on. These media are relatively inexpensive, compact, and easy to store off-site. On the other hand, they are slow relative to hard disk drives.

7.5.1.1 Backup types

Backups are usually run in one of three general forms:

Full backup

A full, or complete, backup saves all of the files on your system. Depending on circumstances, "all files" may mean all files on the system, all files on a physical disk, all files on a single partition, or all files that cannot be recovered from original installation media. Depending on the size of the drive being backed up, a full backup can take hours to complete.

Differential backup

Save only files that have been modified or created since the last full backup. Compared to full backups, differentials are relatively fast because of the reduced number of files written to the backup media. A typical differential scheme would include full backup media plus the latest differential media. Intermediate differential media are superseded by the latest and can be recycled.

Incremental backup

Save only files that have been modified or created since the last backup, including the last incremental backup. These backups are also relatively fast. A typical incremental scheme would include full backup media plus the entire series of subsequent incremental media. All incremental media are required to reconstruct changes to the filesystem since the last full backup.

Typically, a full backup is coupled with a series of either differential backups or incremental backups, but not both. For example, a full backup could be run once per week with six daily differential backups on the remaining days. Using this scheme, a restoration is possible from the full backup media and the most recent differential backup media. Using incremental backups in the same scenario, the full backup media and all incremental backup media would be required to restore the system. The choice between the two is related mainly to the tradeoff between media consumption (incremental backup requires more media) versus backup time (differential backup takes longer, particularly on heavily used systems).
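The difference between the two partial backup types comes down to which reference date is used to select files. The following is a minimal sketch of that idea, assuming GNU tar and using its -N option (described later in this section) against regular tarfiles in a scratch directory. The directory layout, the file names, and the stamp helper are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Sketch: how the reference date passed to tar -N decides whether a
# backup is differential or incremental. GNU tar is assumed; every
# path below is a scratch location, not a real backup target.
work=$(mktemp -d)
mkdir "$work/data"

# Hypothetical helper: current time with one-second resolution.
stamp() { date '+%Y-%m-%d %H:%M:%S'; }

echo "original" > "$work/data/a.txt"
sleep 1    # keep file times and the one-second stamps clearly apart

# Full backup: everything, and remember when it ran.
full_date=$(stamp)
tar -cf "$work/full.tar" -C "$work" data
sleep 1

# Day 1: b.txt is created, then the first partial backup runs.
echo "day 1" > "$work/data/b.txt"
sleep 1
day1_date=$(stamp)
tar -cf "$work/day1.tar" -N "$full_date" -C "$work" data
sleep 1

# Day 2: c.txt is created.
echo "day 2" > "$work/data/c.txt"

# Differential: the reference is always the last FULL backup.
tar -cf "$work/diff2.tar" -N "$full_date" -C "$work" data
# Incremental: the reference is the last backup of ANY kind.
tar -cf "$work/incr2.tar" -N "$day1_date" -C "$work" data

echo "differential, day 2:"; tar -tf "$work/diff2.tar"
echo "incremental, day 2:";  tar -tf "$work/incr2.tar"
```

In this sketch the day 2 differential should contain both b.txt and c.txt, while the day 2 incremental should contain only c.txt. That is why a restore from incrementals needs every archive in the series, while a restore from differentials needs only the most recent one.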

For large organizations that require retention of historical data, a backup scheme longer than a week is created. Incremental or differential backup media are retained for a few weeks, after which the tapes are reformatted and reused. Full backup media are retained for an extended period, perhaps permanently. At the very least, one full backup from each month should be retained for a year or more.

A backup scheme such as this is called a media rotation scheme, because media are continually written, retained for a defined period, and then reused. The media themselves are said to belong to a media pool, which defines the monthly full, the weekly full, and differential or incremental media assignments, as well as when media can be reused. When media with full backups are removed from the pool for long-term storage, new media join the pool, keeping the size of the pool constant. Media may also be removed from the pool if your organization chooses to limit the number of uses media are allowed, assuming that reliability goes down as the number of passes through a tape mechanism increases.

Your organization's data storage requirements dictate the complexity of your backup scheme. On systems in which many people frequently update mission-critical data, a conservative and detailed backup scheme is essential. For casual-use systems, such as desktop PCs, only a basic backup scheme is needed, if at all.

7.5.1.2 Backup verification

To be effective, backup media must be capable of yielding a successful restoration of files. To ensure this, a backup scheme must also include some kind of backup verification in which recently written backup media are tested for successful restore operations. This could take the form of a comparison of files after the backup, an automated restoration of a select group of files on a periodic basis, or even a random audit of media on a recurring basis. However the verification is performed, it must prove that the media, tape drives, and programming will deliver a restored system. Proof that your backups are solid and reliable ensures that they will be useful in case of data loss.

7.5.2 Device Files

Before discussing actual backup procedures, a word on so-called device files is necessary. When performing backup operations to tape and other removable media, you must specify the device using its device file. These files are stored in /dev, and the kernel maps each one to the device driver that controls the corresponding device. Archiving programs that use the device files need no knowledge of how to make the device work. Here are some typical device files you may find on Linux systems:

/dev/st0

First SCSI tape drive

/dev/ft0

First floppy-controller tape drive, such as Travan drives

/dev/fd0

First floppy disk drive

/dev/hdd

An ATAPI Zip or other removable disk

These names are just examples. The names on your system will be hardware- and distribution-specific.

Did I Rewind That Tape?

When using tape drives, the kernel driver for devices such as /dev/st0 and /dev/ft0 automatically sends a rewind command after any operation. However, there may be times when rewinding the tape is not desirable. Since the archive program has no knowledge of how to send special instructions to the device, a nonrewinding device file exists that instructs the driver to omit the rewind instruction. These files have a leading n added to the filename. For example, the nonrewinding device file for /dev/st0 is /dev/nst0. When using nonrewinding devices, the tape is left at the location just after the last operation by the archive program. This allows the addition of more archives to the same tape.
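The naming convention can be sketched in a few lines of shell. The nonrewinding function below is a hypothetical helper, not a standard command, and the commented-out tape commands assume a SCSI tape drive at /dev/st0.

```shell
#!/bin/bash
# Hypothetical helper: derive the nonrewinding device name from the
# rewinding one by prefixing the basename with "n".
nonrewinding() {
    local dev=$1
    printf '%s/n%s\n' "$(dirname "$dev")" "$(basename "$dev")"
}

nonrewinding /dev/st0    # prints /dev/nst0

# On a host with a real tape drive, the derived name could be used to
# place two archives back to back and rewind only at the end. These
# lines are commented out because they need hardware:
#   tar cf "$(nonrewinding /dev/st0)" /etc
#   tar cf "$(nonrewinding /dev/st0)" /home
#   mt -f /dev/st0 rewind
```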

7.5.3 Using tar and mt

The tar (tape archive) program is used to recursively read files and directories, and then write them onto a tape or into a file. Along with the data goes detailed information on the files and directories copied, including modification times, owners, modes, and so on. This makes tar much better for archiving than simply making a copy does, because the restored data has all of the properties of the original.

The tar utility stores and extracts files from an archive file known as a tarfile, which by convention carries the .tar file extension. Since tape drives and other storage devices in Linux are viewed by the system as files, one type of tarfile is a device file, such as /dev/st0 (SCSI tape drive 0). However, nothing prevents using regular files with tar -- this is common practice and a convenient way to distribute complete directory hierarchies as a single file.
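As a minimal sketch of tar working on a regular file rather than a device, the following creates, lists, and extracts a small archive in a scratch directory. GNU tar is assumed. The project directory and its contents are hypothetical, and the -C option (change directory before archiving or extracting) is a GNU tar option not otherwise covered in this section.

```shell
#!/bin/bash
# Sketch: a regular file used as a tarfile. Everything lives in a
# scratch directory created by mktemp.
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "draft" > "$work/project/docs/notes.txt"
chmod 600 "$work/project/docs/notes.txt"

# c = create, f = name of the tarfile. -C switches to $work first so
# that member names are stored relative to it (project/...).
tar -cf "$work/project.tar" -C "$work" project

# t = list, v = verbose: shows the mode, owner, size and modification
# time recorded for each member.
tar -tvf "$work/project.tar"

# x = extract, here into a separate directory.
mkdir "$work/copy"
tar -xf "$work/project.tar" -C "$work/copy"
ls -l "$work/copy/project/docs/notes.txt"
```

The verbose listing shows that the restrictive mode set with chmod travels with the file inside the archive, which is the property that distinguishes an archive from a plain copy.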

During restoration of files from a tape with multiple archives, the need arises to position the tape to the archive that holds the necessary files. To accomplish this control, use the mt command. (The name comes from "magnetic tape.") The mt command uses a set of simple instructions that directs the tape drive to perform a particular action.

tar

Syntax

tar [options] files

Description

Archive or restore files. tar recursively creates archives of files and directories, including file properties. It requires at least one basic mode option to specify the operational mode.

Basic mode options

-c

Create a new tarfile.

-t

List the contents of a tarfile.

-x

Extract files from a tarfile.

Frequently used options

-f tarfile

Unless tar is using standard I/O, use the -f option with tar to specify the tarfile. This might be simply a regular file or it may be a device such as /dev/st0.

-v

Verbose mode. By default, tar runs silently. When -v is specified, tar reports each file as it is transferred.

-w

Interactive mode. In this mode, tar asks for confirmation before archiving or restoring files. This option is useful only for small archives.

-z

Enable compression. When using -z, data is filtered through the gzip compression program prior to being written to the tarfile, saving additional space. The savings can be substantial, at times better than an order of magnitude depending on the data being compressed. An archive created using the -z option should also be listed and extracted with -z; older versions of tar will not recognize a compressed file as a valid archive without it, although recent GNU tar releases detect the compression automatically when reading. By convention, tarfiles created with this option are given the .tar.gz file extension.

-N date

Store only files newer than the date specified. This option can be used to construct an incremental or differential backup scheme.

-V "label"

Adds a label to the .tar archive. Quotes are required when the label contains spaces, so that the shell passes it to tar as a single argument rather than treating the extra words as filenames. A label is handy if you find an unmarked tape or poorly named tarfile.

Example 1

Create an archive on SCSI tape of the /etc directory, reporting progress:

# tar cvf /dev/st0 /etc
tar: Removing leading `/' from absolute path names in the archive
etc/
etc/hosts
etc/csh.cshrc
etc/exports
etc/group
etc/host.conf
etc/hosts.allow
etc/hosts.deny
etc/motd
...

Note the message indicating that tar will strip the leading slash from /etc for the filenames in the archive. This is done to protect the filesystem from accidental restores to /etc from this archive, which could be disastrous.

Example 2

List the contents of the tar archive on SCSI tape 0:

# tar tf /dev/st0
...

Example 3

Extract the entire contents of the tar archive on SCSI tape 0, reporting progress:

# tar xvf /dev/st0
... 

Example 4

Extract only the /etc/hosts file:

# tar xvf /dev/st0 etc/hosts
etc/hosts

Note that the leading slash is omitted in the file specification (etc/hosts), in order to match the archive with the stripped slash as noted earlier.

Example 5

Create a compressed archive of root's home directory on a floppy:

# tar cvzf /dev/fd0 -V "root home dir" /root
tar: Removing leading `/' from absolute path names in the archive
root/
root/lost+found/
root/.Xdefaults
root/.bash_logout
root/.bash_profile
root/.bashrc
root/.cshrc
root/.tcshrc
...
tar (grandchild): Cannot write to /dev/fd0: No space left on device
tar (grandchild): Error is not recoverable: exiting now

As you can see from reading the error messages, there isn't enough room on the floppy, despite compression. In this case, try storing the archive to an ATAPI Zip drive:

# tar cvzf /dev/hdd -V "root home dir" /root
...

As mentioned earlier, tape drives have more than one device file. A tape drive's nonrewinding device file allows you to write to the tape without sending a rewind instruction. This allows you to use tar again on the same tape, writing another archive to the media. The number of archives written is limited only by the available space on the tape.

Often multiple archives are written on a single tape to accomplish a backup strategy for multiple computers, multiple disks, or some other situation in which segmenting the backup makes sense. One thing to keep in mind when constructing backups to large media such as tape is the reliability of the media itself. If an error occurs while tar is reading the tape during a restore operation, it may become confused and give up. This may prevent a restore of anything located beyond the bad section of tape. Segmenting the backup into pieces may enable you to position the tape beyond the bad section to the next archive, where tar would work again. In this way, a segmented backup could help shield you from possible media errors.

See the tar info page for full details; info is described in Section 6.1.

mt

Syntax

mt [-h] [-f device_file] operation [count]

Description

Control a tape drive. The tape drive is instructed to perform the specified operation once, unless count is specified.

Frequently used options

-h

Print usage information, including operation names, and exit.

-f device_file

Specify the device file; if omitted, the default is used, as defined in the header file /usr/include/sys/mtio.h. The typical default is /dev/tape.

Popular tape operations

fsf [count]

Forward space files. Move forward the number of files specified by count (archives, in the case of tar), leaving the tape positioned at the first block of the next file.

rewind

Rewind to the beginning of the tape.

offline

Eject the tape. This is appropriate for 8 mm or similar drives, where the tape is handled automatically by the mechanism. Ejecting the tape at the end of a backup may prevent an accidental subsequent backup to the same media. This operation is meaningless on devices that cannot eject the tape.

status

Displays status information about the tape drive being used.

tell

For some SCSI tape drives, report the position of the tape in blocks.

Many more operations exist; consult the mt manpage for a complete list of options.

Example 1

Move the tape in /dev/st0 to the third archive on the tape by skipping forward over two archives:

# mt -f /dev/nst0 fsf 2

Note that the nonrewinding device file is specified (/dev/nst0). If the standard device is specified, the tape drive dutifully skips forward to the appropriate location on the tape, then promptly rewinds.

Example 2

Rewind the tape in /dev/st0 :

# mt -f /dev/st0 rewind

Example 3

Eject the tape cartridge:

# mt -f /dev/st0 offline 

Example 4

Determine what device is represented by the default /dev/tape :

# ls -l /dev/tape
lrwxrwxrwx 1 root root 8 Dec 9 15:32 /dev/tape -> /dev/st0

If you wish to use the default tape device /dev/tape and it is not set on your system, you may need to set it manually:

# ln -s /dev/st0 /dev/tape

7.5.4 Backup Operations

Using tar or mt interactively for routine system backups can become tedious. It is common practice to create backup scripts called by cron to execute the backups for you. This leaves the administrator or operator with the duty of providing correct media and examining logs. This section describes a basic backup configuration using tar, mt, and cron.

7.5.4.1 What should I back up?

It's impossible to describe exactly what to back up on your system. If you have enough time and media, complete backups of everything are safest. However, much of the data on a Linux system, such as commands, libraries, and manpages, doesn't change routinely and probably won't need to be saved often. Making a full backup of the entire system makes sense after you have installed and configured your system. Once you've created a backup of your system, there are some directories that you should routinely back up:

/etc

Most of the system configuration files for a Linux system are stored in /etc, which should be backed up regularly.

/home

User files are stored in /home. Depending on your configuration, you may also store web server files in /home/httpd. On multiuser systems or large web servers, /home can be quite large.

/usr/src

If you've done any kernel compilation, back up /usr/src to save your work.

/var/log

If you have security or operational concerns, it may be wise to save log files stored in /var/log.

/var/spool/mail

If you use email hosted locally, the mail files are stored in /var/spool/mail and should be retained.

/var/spool/at and /var/spool/cron

Users' at and crontab files are stored in /var/spool/at and /var/spool/cron, respectively. These directories should be retained if these services are available to your users.

Of course, this list is just a start, as each system will have different backup requirements.

7.5.4.2 A scripted backup with tar, mt, and cron

This section presents a simple yet effective backup methodology. The backups are scheduled to run via cron using a shell script. This example is not intended as a production solution, but rather as an illustration of the general concepts involved in automating a backup scheme.

In Example 7-5, we back up /etc and /home using tar, executing both full and differential backups to two independent segments on a tape. We use a bash script scheduled in cron using root's crontab file. The script will perform full backups once per week early on Monday morning and differential backups on the remaining six mornings of the week. Differential backups will be done using tar's -N option. The line numbers are for reference only and not part of the code.

Example 7-5. A Simple Backup Script
1  #!/bin/bash
2
3  # This script performs a weekly-full/daily-differential tar backup 
4  # to tape. Each item in "targets" is placed in a separate tape 
5  # tarfile. Gzip compression is enabled in tar.
6
7  # what to back up 
8  targets="/etc /home"
9 
10 # the day we want a full backup (others are differential)
11 fullday=Mon
12
13 # the target tape drive and its non-rewinding twin
14 device="/dev/st0"
15 device_n="/dev/n`/bin/basename $device`"
16
17 # get the last full backup date and the present date
18 datefile="/var/tmp/backup_full_date"
19 prev_full=`/bin/cat $datefile`
20 now=`/bin/date`
21
22 # See if today is the full backup day
23 if echo "$now" | grep "$fullday" > /dev/null
24 then
25     # create and secure the new date file
26     /bin/echo $now > $datefile
27     /bin/chmod 600 $datefile
28
29     # full backup
30     for target in $targets
31     do
32         /bin/tar -cvzf $device_n \
33             -V "Full backup of $target on $now" \
34             $target
35         # let the tape drive flush its buffer
36         sleep 5
37     done
38 else
39     # If today isn't the day to perform the full backup
40     # then the differential backup is performed
41     for target in $targets
42     do
43         /bin/tar -cvzf $device_n \
44             -V "Differential backup of $target from $prev_full to $now" \
45             -N "$prev_full" \
46             $target
47         # let the tape drive flush its buffer
48         sleep 5
49     done
50 fi
51
52 # rewind and eject the tape
53 /bin/mt -f $device rewind
54 sleep 1
55 /bin/mt -f $device offline

Now let's look at some of the key elements of this script:

Lines 7-8

The targets variable contains a space-separated list of directories to back up.

Lines 10-11

fullday contains the day that full backups should run.

Lines 13-15

We define the device and its nonrewinding version.

Lines 17-20

We specify a datefile, which will simply contain the output of the date command at the start time of each full backup. This date is used by tar to determine which files belong in subsequent differential backups.

Lines 22-50

We then check to see if we're on the full backup day and then run tar on each target accordingly, with all output going to the same tape.

Lines 36, 48, and 54

Sometimes a tape drive indicates that it has completed an operation before it is ready for another. By adding some delays to the script, we can be sure that the tape drive is ready.

Lines 53 and 55

Finally, we rewind and unload the tape.

To execute this script daily, the following entry is made in root's crontab:

# run the backup script at 00:05 every day
5 0 * * * /root/backup

On Sunday night, a blank tape is inserted in the drive for the full backup. During the week, other tapes are used to record each differential backup.

If necessary, a few weeks of full backups can be retained for historical purposes. Differential backups are sometimes retained for a short period, perhaps two weeks, to allow the restoration of a file on a particular day. This is a nice policy to implement, as it protects users by allowing them access to intermediate versions of their work.

As stated earlier, this is only a simple backup scheme, and many improvements could be made to it. For example, root will receive all of the output from the tar commands in the script via email, even for successful runs. Since the system administrator may not wish to view all of this good news, the script could be modified to alert the administrator only when an error occurs. The script also does not attempt to read the tape it just created, leaving the administrator to verify backups manually.
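One way to approach the first of these improvements is to capture each command's output and print it only on failure. The following is a minimal sketch, not a drop-in replacement for the script above. The run_quiet function is a hypothetical helper, and the demonstration writes to a regular file in a scratch directory instead of a tape device.

```shell
#!/bin/bash
# Hypothetical helper: run a command, keep its output in a temporary
# log, and print that log only when the command fails. A job that
# prints nothing gives cron nothing to mail.
run_quiet() {
    local log status
    log=$(mktemp) || return 1
    "$@" > "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
        cat "$log"
    fi
    rm -f "$log"
    return "$status"
}

# Demonstration with a regular tarfile in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/etc"
echo "127.0.0.1 localhost" > "$demo/etc/hosts"

# Succeeds: nothing is printed even though -v is given.
run_quiet tar -cvzf "$demo/etc.tar.gz" -C "$demo" etc

# Fails (the directory does not exist): tar's messages are shown.
run_quiet tar -cvzf "$demo/none.tar.gz" -C "$demo" missing \
    || echo "backup of 'missing' needs attention"
```

In the backup script, each /bin/tar and /bin/mt call could be wrapped this way, so that a successful night produces no output at all and a failure produces the full output of the failing command.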

7.5.4.3 Locked files and single-user mode

Running the script in Example 7-5 late at night may be sufficient to create a reasonable general backup scheme in many situations. However, if users or overnight processes are actively working in a filesystem as it is backed up, the state of the files in the archive will be in question. To avoid this problem, it may be safest to eliminate the users and processes from the backup scheme completely by putting Linux into single-user mode (runlevel 1) before executing the backup. In this mode, users will not be logged on, and most services, such as web or database servers, will be shut down. With no active processes running, the filesystem can be safely backed up. See Chapter 5 for more information on changing runlevels.

7.5.5 Maintenance, Verification, and Restoration

Verifying the integrity of your backups and performing occasional file restorations and system maintenance are easy processes. As mentioned earlier, backup schemes are useless unless they successfully yield positive results during a restoration.

7.5.5.1 Caring for tape drive mechanisms

Modern tape drives store large volumes of data onto compact and relatively inexpensive media with a surprisingly high degree of reliability. Their reliability is so good that it is easy to forget that the tape drives require routine cleaning.

The surface of magnetic media is coated with one or more layers of microscopic metal oxide particles. As tapes pass over the tape drive mechanism, some of these particles begin to accumulate on the heads of the tape drive. A tape head is a very small and sensitive set of electromagnets that pass over the tape. When oxide particles accumulate on the heads, they become less effective and can fail completely in extreme cases. Some devices are capable of cleaning the heads themselves, but most require periodic insertion of special cleaning media. These media look like ordinary tapes, but they are formulated to extract loose particles from the tape heads. In a production environment with daily tape drive activity, it is common to use cleaning media once every week or two.

It is important to follow the recommendations of the tape drive manufacturer for cleaning media selection and cleaning frequency, and to keep the cleaning procedure a prominent part of a solid backup methodology.

7.5.5.2 Media expiration

Some media manufacturers make claims that their media are "guaranteed for life." But be careful here -- the guarantee is probably good for only the cost of the media, not for the data you've stored on it. The manufacturer's guarantee won't get you very far if you're having difficulty restoring priceless data from an old, overused, worn-out tape. It's imperative that you implement a media rotation scheme to place a limit on the number of uses of any given medium. Adding a usage limit can help to avoid getting into trouble by over-using a tape. There is no hard rule on how many times a tape can be used, and any guidelines should be based on the drive technology, recommendations from drive and tape manufacturers, and direct personal experience. You may find that your situation shows that media can be reused quite often. Regardless, it is best to avoid thinking of media in perpetual rotation. At the very least, replace your backup media once or twice a year, just to be safe.

7.5.5.3 Verifying tar archives

Keeping tape drives clean and using fresh media lay a solid foundation for reliable backups. In addition to those preventive measures, you'll want to routinely verify your backups to ensure that everything ran smoothly. Verification is important on many levels. Clearly, it is important to ensure that the data is correctly recorded. Beyond that, you should also verify that the tape drives and the backup commands function correctly during restoration. Proper file restoration techniques should be established and tested during normal operations, before tragedy strikes and places your operation into an emergency situation.

You can verify the contents of a tar archive by simply listing its contents. For example, suppose a backup has been made of the /etc directory using the following command:

# tar cvzf /dev/st0 /etc

After the backup is complete, the tape drive rewinds. The archive can then be verified immediately by reviewing the contents with the -t option:

# tar tzf /dev/st0

This command lists the contents of the archive so that you can verify the contents of the tarfile. Additionally, any errors that may prevent tar from reading the tape are displayed at this time. If there are multiple archives on the tape, they can be verified in sequence using the nonrewinding device file:

# tar tf /dev/nst0
# mt -f /dev/nst0 fsf 1
# tar tf /dev/nst0
# mt -f /dev/st0 rewind

While this verification tells you that the tapes are readable, it does not tell you that the data being read is identical to that in the filesystem. If your backup device supports them, the tar utility offers two options, --verify (-W) and --compare (-d), that may be useful to you. However, comparisons of files on the backup media against the live filesystem may yield confusing results if your files are changing constantly. In this situation, it may be necessary to select specific files for comparison that you are certain will not change after they are backed up. You would probably restore those files to a temporary directory and compare them manually, outside of tar. If it is necessary to compare an entire archive, be aware that doing so doubles the time required to complete the combined backup and verify operation.
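As a rough illustration of comparing an archive with the live files, the sketch below uses GNU tar's -d (--compare) mode against a regular tarfile in a scratch directory. The etc directory here is a stand-in created for the example, not the real /etc, and -C is used only to point tar at that scratch location.

```shell
#!/bin/bash
# Sketch: check an archive against the files it was made from, using
# GNU tar's compare mode. Nothing outside the scratch directory is
# touched.
work=$(mktemp -d)
mkdir "$work/etc"
echo "127.0.0.1 localhost" > "$work/etc/hosts"

tar -czf "$work/etc.tar.gz" -C "$work" etc

# Nothing has changed yet, so the comparison reports no differences
# and exits with status 0.
if tar -dzf "$work/etc.tar.gz" -C "$work"; then
    echo "archive matches filesystem"
fi

# Change the file after the backup. tar now reports what differs and
# exits with a non-zero status.
echo "192.168.1.10 fileserver" >> "$work/etc/hosts"
if ! tar -dzf "$work/etc.tar.gz" -C "$work"; then
    echo "archive and filesystem differ"
fi
```

The second comparison shows why this kind of check can be confusing on a busy system: any file modified after the backup is reported as a difference, even though the archive itself is intact.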

7.5.5.4 File restoration

Restoring files from a tar archive is simple. However, you must exercise caution regarding exactly where you place the restored files in the filesystem. In some cases, you may be restoring only one or two files, which may be safely written to their original locations if you're sure the versions on tape are the ones you need. However, restoring entire directories to their original locations on a running system can be disastrous, resulting in changes being made to the system without warning as files are overwritten. For this reason, it is common practice to restore files to a different location and move those files you need into the directories where you want them.

Reusing a previous example, suppose a backup has been made of the /etc directory:

# tar cvzf /dev/st0 /etc

To restore the /etc/hosts file from this archive, the following commands can be used:

# cd /tmp
# tar xzf /dev/st0 etc/hosts

The first command puts our restore operation out of harm's way by switching to the /tmp directory. (The directory selected could be anywhere, such as a home directory or scratch partition.) The second command extracts the specified file from the archive. Note that the file to extract is specified without the leading slash. This file specification will match the one originally written to the media by tar, which strips the slash to prevent overwriting the files upon restore. tar will search the archive for the specified file, create the etc directory under /tmp, and then create the final file: /tmp/etc/hosts. This file should then be examined by the system administrator and moved to the appropriate place in the filesystem only after its contents have been verified.

To restore the entire /etc directory, simply specify that directory:

# tar xzf /dev/st0 etc

To restore the .bash_profile file for user jdean from a second archive on the same tape, use mt before using tar:

# cd /tmp
# mt -f /dev/nst0 fsf 1
# tar xzf /dev/st0 home/jdean/.bash_profile

In this example, the nonrewinding tape device file is used with mt to skip forward over the first archive. This leaves the tape positioned before the second archive, where it is ready for tar to perform its extraction.
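The restore-to-scratch procedure described in this section can be rehearsed without a tape drive. The following sketch assumes GNU tar and substitutes a regular tarfile and a stand-in etc directory for the tape and the real /etc. The live and restore directory names are hypothetical.

```shell
#!/bin/bash
# Sketch: restore one file into a scratch area and inspect it before
# putting it anywhere that matters.
work=$(mktemp -d)
mkdir -p "$work/live/etc" "$work/restore"
echo "127.0.0.1 localhost" > "$work/live/etc/hosts"
echo "order hosts,bind"    > "$work/live/etc/host.conf"

# Back up the stand-in etc directory. Member names are stored without
# a leading slash (etc/hosts, etc/host.conf).
tar -czf "$work/etc.tar.gz" -C "$work/live" etc

# Restore a single file into the scratch area, away from the live copy.
cd "$work/restore" || exit 1
tar -xzf "$work/etc.tar.gz" etc/hosts

# Inspect before copying anything back. cmp is silent when files match.
if cmp "$work/live/etc/hosts" etc/hosts; then
    echo "restored etc/hosts matches the live file"
fi
ls -R "$work/restore"
```

Only etc/hosts appears under the restore directory, because tar extracts just the members named on the command line and creates the etc directory as needed.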

On the Exam

This Objective on system backup isn't specific about particular commands or techniques. However, tar is among the most common methods in use for simple backup schemes.

You should also know how to use the mt command to position a tape to extract the correct archive.