Are security risks lurking within your filesystems? If so, they can
be hard to detect, especially if you must search through mountains of
data. Fortunately, Linux provides the powerful tools
find and xargs to help with the
task. These tools have so many options, however, that their
flexibility can make them seem daunting to use. We recommend the
following good practices:
- Know your filesystems
Linux supports a wide range of filesystem types. To see the ones
configured in your kernel, read the file
/proc/filesystems. To see
which filesystems are currently mounted (and their types),
run mount with no options or arguments:
$ mount
/dev/hda1 on / type ext2 (rw)
/dev/hda2 on /mnt/windows type vfat (rw)
remotesys:/export/spool/mail on /var/spool/mail type nfs
(rw,hard,intr,noac,addr=192.168.10.13)
//MyPC/C$ on /mnt/remote type smbfs (0)
none on /proc type proc (rw)
...
We see a traditional Linux ext2
filesystem (/dev/hda1), a Windows FAT32
filesystem (/dev/hda2), a remotely mounted NFS
filesystem (remotesys:/export/spool/mail), a
Samba filesystem (//MyPC/C$) mounted remotely,
and the proc filesystem provided by the kernel.
See mount(8) for more details.
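For instance, a quick way to see just the disk-backed types is to filter /proc/filesystems (a sketch; the exact list varies with your kernel and loaded modules):

```shell
# Filesystem types the running kernel supports. Entries tagged "nodev"
# (proc, tmpfs, ...) need no backing block device, so filtering them out
# leaves only disk-backed types such as ext2 or vfat.
grep -v '^nodev' /proc/filesystems
```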
- Know which filesystems are
local and which are remote
Searching network
filesystems like NFS partitions can be quite slow. Furthermore, NFS
typically maps your local root account to an unprivileged user on the
mounted filesystem, so some files or directories might be
inaccessible even to root. To avoid these problems when searching a
filesystem, run
find locally on the server that physically
contains it.
Be aware that some filesystem types (e.g., for Microsoft Windows) use
different models for owners, groups, and permissions, while other
filesystems (notably some for CD-ROMs) do not support these file
attributes at all. Consider scanning
"foreign" filesystems on servers
that recognize them natively, and just skip read-only filesystems
like CD-ROMs (assuming you know and trust the source).
The standard Linux filesystem type is ext2. If your local filesystems
are of this type only, you can scan them all with a
command like:
# find / ! -fstype ext2 -prune -o ... (other find options) ...
This can be readily extended to multiple local filesystem types
(e.g., ext2 and ext3):
# find / ! \( -fstype ext2 -o -fstype ext3 \) -prune -o ...
The find -prune
option causes
directories to be skipped, so we prune
any filesystems that do not match our desired
types (ext2 or ext3). The following -o
("or") operator causes the
filesystems that survive the pruning to be scanned.
The find -xdev option prevents crossing
filesystem boundaries,
and can be useful for avoiding uninteresting filesystems that might
be mounted. Our recipes use this option as a reminder to be conscious
of filesystem types.
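The pruning mechanism is easy to try on a small sandbox tree (the paths here are made up for illustration); we skip one subtree by name, exactly as the -fstype tests above skip whole filesystems:

```shell
# Build a throwaway tree with two subdirectories.
tmp=$(mktemp -d)
mkdir -p "$tmp/keep" "$tmp/skip"
touch "$tmp/keep/a" "$tmp/skip/b"
# Prune the "skip" subtree; the -o branch prints whatever survives,
# so only keep/a is listed.
find "$tmp" -name skip -prune -o -type f -print
rm -rf "$tmp"
```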
- Carefully examine permissions
The find -perm option can conveniently test a subset
of the permission bits, optionally ignoring the rest. In the most common
case, we want to know whether any of
the bits in the subset are set: use a
"+" prefix with the permission argument to specify this (newer
GNU findutils spells this prefix "/", and has retired the "+"
form). Occasionally, we want to test that
all of the bits are set: use a
"-" prefix instead. If no prefix is used, the permissions must
match exactly; this is rarely useful.
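A small sandbox illustrates the difference (we use the "/" spelling of the "any" prefix, since newer GNU findutils no longer accepts "+"):

```shell
tmp=$(mktemp -d)
touch "$tmp/suid" "$tmp/sgid" "$tmp/plain"
chmod 4755 "$tmp/suid"    # setuid bit set
chmod 2755 "$tmp/sgid"    # setgid bit set
chmod 0644 "$tmp/plain"   # neither bit set
# Any of the setuid/setgid bits set: matches suid and sgid.
find "$tmp" -type f -perm /6000
# All of the setuid and setgid bits set: matches nothing here.
find "$tmp" -type f -perm -6000
rm -rf "$tmp"
```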
- Handle filenames safely
If you scan enough
filesystems, you will eventually
encounter filenames with embedded spaces or unusual characters like
newlines, quotation marks, etc. The null character, however,
never appears in filenames, and is therefore the
only safe separator to use for lists of filenames that are passed
between programs.
The find -print0 option produces null-terminated
filenames; xargs
and perl both
support a -0 (zero) option to read them. Useful
filters like sort and grep also
understand a -z option to use null separators when
they read and write data, and grep has a separate
-Z option that produces null-terminated filenames
(with the -l or -L options).
Use these options whenever possible to avoid misinterpreting
filenames, which can be disastrous when modifying filesystems as
root!
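A sandbox shows why this matters: a filename containing a space would be split into bogus fragments by a whitespace-separated pipeline, but arrives intact through a null-separated one:

```shell
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space"
# Null-separated: each filename is delivered as exactly one argument,
# so echo runs twice -- once per file, not once per word.
find "$tmp" -type f -print0 | xargs -0 -r -n 1 echo
rm -rf "$tmp"
```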
- Avoid long command lines
The Linux kernel limits the combined size of command-line arguments
and the environment (historically 128 KB; newer kernels allow more,
but some limit always applies). This limit is easily exceeded by
shell command substitution, e.g.:
$ mycommand `find ...`
Use the xargs program instead to collect
filename arguments and run commands repeatedly, without exceeding
this limit:
$ find ... -print0 | xargs -0 -r mycommand
The xargs -r option avoids running the command if
the output of find is empty, i.e., no filenames
were found. This is usually desirable, to prevent errors like:
$ find ... -print0 | xargs -0 rm
rm: too few arguments
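The effect of -r is easy to see with an empty input stream (GNU xargs; without -r, it runs the command once even though no filename arguments arrived):

```shell
# Without -r: the command runs once despite the empty input.
printf '' | xargs echo ran anyway
# With -r: the command is skipped entirely, so nothing is printed.
printf '' | xargs -r echo ran anyway
```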
It can occasionally be useful to connect multiple
xargs invocations in a pipeline, e.g.:
$ find ... -print0 | xargs -0 -r grep -lZ pattern | xargs -0 -r mycommand
The first xargs collects filenames from
find and passes them to grep,
as command-line arguments. grep then searches the
file contents (which find cannot do) for the
pattern, and writes another list of filenames to stdout. This list is
then used by the second xargs to collect
command-line arguments for mycommand.
If you want grep to select filenames (instead of
contents), insert it directly into the pipe:
$ find ... -print0 | grep -z pattern | xargs -0 -r mycommand
In most cases, however, find -regex
pattern is a more direct way to
select filenames using a regular expression.
Note how grep -Z refers to writing filenames, while
grep -z refers to reading and writing data.
xargs is typically much faster than find
-exec, which runs the command separately for each file and
therefore incurs greater start-up costs. However, if you need to run
a command that can process only one file at a time, use either
find -exec or xargs -n
1:
$ find ... -exec mycommand '{}' \;
$ find ... -print0 | xargs -0 -r -n 1 mycommand
These two forms have a subtle difference, however: a command run by
find -exec uses the standard input inherited from
find, while a command run by
xargs uses the pipe as its standard input (which
is not typically useful).
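Both one-at-a-time forms behave alike on a sandbox tree (echo stands in here for a real single-file command):

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b"
# One invocation of the command per file, via -exec ... \;
find "$tmp" -type f -exec echo processing {} \;
# The same, via xargs -n 1.
find "$tmp" -type f -print0 | xargs -0 -r -n 1 echo processing
rm -rf "$tmp"
```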
See also: find(1), xargs(1), mount(8).