7.6 Building Recursively Defined Data
Suppose you wanted to capture
information about a filesystem, including the filenames and directory
names, and their included contents. Represent a directory as a hash,
in which the keys are the names of the entries within the directory,
and values are undef for plain files. A sample
/bin directory might look like:
my $bin_directory = {
    "cat"  => undef,
    "cp"   => undef,
    "date" => undef,
    ... and so on ...
};
Similarly, the Skipper's home directory might also
contain a personal bin directory (at something
like ~skipper/bin) that contains personal tools:
my $skipper_bin = {
    "navigate"            => undef,
    "discipline_gilligan" => undef,
    "eat"                 => undef,
};
Nothing in either structure tells where the directory is located in
the hierarchy. It just represents the contents of some directory.
Go up one level to the Skipper's home directory,
which is likely to contain a few files along with the personal
bin directory:
my $skipper_home = {
    ".cshrc"                      => undef,
    "Please_rescue_us.pdf"        => undef,
    "Things_I_should_have_packed" => undef,
    "bin"                         => $skipper_bin,
};
Ahh, notice that you have
three files, but the fourth entry "bin"
doesn't have undef for a value
but rather the hash reference created earlier for the
Skipper's personal bin directory.
This is how you indicate subdirectories. If the value is
undef, it's a plain file; if
it's a hash reference, you have a subdirectory, with
its own files and subdirectories. Of course, you could have combined
these two initializations:
my $skipper_home = {
    ".cshrc"                      => undef,
    "Please_rescue_us.pdf"        => undef,
    "Things_I_should_have_packed" => undef,
    "bin"                         => {
        "navigate"            => undef,
        "discipline_gilligan" => undef,
        "eat"                 => undef,
    },
};
Now the hierarchical nature of the data starts to come into play.
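As a quick illustration of the undef-versus-hash-reference convention, the following sketch walks the top level of the combined structure and reports what each entry is. The entry_kind helper is a hypothetical name introduced only for this example; it is not part of the original text.

```perl
use strict;
use warnings;

my $skipper_home = {
    ".cshrc"                      => undef,
    "Please_rescue_us.pdf"        => undef,
    "Things_I_should_have_packed" => undef,
    "bin"                         => {
        "navigate"            => undef,
        "discipline_gilligan" => undef,
        "eat"                 => undef,
    },
};

# entry_kind is a hypothetical helper: it classifies one value from a
# directory hash using the convention described above.
sub entry_kind {
    my $value = shift;
    return "directory" if ref($value) eq "HASH";   # subdirectory
    return "file" unless defined $value;           # plain file
    return "unknown";                              # anything else
}

# Report each top-level entry of the Skipper's home directory.
for my $name (sort keys %$skipper_home) {
    printf "%-28s %s\n", $name, entry_kind($skipper_home->{$name});
}

# Nested entries are reached by chaining hash subscripts.
print "bin/eat is a plain file\n"
    if exists $skipper_home->{bin}{eat}
    and not defined $skipper_home->{bin}{eat};
```

Checking with exists before defined matters here: a missing entry and a plain file both yield undef, so only exists can tell them apart.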
Obviously, you don't
want to create and maintain a data structure by changing literals in
the program. You should fetch the data by using a subroutine. Write a
subroutine that for a given pathname returns undef
if the path is a file, or a hash reference of the directory contents
if the path is a directory. The base case of looking at a file is the
easiest, so let's write that:
sub data_for_path {
    my $path = shift;
    if (-f $path) {
        return undef;
    }
    if (-d $path) {
        ...
    }
    warn "$path is neither a file nor a directory\n";
    return undef;
}
If the Skipper calls this on .cshrc,
he'll get back an undef value,
indicating that a file was seen.
Now for the directory part.
You need to return a hash reference, so declare a named hash inside
the subroutine and return a reference to it. For each entry in the
directory, the subroutine calls itself to populate the value of the
corresponding hash element. It goes something like this:
sub data_for_path {
    my $path = shift;
    if (-f $path or -l $path) { # files or symbolic links
        return undef;
    }
    if (-d $path) {
        my %directory;
        opendir PATH, $path or die "Cannot opendir $path: $!";
        my @names = readdir PATH;
        closedir PATH;
        for my $name (@names) {
            next if $name eq "." or $name eq "..";
            $directory{$name} = data_for_path("$path/$name");
        }
        return \%directory;
    }
    warn "$path is neither a file nor a directory\n";
    return undef;
}
For each file within the directory
being examined, the response from the recursive call to
data_for_path is undef. This
populates most elements of the hash. When the reference to the named
hash is returned, the reference becomes a reference to an anonymous
hash because the name immediately goes out of scope. (The data itself
doesn't change, but the number of ways in which you
can access the data changes.)
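The point about the named hash going out of scope can be seen in isolation with a small sketch. Here make_directory is a hypothetical stand-in for the directory branch of data_for_path; it takes its names as arguments instead of reading them from disk.

```perl
use strict;
use warnings;

# make_directory is a hypothetical stand-in for the directory branch
# of data_for_path: it fills a named lexical hash and returns a
# reference to it.
sub make_directory {
    my @names = @_;
    my %directory;                        # a fresh hash on every call
    $directory{$_} = undef for @names;
    return \%directory;                   # the data outlives the name
}

my $first  = make_directory("cat", "cp");
my $second = make_directory("navigate");

# The two references point at different hashes, so changing one
# leaves the other alone.
$first->{"date"} = undef;

print join(", ", sort keys %$first),  "\n";   # cat, cp, date
print join(", ", sort keys %$second), "\n";   # navigate
print "distinct hashes\n" if $first != $second;
```

Because every call gets its own %directory, each recursive call in data_for_path can safely build and return its own hash without disturbing the one its caller is filling.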
If there is a subdirectory, the nested subroutine call uses
readdir to extract the contents of that directory
and returns a hash reference, which is inserted into the hash
structure created by the caller.
At first, it may look a bit mystifying, but if you walk through the
code slowly, you'll see it's always
doing the right thing.
Test the results of this subroutine by calling it on
. (the current directory) and seeing the result:
use Data::Dumper;
print Dumper(data_for_path("."));
Obviously, this will be more interesting if your current directory
contains subdirectories.
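If the current directory happens to be flat, one way to try this out is to build a small throwaway tree first. The sketch below assumes the File::Temp and File::Path modules that ship with Perl, and the file names are made up for the demonstration; data_for_path is repeated from above only so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use Data::Dumper;

# data_for_path as developed above, repeated so this sketch runs alone.
sub data_for_path {
    my $path = shift;
    if (-f $path or -l $path) {    # files or symbolic links
        return undef;
    }
    if (-d $path) {
        my %directory;
        opendir PATH, $path or die "Cannot opendir $path: $!";
        my @names = readdir PATH;
        closedir PATH;
        for my $name (@names) {
            next if $name eq "." or $name eq "..";
            $directory{$name} = data_for_path("$path/$name");
        }
        return \%directory;
    }
    warn "$path is neither a file nor a directory\n";
    return undef;
}

# Build a temporary tree: one file at the top, two inside bin.
# The directory is removed automatically when the program exits.
my $root = tempdir(CLEANUP => 1);
make_path("$root/bin");
for my $file (".cshrc", "bin/navigate", "bin/eat") {
    open my $fh, ">", "$root/$file"
        or die "Cannot create $root/$file: $!";
    close $fh;
}

my $tree = data_for_path($root);

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($tree);
```

The dump should show .cshrc with an undef value and bin as a nested hash holding eat and navigate, mirroring the hand-written literals earlier in the section.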