7.6 Building Recursively Defined Data

Suppose you wanted to capture information about a filesystem, including the filenames and directory names, and their included contents. Represent a directory as a hash, in which the keys are the names of the entries within the directory, and values are undef for plain files. A sample /bin directory might look like:

my $bin_directory = {
  "cat" => undef,
  "cp" => undef,
  "date" => undef,
  ... and so on ...
};

Similarly, the Skipper's home directory might also contain a personal bin directory (at something like ~skipper/bin) that contains personal tools:

my $skipper_bin = {
  "navigate" => undef,
  "discipline_gilligan" => undef,
  "eat" => undef,
};

nothing in either structure tells where the directory is located in the hierarchy. It just represents the contents of some directory.

Go up one level to the Skipper's home directory, which is likely to contain a few files along with the personal bin directory:

my $skipper_home = {
  ".cshrc" => undef,
  "Please_rescue_us.pdf" => undef,
  "Things_I_should_have_packed" => undef,
  "bin" => $skipper_bin,
};

Ahh, notice that you have three files, but the fourth entry "bin" doesn't have undef for a value but rather the hash reference created earlier for the Skipper's personal bin directory. This is how you indicate subdirectories. If the value is undef, it's a plain file; if it's a hash reference, you have a subdirectory, with its own files and subdirectories. Of course, you can have combined these two initializations:

my $skipper_home = {
  ".cshrc" => undef,
  "Please_rescue_us.pdf" => undef,
  "Things_I_should_have_packed" => undef,
  "bin" => {
    "navigate" => undef,
    "discipline_gilligan" => undef,
    "eat" => undef,
  },
};

Now the hierarchical nature of the data starts to come into play.

Obviously, you don't want to create and maintain a data structure by changing literals in the program. You should fetch the data by using a subroutine. Write a subroutine that for a given pathname returns undef if the path is a file, or a hash reference of the directory contents if the path is a directory. The base case of looking at a file is the easiest, so let's write that:

sub data_for_path {
  my $path = shift;
  if (-f $path) {
    return undef;
  }
  if (-d $path) {
    ...
  }
  warn "$path is neither a file nor a directory\n";
  return undef;
}

If the Skipper calls this on .cshrc, he'll get back an undef value, indicating that a file was seen.

Now for the directory part. You need a hash reference to be returned, which you declare as a named hash inside the subroutine. For each element of the hash, you call yourself to populate the value of that hash element. It goes something like this:

sub data_for_path {
  my $path = shift;
  if (-f $path or -l $path) {        # files or symbolic links
    return undef;
  }
  if (-d $path) {
    my %directory;
    opendir PATH, $path or die "Cannot opendir $path: $!";
    my @names = readdir PATH;
    closedir PATH;
    for my $name (@names) {
      next if $name eq "." or $name eq "..";
      $directory{$name} = data_for_path("$path/$name");
    }
    return \%directory;
  }
  warn "$path is neither a file nor a directory\n";
  return undef;
}

For each file within the directory being examined, the response from the recursive call to data_for_path is undef. This populates most elements of the hash. When the reference to the named hash is returned, the reference becomes a reference to an anonymous hash because the name immediately goes out of scope. (The data itself doesn't change, but the number of ways in which you can access the data changes.)

If there is a subdirectory, the nested subroutine call uses readdir to extract the contents of that directory and returns a hash reference, which is inserted into the hash structure created by the caller.

At first, it may look a bit mystifying, but if you walk through the code slowly, you'll see it's always doing the right thing.

Test the results of this subroutine by calling it on . (the current directory) and seeing the result:

use Data::Dumper;
print Dumper(data_for_path("."));

Obviously, this will be more interesting if your current directory contains subdirectories.

[ Team LiB ]