
Recipe 11.16 Program: Outlines

Outlines are a simple (and thus popular) way of structuring data. The hierarchy of detail implied by an outline maps naturally to our top-down way of thinking about the world. The only problem is that it's not obvious how to represent outlined data as a Perl data structure.

Take, for example, this simple outline of some musical genres:

Alternative
.Punk
..Emo
..Folk Punk
.Goth
..Goth Rock
..Glam Goth
Country
.Old Time
.Bluegrass
.Big Hats
Rock
.80s
..Big Hair
..New Wave
.60s
..British
..American

Here each leading period indicates one level of subgrouping, so Emo is a subgroup of Punk, which is in turn a subgroup of Alternative. There are many different formats in which that outline could be output. For example, you might write the genres out in full:

Alternative
Alternative - Punk
Alternative - Punk - Emo
Alternative - Punk - Folk Punk
Alternative - Goth
...

You might number the sections:

1 Alternative
1.1 Punk
1.1.1 Emo
1.1.2 Folk Punk
1.2 Goth
...

or alphabetize:

Alternative
Alternative - Goth
Alternative - Goth - Glam Goth
Alternative - Goth - Goth Rock
Alternative - Punk
Alternative - Punk - Emo
...

or show inheritance:

Alternative
Punk - Alternative
Emo - Punk - Alternative
Folk Punk - Punk - Alternative
Goth - Alternative
Goth Rock - Goth - Alternative
...

These transformations are all much easier than they might seem. The trick is to represent the levels of the hierarchy as elements in an array. For example, you'd represent the Glam Goth entry in the sample outline as:

@array = ("Alternative", "Goth", "Glam Goth");

Now reformatting the entry is trivial. There's an elegant way to parse the input file to get this array representation:

while (<FH>) {
  chomp;
  $tag[$in = s/\G\.//g] = $_;
  # do something with @tag[0..$in]
}

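The \G anchor forces each match to begin where the previous one ended, so the substitution deletes only the leading periods from the current entry and returns how many it deleted (for a top-level entry it returns the empty string, which serves as index 0). This number indicates the indentation level of the current entry. Deeper elements left in @tag by earlier entries do no harm, because only @tag[0..$in] is used.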
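As a minimal sketch of how the parsed array feeds the other output formats, the following wraps the same substitution in a helper and prints both the full form and the inheritance form. The name outline_paths and the inline sample list are assumptions made for illustration; they are not part of the recipe.

```perl
use strict;
use warnings;

# Hypothetical helper: turn outline lines into a list of array refs,
# one per entry, each holding the path from the top level down.
sub outline_paths {
    my @lines = @_;
    my (@tag, @paths);
    for my $line (@lines) {
        chomp(my $name = $line);
        # s/\G\.//g strips leading periods and returns how many it removed;
        # "|| 0" turns the empty-string result for top-level entries into 0.
        my $depth = ($name =~ s/\G\.//g) || 0;
        $tag[$depth] = $name;
        push @paths, [ @tag[0 .. $depth] ];   # copy only the live levels
    }
    return @paths;
}

my @paths = outline_paths("Alternative\n", ".Punk\n", "..Emo\n", ".Goth\n");

# Full form, e.g. "Alternative - Punk - Emo"
print join(" - ", @$_), "\n" for @paths;

# Inheritance form, e.g. "Emo - Punk - Alternative"
print join(" - ", reverse @$_), "\n" for @paths;
```

Copying the slice into a fresh anonymous array matters here: @tag is reused for every entry, so storing a reference to @tag itself would leave every stored path pointing at the same, later-overwritten data.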

Alphabetizing is now simple using the Unix sort program:

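$ISA = "-";
# Pipe everything we print through sort(1): -b ignores leading blanks,
# -d sorts in dictionary order, -f folds case.
open(STDOUT, "|sort -b -t'$ISA' -df") or die "can't run sort: $!";
while (<DATA>) {
    chomp;
    $tag[$in = s/\G\.//g] = $_;
    print join(" $ISA ", @tag[0 .. $in]), "\n";   # sort needs one entry per line
}
close(STDOUT) or die "sort failed: $! $?";
__END__
Alternative
.Punk
..Emo
..Folk Punk
.Goth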
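Where an external sort program is not available, the same listing can be approximated in Perl alone. This is only a sketch: the helper name sorted_outline is hypothetical, and the comparison folds case but does not reproduce the dictionary-order rules of sort -d.

```perl
use strict;
use warnings;

# Hypothetical helper: return the outline's entries as full paths,
# sorted case-insensitively, without shelling out to sort(1).
sub sorted_outline {
    my ($sep, @lines) = @_;
    my (@tag, @out);
    for my $line (@lines) {
        chomp(my $name = $line);
        my $depth = ($name =~ s/\G\.//g) || 0;    # count leading periods
        $tag[$depth] = $name;
        push @out, join(" $sep ", @tag[0 .. $depth]);
    }
    # Compare case-insensitively; fall back to a plain comparison for ties.
    return sort { lc($a) cmp lc($b) or $a cmp $b } @out;
}

my @genres = (
    "Alternative\n", ".Punk\n", "..Emo\n", "..Folk Punk\n",
    ".Goth\n", "..Goth Rock\n", "..Glam Goth\n",
);
print "$_\n" for sorted_outline("-", @genres);
```

Because each entry is sorted as one joined string, a parent always sorts directly ahead of its own children, which is what keeps the hierarchy intact in the output.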

Numbering the outline is equally simple:

while (<DATA>) {
    chomp;
    $count[$in = s/\G\.//g]++;           # bump the counter for this level
    $#count = $in;                       # discard counters for deeper levels
    print join(".", @count), " $_\n";
}
__END__
Alternative
.Punk
..Emo
..Folk Punk
.Goth
..Goth Rock
..Goth Rock

Notice that numbering is our only application where we've removed elements from the array. This is because the array no longer holds the names of hierarchy levels, which are simply overwritten by the next entry at that level; it holds counts, which are incremented. When we go up a level (e.g., from three levels down to a new second-level heading), we must discard the counters for the deeper levels so that numbering there restarts at 1.
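To see why the deeper counters have to go, here is a small sketch that numbers the same outline with and without the reset. The helper number_outline and its $reset flag are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: number outline lines. When $reset is true,
# counters below the current level are discarded after each entry.
sub number_outline {
    my ($reset, @lines) = @_;
    my (@count, @out);
    for my $line (@lines) {
        chomp(my $name = $line);
        my $depth = ($name =~ s/\G\.//g) || 0;    # count leading periods
        $count[$depth]++;
        $#count = $depth if $reset;               # forget deeper counters
        push @out, join(".", @count[0 .. $depth]) . " $name";
    }
    return @out;
}

my @outline = (
    "Alternative\n", ".Punk\n", "..Emo\n", "..Folk Punk\n",
    ".Goth\n", "..Goth Rock\n",
);

print "$_\n" for number_outline(1, @outline);   # ends with "1.2.1 Goth Rock"
print "$_\n" for number_outline(0, @outline);   # ends with "1.2.3 Goth Rock"
```

Without the reset, the third-level counter left over from Emo and Folk Punk keeps climbing, so Goth Rock is numbered as if it were the third child of Goth rather than the first.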
