Recipe 11.16 Program: Outlines
Outlines are a simple (and thus popular) way of structuring data. The
hierarchy of detail implied by an outline maps naturally to our
top-down way of thinking about the world. The only problem is that
it's not obvious how to represent outlined data as a Perl data
structure.
Take, for example, this simple outline of some musical genres:
Alternative
.Punk
..Emo
..Folk Punk
.Goth
..Goth Rock
..Glam Goth
Country
.Old Time
.Bluegrass
.Big Hats
Rock
.80s
..Big Hair
..New Wave
.60s
..British
..American
Here we use a period to indicate a subgroup. There are many different
formats in which that outline could be output. For example, you might
write the genres out in full:
Alternative
Alternative - Punk
Alternative - Punk - Emo
Alternative - Punk - Folk Punk
Alternative - Goth
...
You might number the sections:
1 Alternative
1.1 Punk
1.1.1 Emo
1.1.2 Folk Punk
1.2 Goth
...
or alphabetize:
Alternative
Alternative - Goth
Alternative - Goth - Glam Goth
Alternative - Goth - Goth Rock
Alternative - Punk
Alternative - Punk - Emo
...
or show inheritance:
Alternative
Punk - Alternative
Emo - Punk - Alternative
Folk Punk - Punk - Alternative
Goth - Alternative
Goth Rock - Goth - Alternative
...
These transformations are all much easier than it might seem. The
trick is to represent the levels of the hierarchy as elements in an
array. For example, you'd represent the Glam Goth entry in the sample
outline as:
@array = ("Alternative", "Goth", "Glam Goth");
Now reformatting the entry is trivial. There's an elegant way to
parse the input file to get this array representation:
while (<FH>) {
    chomp;
    $tag[$in = s/\G\.//g] = $_;
    # do something with @tag[0..$in]
}
The substitution deletes leading periods from the current entry,
returning how many it deleted (or a false value that acts as 0 when
there are none). This number indicates the indentation level of the
current entry.
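As a minimal sketch of why the array form makes reformatting easy, the following wraps the parsing loop in a helper that returns one array reference per entry. The name outline_paths is hypothetical and not part of the original program. With the paths in hand, the "written in full" and "inheritance" formats differ only by a reverse. The " - " separator is taken from the sample output shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: turn outline lines into a list of paths,
# one array ref per entry, e.g. ["Alternative", "Punk", "Emo"].
sub outline_paths {
    my @lines = @_;
    my (@tag, @paths);
    for my $line (@lines) {
        my $entry = $line;
        chomp $entry;
        # Count and strip the leading periods; the count is the depth.
        my $depth = ($entry =~ s/\G\.//g) || 0;
        $tag[$depth] = $entry;
        push @paths, [ @tag[0 .. $depth] ];
    }
    return @paths;
}

my @paths = outline_paths("Alternative\n", ".Punk\n", "..Emo\n", ".Goth\n");

# Written out in full: general to specific.
my @full = map { join(" - ", @$_) } @paths;

# Inheritance: specific to general, so just reverse each path.
my @inherit = map { join(" - ", reverse @$_) } @paths;

print "$_\n" for @full, @inherit;
```

Keeping each path as its own array means any further format is a matter of joining, reversing, or slicing the same data.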
Alphabetizing is now simple using the Unix sort program:
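$ISA = "-";
# Pipe everything we print through sort: -b ignores leading blanks,
# -t sets the field separator, -d is dictionary order, -f folds case.
open(STDOUT, "|sort -b -t'$ISA' -df") or die "can't run sort: $!";
while (<DATA>) {
    chomp;
    $tag[$in = s/\G\.//g] = $_;
    print join(" $ISA ", @tag[0 .. $in]), "\n";   # one entry per line
}
close STDOUT;
__END__
Alternative
.Punk
..Emo
..Folk Punk
.Goth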
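If the external sort program isn't available, a similar ordering can be approximated in Perl alone. The sketch below is an alternative rather than the recipe's own method: it compares paths level by level, ignoring case, and assumes that is an acceptable stand-in for sort -df. The name cmp_paths is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical comparator: compare two paths level by level,
# ignoring case; a parent sorts before its own children.
sub cmp_paths {
    my ($x, $y) = @_;
    my $shared = scalar(@$x) < scalar(@$y) ? scalar(@$x) : scalar(@$y);
    for my $i (0 .. $shared - 1) {
        my $order = lc($x->[$i]) cmp lc($y->[$i]);
        return $order if $order;
    }
    return scalar(@$x) <=> scalar(@$y);    # shorter path (the parent) first
}

my (@tag, @paths);
for my $line ("Alternative", ".Punk", "..Emo", "..Folk Punk", ".Goth") {
    my $entry = $line;
    my $depth = ($entry =~ s/\G\.//g) || 0;
    $tag[$depth] = $entry;
    push @paths, [ @tag[0 .. $depth] ];
}

print join(" - ", @$_), "\n" for sort { cmp_paths($a, $b) } @paths;
```

Comparing the levels separately, rather than the joined strings, keeps the result independent of whatever separator is used for display.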
Numbering the outline is equally simple:
while (<DATA>) {
    chomp;
    $count[$in = s/\G\.//g]++;
    splice(@count, $in + 1);      # drop the counters of deeper levels
    print join(".", @count), " $_\n";
}
__END__
Alternative
.Punk
..Emo
..Folk Punk
.Goth
..Goth Rock
Notice that renumbering is our only application where we've deleted
elements from the array. This is because we're not keeping names of
hierarchy levels in the array; now we're keeping counts. When we go
up a level (e.g., from three levels down to a new second-level
heading), we reset the counter on the old level.
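To see the reset in action, here is a small sketch that wraps the same numbering logic in a sub and returns the numbered lines instead of printing them as it goes. The name number_outline is hypothetical. It truncates @count by assigning to $#count, which has the same effect as removing the deeper elements.

```perl
use strict;
use warnings;

# Hypothetical helper: return the outline entries prefixed with
# section numbers such as "1.2.1".
sub number_outline {
    my @lines = @_;
    my (@count, @numbered);
    for my $line (@lines) {
        my $entry = $line;
        chomp $entry;
        my $depth = ($entry =~ s/\G\.//g) || 0;
        $count[$depth]++;
        # Discard the counters of deeper levels, so the next
        # subsection under a new heading starts again at 1.
        $#count = $depth;
        push @numbered, join(".", @count) . " $entry";
    }
    return @numbered;
}

print "$_\n" for number_outline(
    "Alternative", ".Punk", "..Emo", "..Folk Punk", ".Goth", "..Goth Rock",
);
# Prints:
#   1 Alternative
#   1.1 Punk
#   1.1.1 Emo
#   1.1.2 Folk Punk
#   1.2 Goth
#   1.2.1 Goth Rock
```

Without the truncation, "Goth Rock" would come out as 1.2.3, because the third-level counter left over from "Folk Punk" would still be in the array.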