Book HomePerl & XMLSearch this book

10.2. Subclassing

When writing XML-hacking Perl modules, another path to laziness involves standing on (and reading over) the shoulders of giants by subclassing general XML parsers as a quick way to build application-specific modules.

You don't have to use object inheritance; the least complicated way to accomplish this sort of thing involves constructing a parser object in the usual way, sticking it somewhere convenient, and turning around whenever you want to do something XMLy. Here is some bogus code for you:

package XML::MyThingy;

use strict; use warnings;
use XML::SomeSortOfParser;

sub new {
  # Ye Olde Constructor
  my $invocant = shift;
  my $self = {};
  if (ref($invocant)) {
    bless ($self, ref($invocant));
  } else {
    bless ($self, $invocant);
  }

  # Now we make an XML parser...
  my $parser = XML::SomeSortOfParser->new 
      or die "Oh no, I couldn't make an XML parser. How very sad.";

  # ...and stick it on this object, for later reference.
  $self->{xml} = $parser;
  return $self;
}

sub parse_file {

  # We'll just pass on the user's request to our parser object (which
  # just happens to have a method named parse_file)...
  my $self = shift;
  my $result = $self->{xml}->parse_file;

  # What happens now depends on whatever a XML::SomeSortOfParser
  # object does when it parses a file. Let's say it modifies itself and
  # returns a success code, so we'll just keep hold of the now-modified
  # object under this object's 'xml' key, and return the code.
  return $result;
}

Choosing to subclass a parser has some bonuses, though. First, it gives your module the same basic user API as the module in question, including all the methods for parsing, which can be quite lazily useful -- especially if the module you're writing is an XML application helper module. Second, if you're using a tree-based parser, you can steal -- er, I mean embrace and extend -- that parser's data structure representation of the parsed document and then twist it to better serve your own nefarious goal while doing as little extra work as possible. This step is possible through the magic of Perl's class blessing and inheritance functionality.

10.2.1. Subclassing Example: XML::ComicsML

For this example, we're going to set our notional MonkeyML aside in favor of the grim reality of ComicsML, a markup language for describing online comics.[39] It shares a lot of features and philosophies with RSS, providing, among other things, a standard way for comics to share web-syndication information, so a ComicsML helper module might be a boon for any Perl hacker who wishes to write programs that work with syndicated web comics.

[39]See http://comicsml.jmac.org/.

We will go down a DOMmish path for this example and pull XML::LibXML down as our internal mechanism of choice, since it's (mostly) DOM compliant and is a fast parser. Our goal is to create a fully object-oriented API for manipulating ComicsML documents and all the major child elements within them:

use XML::ComicsML;

# parse an existing ComicsML file
my $parser = XML::ComicsML::Parser->new;
my $comic = $parser->parsefile('my_comic.xml');

my $title = $comic->title;
print "The title of this comic is $title\n";

my @strips = $comic->strips;
print "It has ".scalar(@strips)." strips associated with it.\n";

Without further ado, let's start coding.

package XML::ComicsML;

# A helper module for parsing and generating ComicsML documents.

use XML::LibXML;
use base qw(XML::LibXML);

# PARSING

# We catch the output of all XML::LibXML parsing methods in our hot
# little hands, then proceed to rebless selected nodes into our own
# little clasees

sub parse_file {
  # Parse as usual, but then rebless the root element and return it.
  my $self = shift;
  my $doc = $self->SUPER::parse_file(@_);
  my $root = $doc->documentElement;
  return $self->rebless($root);
}

sub parse_string {
  # Parse as usual, but then rebless the root element and return it.
  my $self = shift;
  my $doc = $self->SUPER::parse_string(@_);
  my $root = $doc->documentElement;
  return $self->rebless($root);
}

What exactly are we doing, here? So far, we declared the package to be a child of XML::LibXML (by way of the use base pragma), but then we write our own versions of its three parsing methods. All do the same thing, though: they call XML::LibXML's own method of the same name, capture the root element of the returned document tree object, and then pass it to these internal methods:

sub rebless {

  # Accept  some kind of XML::LibXML::Node (or a subclass
  # thereof) and, based on its name, rebless it into one of
  # our ComicsML classes.
  my $self = shift;
  my ($node) = @_;

  # Define a has of interesting element types. (hash for easier searching.)
  my %interesting_elements = (comic=>1,
                              person=>1,
                              panel=>1,
                              panel-desc=>1,
                              line=>1,
                              strip=>1,
                             );

  # Toss back this node unless it's an Element, and Interesting. Else,
  # carry on.
    my $name = $node->getName;
    return $node unless ( (ref($node) eq 'XML::LibXML::Element') 
        and (exists($interesting_elements{$name})) );
    
    # It is an interesting element! Figure out what class it gets, and rebless it.
    my $class_name = $self->element2class($name);
    bless ($node, $class_name);
  return $node;
}

sub element2class {

  # Munge an XML element name into something resembling a class name.
  my $self = shift;
  my ($class_name) = @_;
  $class_name = ucfirst($class_name);
  $class_name =~ s/-(.?)/uc($1)/e;
  $class_name = "XML::ComicsML::$class_name";
}

The rebless method takes an element node, peeks at its name, and sees if it appears on a hardcoded list it has of "interesting" element names. If it appears on the list, it chooses a class name for it (with the help of that silly element2class method) and reblesses it into that class.

This behavior may seem irrational until you consider the fact that XML::LibXML objects are not very persistent, due to the way they are bound with the low-level, C-based structures underneath the Perly exterior. If I get a list of objects representing some node's children, and then ask for the list again later, I might not get the same Perl objects, though they'll both work (being APIs to the same structures on the C library-produced tree). This lack of persistence prevents us from, say, crawling the whole tree as soon as the document is parsed, blessing the "interesting" elements into our own ComicsML-specific classes, and calling it done.

To get around this behavior, we do a little dirty work, quietly turning the Element objects that XML::LibXML hands us into our own kinds of objects, where applicable. The main advantage of this, beyond the egomaniacal glee of putting our own (class) name on someone else's work, is the fact that these reblessed objects are now subject to having some methods of our own design called on them. Now we can finally define these classes.

First, we will taunt you by way of the AUTOLOAD method that exists in XML::ComicsML::Element, a virtual base class from which our "real" element classes all inherit. This glop of code lords it over all our element classes' basic child-element and attribute accessors; when called due to the invocation of an undefined method (as all AUTOLOAD methods answer to), it first checks to see if the method exists in that class's hardcoded list of legal child elements and attributes (available through the element() and attribute() methods, respectively); failing that, if the method had a name like add_foo or remove_foo, it enters either constructor or destructor mode:

package XML::ComicsML::Element;

# This is an abstract class for all ComicsML Node types.
use base qw(XML::LibXML::Element);
use vars qw($AUTOLOAD @elements @attributes);

sub AUTOLOAD {
  my $self = shift;
  my $name = $AUTOLOAD;
  $name =~ s/^.*::(.*)$/$1/;
  my @elements = $self->elements;
  my @attributes = $self->attributes;
  if (grep (/^$name$/, @elements)) {

    # This is an element accessor.
    if (my $new_value = $_[0]) {
      # Set a value, overwriting that of any current element of this type.
      my $new_node = XML::LibXML::Element->new($name);
      my $new_text = XML::LibXML::Text->new($new_value);
      $new_node->appendChild($new_text);
      my @kids = $new_node->childNodes;
      if (my ($existing_node) = $self->findnodes("./$name")) {
        $self->replaceChild($new_node, $existing_node);
      } else {
        $self->appendChild($new_node);
      }
    }

    # Return the named child's value.
    if (my ($existing_node) = $self->findnodes("./$name")) {
      return $existing_node->firstChild->getData;
    } else {
      return '';
    }

  } elsif (grep (/^$name$/, @attributes)) {
    # This is an attribute accessor.
    if (my $new_value = $_[0]) {
      # Set a value for this attribute.
      $self->setAttribute($name, $new_value);
    }

    # Return the names attribute's value.
    return $self->getAttribute($name) || '';

    # These next two could use some error-checking.
  } elsif ($name =~ /^add_(.*)/) {
    my $class_to_add = XML::ComicsML->element2class($1);
    my $object = $class_to_add->new;
    $self->appendChild($object);
    return $object;

  } elsif ($name =~ /^remove_(.*)/) {
    my ($kid) = @_;
    $self->removeChild($kid);
    return $kid;
  }

}

# Stubs
         
sub elements {
  return ();
}

sub attributes {
  return ();
}

package XML::ComicsML::Comic;
use base qw(XML::ComicsML::Element);

sub elements {
  return qw(version title icon description url);
}

sub new {
  my $class = shift;
  return $class->SUPER::new('comic');
}

sub strips {
  # Return a list of all strip objects that are children of this comic.
  my $self = shift;
  return map {XML::ComicsML->rebless($_)}  $self->findnodes("./strip");
}

sub get_strip {
  # Given an ID, fetch a strip with that 'id' attribute.
  my $self = shift;
  my ($id) = @_;
  unless ($id) {
    warn "get_strip needs a strip id as an argument!";
    return;
  }
  my (@strips) = $self->findnodes("./strip[attribute::id='$id']");
  if (@strips > 1) {
    warn "Uh oh, there is more than one strip with an id of $id.\n";
  }
  return XML::ComicsML->rebless($strips[0]);
}

Many more element classes exist in the real-life version of ComicsML -- ones that deal with people, strips within a comic, panels within a strip, and so on. Later in this chapter, we'll take what we've written here and apply it to an actual problem.



Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.