Team LiB   Previous Section   Next Section

XML Modules

XML (eXtensible Markup Language) is becoming increasingly important in the Oracle world. The language most associated with XML is Java, but there's plenty of XML-related Perl functionality as well, and we'll explore that in the context of data munging in this section.

Perl's XML facilities are surprisingly powerful. Some would even claim they go beyond Java, with more than 300 CPAN modules, SAX2 support, DOM support, and machine facilities allowing the pipelining of XML and XSLT transformations.

Good Perl XML resources include:

http://www.xml.com/pub/q/perlxml:

Main xml.com portal page for articles on Perl and XML.

http://www.perlxml.net:

One of the central Perl XML portals.

http://www.xmlproj.com/perl-xml-faq.dkb:

Main Perl XML FAQ.

http://xmlxslt.sourceforge.net:

XML::XSLT home page.

http://perl.apache.org:

Main Perl Apache portal, mostly related to mod_perl.

http://xml.sergeant.org:

For the latest razor-sharp detail, go to Matt Sergeant's place.

http://sourceforge.net/projects/expat:

James Clark's expat XML parser C library, as accessed by the venerable XML::Parser module written by Larry Wall and Clark Cooper.

http://www.xmlsoft.org:

Home of the libxml2 XML C library, used by the XML::LibXML parser.

http://www.cpan.org/modules/by-module/XML:

The main CPAN page, for Perl XML projects.

http://sourceforge.net/projects/perl-xml:

The main SourceForge site, for Perl XML projects.

Many different XML modules are also on ActiveState. Most of those covered in this section also have a complementary ActivePerl package:

http://www.activestate.com

We'll concentrate in this chapter on the Unix side of life, because this is where we need the more detailed installation instructions. The actual scripts and XML file outputs should be identical for ActivePerl PPM loads.

General Perl XML Parsers

There are two main XML parsers employed by the majority of Perl XML users:

XML::Parser

This was the first major Perl XML parser, and it relies upon expat. As XML has matured, many supplementary modules have been created for it to deal with DOM and SAX issues. XML::Parser comes automatically with ActiveState, and we'll be installing it shortly on Unix.

XML::LibXML

Created by Matt Sergeant and Christian Glahn, this is Perl's interface to Daniel Velliard's libxml2 XML C library. Unlike XML::Parser, this was written after most of the major XML standards had become settled. At the time of writing, there was no ActiveState binary available for XML::LibXML; however, one is sure to come soon. We'll also demonstrate installing this system on Unix, as we need it for the XML::XMLtoDBMS munge described at the end of this section.

XML::Parser

You can obtain the latest XML::Parser from the following CPAN address:

http://www.cpan.org/authors/id/C/CO/COOPERCL

You may also want to pre-install Gisle Aas's LWP World Wide Web library bundle, libwww-perl, and URI module, to provide XML::Parser's make test step with extra tests. See Chapter 5, for the required LWP installation details.

The expat C program download is also available from:

http://sourceforge.net/projects/expat

Follow these steps:

  1. We start with the expat tarball:

    $ gzip -d expat-1.95.2.tar.gz
    $ tar xvf expat-1.95.2.tar
    $ cd expat-1.95.2
  2. The README file is the best place to go next:

    $ vi README
  3. Useful help for the configuration is available via the following command:

    $ ./configure -help
  4. The default installation directories and files are as follows:

    /usr/local/lib/libexpat
    /usr/local/include/expat.h
    /usr/local/bin/xmlwf

    If you'd like to change these, do the following:

    $ ./configure --prefix=/home/oracle/xml

    This will create:

    /home/oracle/xml/lib/libexpat
    /home/oracle/xml/include/expat.h
    /home/oracle/xml/bin/xmlwf
  5. We were happy with the default:

    $ ./configure
  6. Now we can build and install expat:

    $ make
    $ make install
  7. In a triumphant burst of heroic action glory, we install XML::Parser:

    $ gzip -d XML-Parser.2.30.tar.gz
    $ tar xvf XML-Parser.2.30.tar
    $ cd XML-Parser.2.30
    $ vi README

    With a clean expat install, the following should be straightforward:

    $ perl Makefile.PL
    $ make

    You may get some messages about the absence of LWP and URI — you can safely ignore these if you left them out deliberately.

    $ make test
    ...
    All tests successful.
    Files=13,Tests=113,6 wallclock secs (2.93 cusr + 0.24 csys = 3.17 CPU)
      
    $ make install

    XML::Parser is now well and truly on board. Next up is XML::LibXML.

XML::LibXML

The latest XML::LibXML download is available from CPAN. We also need the XML::SAX module from the same place:

http://www.cpan.org/modules/by-module/XML

XML::LibXML is based on the libxml2 C library, which is available from:

http://www.xmlsoft.org

Follow these steps:

  1. We start with the libxml2 tarball:

    $ gzip -d libxml2-2.4.10.tar.gz
    $ tar xvf libxml2-2.4.10.tar
    $ cd libxml2-2.4.10
    $ vi README INSTALL

    You'll find much fuller documentation online at http://xmlsoft.org.

  2. The actual installation should be very similar to expat as described in the previous section. Configuration help can also be found via the following command:

    $ ./configure --help

    We were happy with the defaults and went for the simplest route:

    $ ./configure
    $ make
    $ make install
  3. Once installation completes, you can run a large test suite, which deviates slightly from our usual Perl pattern by coming after the installation:

    $ make tests
    ...
    Testing catal
    Add and del operations on XML Catalogs

    Some of these regression tests may fail because of a tiny number of platform incompatibilities. If the warnings look acceptable, move on.

  4. We can now come to grips with the actual XML::LibXML Perl module. Before we install this, though (come on, you knew there'd be a catch), we have to install Matt Sergeant's XML::SAX module. This is a straightforward typical Perl installation.

  5. The same goes for XML::LibXML. Just unpack the tarball and install with the usual perl Makefile.PL installation run.

XML::LibXSLT

If you are interested in XSLT (Extensible Stylesheet Language Transformations), the Gemini twin of XML::LibXML is XML::LibXSLT, and now's a good time to install it, because it relies on XML::LibXML. We'll require Daniel Veillard's libxslt C library. For more information try the following:

http://www.w3.org/TR/xslt
http://xmlsoft.org/XSLT

To get hold of XML::LibXSLT and libxslt go here:

http://www.cpan.org/authors/id/M/MS/MSERGEANT
http://xmlsoft.org/XSLT/downloads.html

Follow these steps:

  1. The installation follows the usual pattern. First, libxslt:

    $ gzip -d libxslt-1.0.9.tar.gz
    $ tar xvf libxslt-1.0.9.tar
    $ cd libxslt-1.0.9
    $ ./configure
    $ make
    $ make install
  2. Next, unpack XML::LibXSLT and run through its perl Makefile.PL steps.

Do you need XML::LibXSLT? Not really, but if you're a completist as we are, you'll feel that it's nice to be fully loaded with XML::LibXSLT. The ability to transform data with XSLT enables us to cope with XML files that fail to match our exact requirements. This way, we can feed XML through a transformation operation, as in Figure D-5, to make it fit our munging needs.

Figure D-5. Transforming data with XSLT
figs/pdba_ad05.gif

Let's see what's going on here.

  1. Suppose that we wish to pump out XML-ized news data from our corporate public relations database. This news could include the following data file, horsefeather.xml:

    <?xml version="1.0"?>
    <news>
     <item>
      <title>Hackenbush Speaks Out</title>
      <publication>Horsefeather Gazette</publication>
      <url>http://www.horse.feather</url>
      <date>20021225</date>
      <quote>"That's no Lady.  That's my Wife."</quote>
     </item>
    </news>
  2. For web browsers, we need to transform this presentation slightly with the XSLT code in Example D-8, stored in horsefeather.xsl. Notice, particularly, the date transformation code, which can take a string like 20021225 and turn it into the corresponding 25 December 2002: string.

    XSLT code — horsefeather.xsl
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>
     <xsl:template match="/">
      <HTML>
      <HEAD>
      <TITLE>Horsefeather News - World Latest</TITLE>
      </HEAD>
      <BODY>
      <xsl:apply-templates select="//item">
      <xsl:sort order="descending" data-type="text" select="date"/>
      </xsl:apply-templates>
      </BODY>
      </HTML>
     </xsl:template>
      
     <xsl:template match="item">
      <P><STRONG>
      <CITE><xsl:value-of select="publication"/></CITE>,
      <xsl:apply-templates select="date"/>
      "<a href="{url}"><xsl:value-of select="title"/></a>"
      </STRONG></P>
      <BLOCKQUOTE>
      <xsl:value-of select="comment"/>
      <EM><xsl:value-of select="quote"/></EM>
      </BLOCKQUOTE>
     </xsl:template>
     
     <xsl:template match="date">
      <xsl:param name="date" select="."/>
      <xsl:variable name="day" select="number(substring($date,7,2))"/>
      <xsl:variable name="month" select="number(substring($date,5,2))"/>
      <xsl:variable name="year" select="number(substring($date,1,4))"/>
    
      <xsl:if test="$day>0">
       <xsl:value-of select="$day" />
       <xsl:text> </xsl:text>
      </xsl:if>
    
      <xsl:choose>
       <xsl:when test="$month= 1">January</xsl:when>
       <xsl:when test="$month= 2">February</xsl:when>
       <xsl:when test="$month= 3">March</xsl:when>
       <xsl:when test="$month= 4">April</xsl:when>
       <xsl:when test="$month= 5">May</xsl:when>
       <xsl:when test="$month= 6">June</xsl:when>
       <xsl:when test="$month= 7">July</xsl:when>
       <xsl:when test="$month= 8">August</xsl:when>
       <xsl:when test="$month= 9">September</xsl:when>
       <xsl:when test="$month=10">October</xsl:when>
       <xsl:when test="$month=11">November</xsl:when>
       <xsl:when test="$month=12">December</xsl:when>
      </xsl:choose>
      <xsl:if test="$year>0">
       <xsl:text> </xsl:text>
       <xsl:value-of select="$year" />
       <xsl:text>: </xsl:text>
      </xsl:if>
     </xsl:template>
    </xsl:stylesheet>
  3. We then need the Perl code in Example D-9 to transform our original XML into viewable HTML.

    Transforming news output — xmlLibXSLT.pl
    #!perl -w
      
    use strict;
      
    use XML::LibXSLT;
    use XML::LibXML;
      
    my $parser = XML::LibXML->new(  );
    my $xslt = XML::LibXSLT->new(  );
      
    my $source = $parser->parse_file('horsefeather.xml');
    my $style_doc = $parser->parse_file('horsefeather.xsl');
      
    my $stylesheet = $xslt->parse_stylesheet($style_doc);
      
    my $results = $stylesheet->transform($source);
      
    print $stylesheet->output_string($results);
  4. We simulate the online running of xmlLibXSLT.pl with this command:

    $ perl xmlLibXSLT.pl > horsefeather.html
  5. The resultant horsefeather.html file now pops out of the transformation:

    <HTML>
    <HEAD>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
    <TITLE>Horsefeather News - World Latest</TITLE>
    </HEAD>
    <BODY>
    <P><STRONG><CITE>Horsefeather Gazette</CITE>,
      25 December 2002: 
      &quot;
      <a href="http://www.horse.feather">Hackenbush Speaks Out</a>&quot;
      </STRONG></P>
    <BLOCKQUOTE>
    <EM>&quot;That's no Lady.  That's my Wife.&quot;</EM>
    </BLOCKQUOTE>
    </BODY>
    </HTML>

    This can be viewed in Figure D-6.

    Figure D-6. XSLT transformation from XML to HTML
    figs/pdba_ad06.gif

XML Database Facilities

Now that we have our parsers loaded, we can start racking up our XML weapon toolset prior to battle. In the following sections we'll look at the following Perl XML modules:

XML::Generator::DBI
XML::XPath
XML::XMLtoDBMS

In addition to these, there are many other Perl XML modules available from CPAN. If you have XML needs that these modules don't satisfy, just visit cpan.org, and type in "XML" via the search page. Prepare to be bombarded by 1001 different XML responses. The same goes for the XML-based PPM packages on ActiveState.com. It seems that all the world has been writing XML modules for Perl. Enjoy!

XML::Generator::DBI

XML::Generator::DBI, written by Matt Sergeant, transforms database calls to XML SAX events. It is useful for quickly generating XML files directly from SQL statements. It's also the replacement for the earlier DBIx::XML_RDB, another module from the prolific Mr. Sergeant. Check it out at:

http://www.cpan.org/authors/id/M/MS/MSERGEANT

For testing purposes, and general usage, we also require the services of Michael Koehne's XML::Handler::YAWriter (Yet Another Writer, for Perl SAX):

http://www.cpan.org/authors/id/K/KR/KRAEHE

This, in turn, requires the talents of Ken MacLeod's XML::Parser::PerlSAX, which comes as part of his libxml-perl package. It consists of a general cornucopia of productivity tools, designed originally for use with XML::Parser::

http://www.cpan.org/authors/id/KMACLEOD

Then it's time for that ol' Potomac two-step, with bundle unpacking and Makefile.PL:

  1. First, install libxml-perl.

  2. Now install XML::Handler::YAWriter:

  3. If up until now you've avoided installing MIME::Base64, as described in Chapter 5, you'll need to do it here. This is another prerequisite module:

    http://www.cpan.org/authors/id/GAAS

  4. Paratroopers having established a beachhead, we send in the heavy armor:

    $ gzip -d XML-Generator-DBI-0.01.tar.gz
    $ tar xvf XML-Generator-DBI-0.01.tar
    $ cd XML-Generator-DBI-0.01
  5. Before building and testing XML::Generator::DBI, we need to edit the PWD file, which comes with the tarball. The PWD information is needed to verify make test investigations. Our PWD file looked like this:

    # user name
    UID=scott
    # password
    PWD=tiger
    # Driver to use (as in dbi:Driver)
    DRIVER=Oracle
    # Extra stuff (as in dbi:Driver:extra_stuff)
    EXTRA=ORCL.WORLD
    # Query to use - Get Groucho! :-)
    QUERY=SELECT ename, hiredate FROM EMP WHERE empno = 1001
  6. Once PWD is ready, restart the dance band:

    $ perl Makefile.PL
    $ make
  7. The make test step uses the PWD information to generate quite a bit of test information. We're expecting to see hiredate data on Groucho from the test rows we originally loaded into EMP:

    $ make test
    ...
     <select query="SELECT ename, hiredate FROM EMP WHERE empno = 1001">
      <row>
       <ENAME>Groucho</ENAME>
       <HIREDATE>01-JAN-01</HIREDATE>
      </row>
     </select>
    ...
    t/01yawriter....ok
    All tests successful.
    Files=2, Tests=7, 1 wallclock secs (0.49 cusr + 0.03 csys = 0.52 CPU)
  8. Once the tests look good, install:

    $ make install

We're now ready to run our XML script to produce ducksoup.xml in Example D-10.

First attempt linking XML to DBI — xmlGenDBI.pl
#!perl -w
  
use XML::Generator::DBI;
use XML::Handler::YAWriter;
use DBI;
  
my $writer = XML::Handler::YAWriter->new(AsFile => "ducksoup.xml");
  
my $dbh = DBI->connect("dbi:Oracle:ORCL.WORLD", "scott", "tiger");
  
my $xml_generator = XML::Generator::DBI->new( Handler => $writer,
                                              dbh => $dbh );
  
$xml_generator->execute('select * from emp where empno < 2000');
$dbh->disconnect;

What Example D-10 should do, in a mere handful of lines, is to take a SELECT statement, and turn it into an XML file. You might recall the data we added to the EMP table earlier in this chapter:

SQL> select * from EMP where empno < 2000;
     EMPNO ENAME      JOB          MGR HIREDATE     SAL    COMM DEPTNO
---------- ---------- --------- ------ --------- ------ ------- ------
1001 Groucho    Professor      1 01-JAN-01    100      10     10
1002 Chico      Minister       2 02-JAN-01    200      20     20
1003 Harpo      Stowaway       3 03-JAN-01    300      30     30

Let's run the script and see what happens:

$ perl xmlGenDBI.pl

We've taken just the first <row> output from our generated ducksoup.xml file:

<?xml version="1.0" encoding="UTF-8"?><database>
 <select query="select * from emp where empno &lt; 2000">
  <row>
   <EMPNO>1001</EMPNO>
   <ENAME>Groucho</ENAME>
   <JOB>Professor</JOB>
   <MGR>1</MGR>
   <HIREDATE>01-JAN-01</HIREDATE>
   <SAL>100</SAL>
   <COMM>10</COMM>
   <DEPTNO>10</DEPTNO>
  </row>
  ...
 </select>
</database>

We can now munge data out of an Oracle database, into XML format, but what about going the other way? This is where XML::XPath comes in.

XML::XPath

The XML::Xpath module follows all of the XPath standards you may have seen with other XML toolsets. This XML package is also from:

http://www.cpan.org/modules/by-module/XML

You can learn more about XPath at:

http://www.w3.org/TR/xpath

We're going to use XML::XPath to read XML file data and then pump it into the database to reverse the munge direction from XML::Generator::DBI. There are other ways of doing this — with XLST transformations for example — but we'll use XML::XPath because of its flexibility, its appropriateness for munge-style operations, and its simplicity.

Follow these steps:

  1. XML::XPath requires XML::Parser, as installed earlier, but nothing else:

    $ gzip -d XML-XPath-1.12.tar.gz
    $ tar xvf XML-XPath-1.12.tar
    $ cd XML-XPath-1.12
    $ perl Makefile.PL
    $ make
  2. You'll get lots of output, from 187 tests (we counted 'em) in version 1.12:

    $ make test
    $ make install

You may recall that there were two other Marx brothers in addition to the main three: Zeppo, who appeared in most of the earlier films, and Gummo, who quit the act while it was still on Broadway. However, we do rather coincidentally have their information stored in an XML file, in Example D-11, nightopera.xml. We'll show how we feed XML into Oracle here.

Feeding XML into Oracle — nightopera.xml
<?xml version="1.0" encoding="UTF-8"?>
<database>
 <select>
  <row>
   <EMPNO>1004</EMPNO>
   <ENAME>Zeppo</ENAME>
   <JOB>President</JOB>
   <MGR>1</MGR>
   <HIREDATE>04-JAN-01</HIREDATE>
   <SAL>400</SAL>
   <COMM>40</COMM>
   <DEPTNO>20</DEPTNO>
  </row>
  <row>
   <EMPNO>1005</EMPNO>
   <ENAME>Gummo</ENAME>
   <JOB>Tenor</JOB>
   <MGR>1</MGR>
   <HIREDATE>05-JAN-01</HIREDATE>
   <SAL>500</SAL>
   <COMM>50</COMM>
   <DEPTNO>10</DEPTNO>
  </row>
 </select>
</database>

We uploaded it to EMP with the XML::XPath script in Example D-12. We've worked through this script immediately following the example:

Feeding XML into Oracle — dbiXPATH.pl
#!perl -w
use strict;
use DBI;
use XML::XPath;
# Step 1: Connect up to the sink database.
my $dbh = DBI->connect('dbi:Oracle:ORCL.WORLD', 'scott', 'tiger') || 
             die $DBI::errstr;
  
# Step 2: Locate the source XML data.
  
my $xpath = XML::XPath->new(filename => 'nightopera.xml');
  
# Step 3: Prepare the insertion DML.
my $insert_dml =   qq{ INSERT
         INTO emp ( empno,    ename,
                    job,      mgr,
                    hiredate, sal,
                    comm,     deptno )
           VALUES ( to_number( ? ),            ?,
                    ?,                         to_number ( ? ),
                    to_date( ? , 'DD-MON-YY'), to_number ( ? ),
                    to_number ( ? ),           to_number ( ? ) ) };
  
my $sth = $dbh->prepare( $insert_dml );
  
# Step 4: Extract the XML records one by one, through the loop,
# and insert into database.
    my $mgr = $row->find('MGR')->string_value;
    my $hiredate = $row->find('HIREDATE')->string_value;
    my $sal = $row->find('SAL')->string_value;
    my $comm = $row->find('COMM')->string_value;
    my $deptno = $row->find('DEPTNO')->string_value;
    # Line inserted into the sink.
  
    $sth->execute( $empno,
                   $ename,
                   $job,
                   $mgr,
                   $hiredate,
                   $sal,
                   $comm,
                   $deptno ) || die $DBI::errstr;
}
  
# Step 5: Clean up, and disconnect.
$dbh->disconnect;

Let's see what's going on here.

  1. There are two parallel pathways moving through this script. The first is DBI-based, and our first step with it is to open up a database connection.

  2. In our second path, we locate the XML source input file, nightopera.xml.

  3. Getting back to the first track, we prepare an INSERT statement.

  4. The XML process then uses the findnodes( ) and find( ) methods, to whizz through nightopera.xml, sucking out the relevant information before plugging it straight down into the data sink.

  5. Once through the loop, we disconnect and clean up (as per union rules).

We can see the results here, from the SCOTT.EMP table.

SQL> select * from emp where empno in (1004, 1005);
  
    EMPNO ENAME      JOB          MGR HIREDATE     SAL     COMM DEPTNO
--------- ---------- --------- ------ --------- ------ -------- ------
     1004 Zeppo      President      1 04-JAN-32    400       40     20
     1005 Gummo      Tenor          1 05-JAN-33    500       50     10

A combination of XML::Generator::DBI and XML::XPath may be all you require to carry out whatever munging operations you require, both to and from the Oracle database, especially if you want access to all the other Perl modules at the same time. However, as with all things in Perl, there is another way to do it.

XML::XMLtoDBMS

Let's suppose that we want to extract data from our database into an XML file. We'd like to then beam this across the galaxy to Betelgeuse, via the local StarGate at Vega, and load it there into a Betelgeusian database. We'd like to do all this with a single Perl module. Step forward XML::XMLtoDBMS, a module specially blended with DBI to provide an all-purpose alternative. This module springs directly from its XML-DBMS middleware parent project, which also provides a Java-based alternative. Assuming that we've loaded every XML module discussed so far, except the optional XML::LibXSLT, we have everything we need except for one last module, Graham Barr's TimeDate bundle:

http://www.cpan.org/authors/id/GBARR/

Once installed, we also have Date::Format and Date::Parse on board; these came with TimeDate. We're now ready for the green light.

The parent project, XML-DBMS, is Ronald Bourret's Java-based middleware for transferring data between XML documents and relational databases. It deals with many of the coding inconveniences in-between and is ideal for data munging purposes. Check it out at:

http://www.rpbourret.com/xmldbms/index.htm

From here you'll be directed to the Perl download of XML::XMLtoDBMS, by Nick Semenov. This is ported directly from Ronald Bourret's XML-DBMS software, which is itself written in Java. Before we install XML::XMLtoDBMS, however, let's run through a quick checklist of everything we need:

expat
XML::Parser
XML::Parser::PerlSAX (via libxml-perl )
libxml2
XML::SAX
XML::LibXML
Date::Format (via TimeDate)
Date::Parse (via TimeDate)

Once you have these modules and the XML::XMLtoDBMS tarball, it's time for that new dance, the Chesapeake bay whirl:

$ gzip -d perl-xml-dbms-1.03.tgz
$ tar xvf perl-xml-dbms-1.03.tar
$ cd XML-DBMS
$ vi README

We really ought to read the README file this time, as XML::XMLtoDBMS can be challenging to understand. But once we've got our head round it, the actual installation is straightforward. However, we do need one small adjustment.

To cope with the standard Oracle date format, DD-MON-YY, we're going to introduce a one-line adjustment to the XMLtoDBMS.pm module for the 1.03 version, which is an open source product under constant development. (This may very well have been amended in later versions.) Add the marked line to the convertFormat subroutine. We need this line because it will be difficult to re-insert XML date information back into Oracle, via XML::XMLtoDBMS, if the data is not in DD-MON-YY format:

sub convertFormat
{
        my $formatString = shift;
  
        $formatString =~ s/YYYY/%Y/g;
        $formatString =~ s/YY/%y/g;
        $formatString =~ s/MM/%m/g;
        $formatString =~ s/MON/%b/g; # Typical Oracle month format.
        $formatString =~ s/DD/%d/g;
        $formatString =~ s/hh/%H/g;
        $formatString =~ s/mm/%M/g;
        $formatString =~ s/ss/%S/g;
        return $formatString;
}

A small aside, for those who may think that the fact that this module expects a different date format points out a conceptual weakness of open source software. We disagree. This, we believe, is its greatest strength. If you find that something fails to work exactly the way you expect, you can fix the source code directly, to make it do what you want.

Once we've made this small adjustment, installation is routine:

$ perl Makefile.PL
$ make
$ make test
$ make install

Take a look at the eventual results in Figure D-7.

Figure D-7. One Perl XML module to munge them all
figs/pdba_ad07.gif

The target data-sink table, on Betelgeuse, was created many centuries ago with the following statement:

SQL> create table FordPrefectus
  2  ( TimeLord   number(4) not null,
  3    Role       varchar2(10),
  4    Mission    varchar2(9),
  5    Master     number(4),
  6    Origin     date,
  7    Altairian$ number(7,2),
  8    Credits    number(7,2),
  9    Quadrant   number(2) );
  
Table created.

OK, it is rather remarkable that they have Oracle-type databases on Betelgeuse, but Professor Hawking tells us it's something to do with the infinite pathway effect of those quantum-type particles which make black holes evaporate. The eagle-eyed among you may have also noted that the FORDPREFECTUS table on Betelgeuse is remarkably similar to SCOTT's EMP table back here on Earth. This too is mere quantum coincidence.

Even more fortuitously, the most popular entertainment stars on Betelgeuse are the Marx Brothers, as televised transmissions of their films have only started reaching the Betelgeusian quadrant in the last five years. They would therefore like more information on these black-and-white magicians of the silver screen. As a consequence, we recently received an XML-encoded 3-D message cube, asking us to send the requisite details.

Source mapping

We have agreed upon an XML mapping with the Betelgeusians to facilitate the requested information transfer. At our end, we need a mapping file to construct the XML output. We'll work through this following Example D-13.

The primary benefit for using XML-DBMS over other XML tools such as XML::Generator::DBI and XML::XPath, is that it can treat XML data as arbitrarily nested groups of tables. Other tools with less overhead tend to treat everything as a single table, which can provide bottlenecks with some of the necessarily complex data sets you may encounter. In the extended example below, we've covered only a very small segment of what is possible via the mapping facilities within XML-DBMS. See Ronald Bourret's web site (listed earlier) for much more information on how XML-DBMS can help to solve your own particular XML needs when other Perl XML modules are insufficient.

Source mapping for XML::XMLtoDBMS — emp.map
<?xml version="1.0" ?>
<XMLToDBMS Version="1.0">
   <Options>
      <DateTimeFormats><Patterns Date="DD-MON-YY"/>
      </DateTimeFormats>
   </Options>
   <Maps>
      <IgnoreRoot>
         <ElementType Name="employees"/>
         <PseudoRoot>
            <ElementType Name="emp"/>
            <CandidateKey Generate="No"><Column Name="empno"/>
            </CandidateKey>
         </PseudoRoot>
      </IgnoreRoot>
         <ClassMap>
         <ElementType Name="emp"/>
         <ToClassTable><Table Name="emp"/>
         </ToClassTable>         
         <PropertyMap>
            <ElementType Name="empno"/>
            <ToColumn><Column Name="empno"/>
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="ename"/>     
            <ToColumn><Column Name="ename"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="job"/>     
            <ToColumn><Column Name="job"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="mgr"/>     
            <ToColumn><Column Name="mgr"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="hiredate"/>     
            <ToColumn><Column Name="hiredate"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="sal"/>     
            <ToColumn><Column Name="sal"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="comm"/>     
            <ToColumn><Column Name="comm"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="deptno"/>     
            <ToColumn><Column Name="deptno"/>   
            </ToColumn>
         </PropertyMap>
      </ClassMap>
   </Maps>
</XMLToDBMS>

Let's see what's happening here:

  1. There are four particularly interesting nodes within the emp.map XML file. The first is the <Options> node, which details the date format we're going to use, DD-MON-YY, which is possible after our earlier code adjustment.

  2. The second is the <IgnoreRoot> node, containing two important elements:

    1. The first is the initial <ElementType>, which sets the conceptual name for the whole XML-ised blob of information as employees.

    2. The second, is the <CandidateKey> node, which tells us that the primary key for emp is the <Column> value, empno.

  3. The next important node grouping is the one marked by <ToClassTable>, which contains the <Table> node, confirming to us that our table name is indeed emp. This may seem to be duplication, but it's important, as we'll see later. Also notice the <ClassMap> and <ElementType> mapping just above. The point here is that we've chosen to map some elements to tables, with <ClassMap>, and some other elements to columns, with <PropertyMap>.

  4. The next node group, under <PropertyMap>, is significant. It gives us a conceptual name for each column, under <ElementType>, but supplies us with an actual column name, under the <Column> node.

There are far more complex things possible with the mappings available in XML::XMLtoDBMS — for example, table pairs with different primary and foreign key constraints, varying numbers of columns, date and time format differences, and much more. You name it, it's probably in the mapping language. As you can see from Example D-13, however, even a relatively simple exchange of data can generate a large mapping file. But once you've got the basic structure, the rest is just plugging in the numbers.

You might wish to examine the DTD file, xmldbms.dtd, which comes with the download, as well as visit the web pages:

Source output

Now that we have the mapping, we can generate the XML in Example D-14.

Creating our source data — outXMLDBMS.pl
#!perl -w
  
use strict;
use DBI;
use XML::XMLtoDBMS;
  
# Step 1: Connect to Oracle, as usual.
# Then use the database handle to feed XML::XMLtoDBMS.
  
my $dbh = DBI->connect('dbi:Oracle:ORCL.WORLD', 'scott', 'tiger') || 
             die $DBI::errstr;
  
my $xmlToDbms = new XML::XMLtoDBMS($dbh);
  
$xmlToDbms->setMap('emp.map');
  
# Step 2:  Get hold of the data.  Use the primary keys of our
# required rows, to isolate them.
  
my $xmlOut = $xmlToDbms->retrieveDocument(
                  'emp', 
                  [['1001'],['1002'],['1003'],['1004'],['1005']] );
# Step 3: Output the data to XML.
  
open (XML, ">emp.xml");
# Prettify printing with format 1 line break
print XML $xmlOut->toString(1); 
close XML;
# Step 4: It's important to clean up the acquired DOM memory,
# as well as disconnecting from the Oracle server.
  
$xmlToDbms->destroy;
$dbh->disconnect;

Let's work through the code.

  1. We acquire a database connection via DBI. Once this is done, we can forget about DBI entirely, as XML::XMLtoDBMS takes the database handle and does all the work, based on the emp.map instructions created earlier.

  2. Once we have the appropriate handles in place, we retrieve the required data from the SCOTT.EMP table using retrieveDocument( ). Notice the use of the EMPNO primary keys to get the five rows required. There are several other filter techniques also available.

  3. The next step is simple. We produce the emp.xml file we'll be sending through the Vegan StarGate.

  4. Once we're done, we clean up, both destroying the memory used to create the XML file and disconnecting from the database.

Let's run the script:

$ perl outXMLDBMS.pl

A snippet of the resultant output file, emp.xml, is displayed here, with one of the <emp> records:

<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <emp>
    <mgr>1</mgr>
    <sal>100</sal>
    <ename>Groucho</ename>
    <job>Professor</job>
    <empno>1001</empno>
    <deptno>10</deptno>
    <comm>10</comm>
    <hiredate>01-Jan-01</hiredate>
  </emp>
  ...
</employees>

Notice that we have Groucho embedded, like a nugget of gold, within the XML.

Sink mapping

Once emp.xml is beamed across to Betelgeuse by hyperwave-relay, we're going to need another mapping file. This one will cope with the different column names in FORDPREFECTUS. We've detailed this mapping in Example D-15.

Mapping the data into the sink — timelord.map
<?xml version="1.0" ?><XMLToDBMS Version="1.0">
   <Options>
      <DateTimeFormats><Patterns Date="DD-MON-YY"/>
      </DateTimeFormats>
   </Options>
            </CandidateKey>
         </PseudoRoot>
      </IgnoreRoot>
         <ClassMap>
         <ElementType Name="emp"/>
         <ToClassTable><Table Name="fordprefectus"/>
         </ToClassTable>         
         <PropertyMap>
            <ElementType Name="empno"/>
            <ToColumn><Column Name="timelord"/>
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="ename"/>     
            <ToColumn><Column Name="role"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="job"/>     
            <ToColumn><Column Name="mission"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="mgr"/>     
            <ToColumn><Column Name="master"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="hiredate"/>     
            <ToColumn><Column Name="origin"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="sal"/>     
            <ToColumn><Column Name="altairian$"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="comm"/>     
            <ToColumn><Column Name="credits"/>   
            </ToColumn>
         </PropertyMap>
         <PropertyMap>  
            <ElementType Name="deptno"/>     
            <ToColumn><Column Name="quadrant"/>   
            </ToColumn>
         </PropertyMap>
      </ClassMap>
   </Maps>
</XMLToDBMS>

Let's see what's going on in this example.

  1. Notice the date format in the <Options> node near the start of the file.

  2. Notice also that the <CandidateKey> column name for the primary key is timelord.

  3. The <ToClassTable> mapping also varies slightly. We have all the emp data mapped to the table fordprefectus.

  4. Within each <PropertyMap> node, we also have each <ElementType>, such as empno, being mapped across to a new <Column> name value, such as timelord. This pattern repeats throughout the mapping file.

Sink input

Meanwhile on Betelgeuse, the XML from Earth has arrived and our friends have the requisite mapping file. All that needs to be done, in Example D-16, is to run a universal Perloid script, operating on the galactic standard Traalix operating system and load it up into the database. Let's go.

Inputting into the sink — inXMLDBMS.pl
#!perl -w
  
use strict;
use DBI;
use XML::XMLtoDBMS;
  
# Step 1: Connect to our remote Oracle database on Betelgeuse :-)
# Use the connection acquired to create our XML::XMLtoDBMS object.
  
my $dbh = DBI->connect('dbi:Oracle:BETELGEUSE.WORLD', 
                       'zaphod', 'b33b13br0x') || die $DBI::errstr;
  
my $xmlToDbms = new XML::XMLtoDBMS($dbh);
  
$xmlToDbms->setMap('timelord.map');
  
# Step 2:  Acquire the XML file, and then store it in the datasink.
  
my $xmlIn = 
   $xmlToDbms->storeDocument( Source => {File => "emp.xml"} );
  
# Step 3: Disconnect and clean up memory.
  
$xmlToDbms->destroy;
$dbh->disconnect;

Here's what's happening:

  1. We open the database connection and use it to prime XML::XMLtoDBMS before setting the configuration via timelord.map.

  2. Using storeDocument( ), we pump the emp.xml file into the data sink.

  3. Finally, we wrap up the memory, disconnect, and exit.

Results

The table data folds neatly into FORDPREFECTUS. Mission accomplished:

SQL> select * from FordPrefectus ;
  
TIMELORD ROLE    MISSION   MASTER ORIGIN    ALTAIRIAN$ CREDITS QUADRANT
-------- ------- --------- ------ --------- ---------- ------- --------
    1001 Groucho Professor      1 01-JAN-01        100      10       10
    1002 Chico   Minister       2 02-JAN-01        200      20       20
    1003 Harpo   Stowaway       3 03-JAN-01        300      30       30
    1004 Zeppo   President      1 04-JAN-01        400      40       20
    1005 Gummo   Tenor          1 05-JAN-01        500      50       10

What's Coming in XML::DBMS?

Features coming in Version 2.0 of XML-DBMS include the following:

  • Updates and deletes, as well as further selection filters

  • Heterogeneous joins

  • Additional mapping language features, such as per-column formatting and limited transformation

  • The generation of map files from the database

  • Support for database-generated keys

    Team LiB   Previous Section   Next Section