Appendix D. The Essential Guide to Perl Data Munging
Oracle DBAs
spend a great deal of time handling data that for one reason or
another needs to be cleaned, transformed, and/or formatted. They need
to fill Oracle data warehouses with customer data from multiple
sources, import data into Oracle databases from non-Oracle data
streams, and convert and format source material of all kinds. Whether
it's an XML stream from a web page, a SQL*Loader
feed from a telecom switch, or a snapshot transfer from another
database, DBAs must ensure that these data transfers are clean,
accurate, and timely. Unfortunately, the raw data
they're given to work with is often dirty,
inaccurate, behind schedule, and unfit for SQL*Loader. This is a job
for Perl and its wonderful world of data munging!
Data munging, the process of transforming data
as it is transferred from one place to another, is a topic that is
increasingly important for Oracle DBAs to understand. It is also an
operation that Perl is particularly good at. Perl
DBI's innate ability to deal with multiple database
types simultaneously also makes the transfer of data from one
database to another as simple as lining up dominoes!
This appendix presents the basics of data munging and illustrates a
typical data-munging operation — importing a MySQL data stream
into an Oracle database, transforming it as necessary.
We'll also describe the many Perl data-munging
modules that you can download from CPAN and use in conjunction with
Oracle databases. We'll examine these modules in
several major
categories:
- Numeric modules
-
The modules in this category deal with numeric data and handle
mathematical operations used in data munging. We'll
pay special attention to the very useful
Number::Format module.
- Date modules
-
The modules in this category deal with the special requirements of
dates and their formatting and conversion. Because speed is often
important in data munging, we'll focus on the very
efficient C-based Date::Calc module.
- Conversion modules
-
The modules in this category perform conversions of data from one
text format to another. We'll take a special look at
Convert::Recode, a popular Perl data conversion
module that can convert between many different kinds of character
sets — for example, between ASCII and EBCDIC mainframe formats.
- Perl XML modules
-
The modules in this category use XML in performing data munging.
We'll focus on the
XML::XMLtoDBMS module, part of the XML-DBMS
middleware project, which is especially effective at transferring
variable data between XML documents and relational databases.
|