Team LiB   Previous Section   Next Section

Appendix D. The Essential Guide to Perl Data Munging

Oracle DBAs spend a great deal of time handling data that for one reason or another needs to be cleaned, transformed, and/or formatted. They need to fill Oracle data warehouses with customer data from multiple sources, import data into Oracle databases from non-Oracle data streams, and convert and format source material of all kinds. Whether it's an XML stream from a web page, a SQL*Loader feed from a telecom switch, or a snapshot transfer from another database, DBAs must ensure that these data transfers are clean, accurate, and timely. Unfortunately, the raw data they're given to work with is often dirty, inaccurate, behind schedule, and unfit for SQL*Loader. This is a job for Perl and its wonderful world of data munging!

Data munging, the process of transforming data as it is transferred from one place to another, is a topic that is increasingly important for Oracle DBAs to understand. It is also an operation that Perl is particularly good at. Perl DBI's innate ability to deal with multiple database types simultaneously also makes the transfer of data from one database to another as simple as lining up dominoes!

This appendix presents the basics of data munging and illustrates a typical data-munging operation — importing a MySQL data stream into an Oracle database, transforming it as necessary. We'll also describe the many Perl data-munging modules that you can download from CPAN and use in conjunction with Oracle databases. We'll examine these modules in several major categories:

Numeric modules

The modules in this category deal with numeric data and handle mathematical operations used in data munging. We'll pay special attention to the very useful Number::Format module.

Date modules

The modules in this category deal with the special requirements of dates and their formatting and conversion. Because speed is often important in data munging, we'll focus on the very efficient C-based Date::Calc module.

Conversion modules

The modules in this category perform conversions of data from one text format to another. We'll take a special look at Convert::Recode, a popular Perl data conversion module that can convert between many different kinds of character sets — for example, between ASCII and EBCDIC mainframe formats.

Perl XML modules

The modules in this category use XML in performing data munging. We'll focus on the XML::XMLtoDBMS module, part of the XML-DBMS middleware project, which is especially effective at transferring variable data between XML documents and relational databases.

    Team LiB   Previous Section   Next Section