Data-Munging Example: An Inter-Database TransferPeople often think of SQL*Loader as being the answer to all their data loading needs. But the reality is that running SQL*Loader might be the last step in a data load, not the only step. You might need to perform a number of additional steps to get data in a state fit for use by SQL*Loader. And sometimes you might not need SQL*Loader at all; often, Perl DBI works equally well as the last process stage for finally inserting data into a database. This is particularly true if you've used Perl exclusively to get to that last stage. Why add another process to manage, even one as good as SQL*Loader? Let's keep that data load as simple as possible. We'll discuss the respective roles of Perl DBI and SQL*Loader later on in more detail. For now though, we'll introduce data munging conceptually by providing a very basic source-to-sink example in a single munge operation. This example emphasizes Perl DBI's ability to munge data across from one database type to another within a single Perl script. In this example, MySQL is the source and Oracle is the sink. We're going to munge the data from one datatype (MySQL) into another (Oracle), plus do a little date format munging on the side. The MySQL SourceWe'll assume in this example that the data you are loading into an Oracle database comes from a MySQL database. You can find out more about MySQL at the following sites: You might also like to check out Jochen Wiedmann's DBD::mysql driver; this driver is the interface that allows Perl programs to connect to MySQL databases via Perl DBI: Assuming that a MySQL test database has already been created, let's go ahead and create the source data and prepare to transfer it to our Oracle database, orcl. These are the steps we followed:
The Oracle SinkWe'd like to transfer these three rows across to the EMP table under Oracle's orcl database. We'll do this via the munge script in Example D-1. Inter-database transfers into Oracle — mySQLtoOracle.pl#!perl -w use strict; use DBI; # Step 1: Establish a MySQL source database handle, and # an Oracle sink database handle. Notice we can connect to two # different databases, and database types, at the same time, # in one Perl script. Code Simplicities 'R' Us! :-) my $mysql_dbh = DBI->connect('DBI:mysql:database=test;host=localhost', 'irish', 'lion') or die "Couldn't connect to MySQL database: " . DBI->errstr; my $oracle_dbh = DBI->connect('DBI:Oracle:orcl', 'scott', 'tiger', { RaiseError=>1, AutoCommit=>0 } ); # Step 2: Prepare and execute the selection statement taking # data from our MySQL source. Bind the columns, for efficiency. my $select_sql = qq { SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno FROM emp_store }; my $mysql_sth = $mysql_dbh->prepare( $select_sql ) or die "Couldn't prepare selection statement: " . $mysql_dbh->errstr; $mysql_sth->execute; # Create the munge bind variables my ($empno, $ename, $job, $mgr, $hiredate, $sal, $comm, $deptno ); $mysql_sth->bind_columns( \$empno, \$ename, \$job, \$mgr, \$hiredate, \$sal, \$comm, \$deptno ); # Step 3: Prepare our Oracle insert statement. my $insert_sql = qq{ INSERT INTO emp ( empno, ename, job, mgr, hiredate, sal, comm, deptno ) VALUES ( ?, ?, ?, ?, to_date( ? , 'YYYY-MM-DD'), ?, ?, ?) }; my $oracle_sth = $oracle_dbh->prepare( $insert_sql ); # Step 4: Select from MySQL and fill bound array, before populating # Oracle EMP table. while ($mysql_sth->fetch) { $oracle_sth->bind_param(1, $empno); $oracle_sth->bind_param(2, $ename); $oracle_sth->bind_param(3, $job); $oracle_sth->bind_param(4, $mgr); $oracle_sth->bind_param(5, $hiredate); $oracle_sth->bind_param(6, $sal); $oracle_sth->bind_param(7, $comm); $oracle_sth->bind_param(8, $deptno); # Insert! $oracle_sth->execute; } # Step 5: Clean up, commit the transaction, and finish. $oracle_dbh->commit( ); $mysql_dbh->disconnect( ); $oracle_dbh->disconnect( ); Let's see what's going on in this code:
Running the script itself is straightforward: $ perl mySQLtoOracle.pl We can then check the orcl database. Notice that our earlier to_date( ) operation has given us the dates in the more usual Oracle-style DD-MON-YY format: $ sqlplus scott/tiger@orcl ORCL> select * from emp where empno < 2000; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO --------- ---------- --------- --- --------- ------ ------ --------- 1001 Groucho Professor 1 01-JAN-01 100 10 10 1002 Chico Minister 2 02-JAN-01 200 20 20 1003 Harpo Stowaway 3 03-JAN-01 300 30 30 3 rows selected. ORCL> That concludes our simple example -- but wait a minute! There seems to be precious little in the way of actual data transformation going on except for the date munge. The main transformation here was from MySQL data to Oracle data, and the fact is that this transformation is extraordinarily simple to do in Perl. Nevertheless, the reality is that very few other languages could have managed this transformation so trivially, in so few lines of code. We can add onto this simple example by layering on additional data-munging operations, depending on specific processing requirements. For instance, we could pull information from other databases to get hold of department descriptions, drag in other personnel information from remote HR databases, aggregate the salaries, substitute some of the data to match agreed-upon business rules, and so on. And all of this is easily done in Perl. For many more data-munging examples, refer to the more detailed sources mentioned at the beginning of this chapter. The use of Perl for data munging gives us something else in addition to the excellent resources of Perl DBI. We also get the ability to use the 200-plus built-in operators, such as split, join, and reverse, binary-capable functions such as read, regular expressions (covered in Appendix C), and the 2000-plus object-oriented Perl modules available from www.cpan.org (or www.activestate.com). We can make use of all of these resources, in conjunction with Perl DBI, to carry out a wide range of the most difficult data-munging operations. In the rest of this appendix, we'll summarize what we consider to be the best of these 2000-plus data-munging modules. Whether you're regularly filling data warehouses with difficult-to-extrapolate aggregated data, managing the ever-increasing complexity of XML information transfer, or just moving small pieces of fiddly DBA data from one place to another, Perl is a comprehensive one-stop shop full of data-munging functionality. |