13.2 Modifying a Module in the Toolkit

13.2 Modifying a Module in the Toolkit

In Chapter 9 we introduced the supporting modules included in the Perl DBA Toolkit and described briefly what they do. Just as you might wish to modify or extend the toolkit's scripts, as we described earlier in this chapter, you might also find a good reason to modify the modules. In an effort to anticipate the kinds of changes you might want to make to these modules, in the following sections we'll provide a quick guide on how to modify the modules in the toolkit. We'll show two examples here and hope you can extrapolate to many more:

Adding a method: We'll add a usage method to the PDBA module. This will allow us to define a scalar variable containing help screen information, which can then be passed into the usage method. This will save us from having to code individual usage subroutines in each separate script.
Dealing with NULL columns returned by Oracle: We'll deal with NULLs returned by Oracle when printing output. This is a useful thing to do because NULL values raise undefined value errors when included in certain Perl statements.

13.2.1 Modifying the PDBA Module to Add a Method

In this section we'll essentially modify the PDBA module in order to add a method that will help us in doing our work. In reality, though, rather than modifying the existing code, we're going to create our own parallel, modified module. We'll explain why we've taken this approach as we work through the example.

We've created a separate downloadable module, called PDBAx, for "PDBA Extensions," that contains the code we describe in this section. If you want to do so, you can download and install this code in the same manner as you would the ordinary PDBA module. There's no absolute need to download and install PDBAx, but you may wish to do so to help follow the rest of this chapter or simply for your own experimentation. You can download PDBAx-1.00.tar.gz, or its latest derivative, from our book's page on the O'Reilly site:

http://www.oreilly.com/catalog/oracleperl/pdbatoolkit

Installation is straightforward, as we describe in the following sections.

13.2.1.1 Installing PDBAx on Unix

Run the following to install PDBAx on Unix systems:

$ gunzip -c PDBAx-1.00.tar.gz | tar xvf -
$ cd PDBAx-1.00
$ perl Makefile.PL
$ make install

There are no tests to run for this module.

13.2.1.2 Installing PDBAx on Win32

Download the PDBAx.ppd PPM file from the web site and save it in a location such as C:\TEMP. You probably know the rest of the drill:

DOS> ppm
PPM> install --location c:\temp PDBAx

13.2.2 Adding a Usage Method

Most of the scripts included in our toolkit employ a usage subroutine that is called for various reasons. Perhaps the -help option was included on the command line, or perhaps required options were missing. The usage routine generally looks something like this:

sub usage {
   my $exitVal = shift;
   use File::Basename;
   my $basename = basename($0);
   print qq/
$basename
  
usage: $basename 
  -machine  database_server 
  -database database_instance 
  -username account 
  
/;
   exit $exitVal;
};

A common PDBA.pm method would eliminate the need for this subroutine in other scripts. Writing such a method is one approach to solving your problem. Some suggested code for such a method is shown in Example 13-4.

Example 13-4. The PDBA usage method

sub usage {
   my ($exitVal,$helpStrRef) = @_;
   use File::Basename;
   my $basename = basename($0);
   print qq{
usage: $basename
  
${$helpStrRef}
  
};
   exit $exitVal;
};

Here's how you might use it in a script:

use PDBA;
  
my $help = q{ 
  -database database to connect to
  -username database account
  -password password for the account
};
...
if ( $optctl{help} ) { PDBA::usage(1,\$help) }
...

So now you place your new usage method in the PDBA.pm file and try it out in a few scripts. And it works great. However, there's one small problem. What happens if you install a newer version of the PDBA Toolkit module library? That's right — your carefully crafted usage method will no longer be in the PDBA module, and all of your scripts calling PDBA::usage will break. Ouch!

Rather than modifying the PDBA module, why not create your own subclassed module? Doing so will allow you to extend the PDBA module without fear of breaking scripts that use it in its current incarnation. Creating your own module also eliminates the problems that would occur if you download a new version of the PDBA module and it overwrites your carefully crafted extensions.

Perl lets you do this with relative ease, and we'll show you how.^[3] Let's call our new subclassed module PDBAx. This module will take the place of PDBA in your scripts. The full code for PDBAx appears in Example 13-5.

^[3] We're using the simplest features of Perl's object orientation, as described in Appendix A. For a more definitive description, refer to Object Oriented Perl, by Damian Conway (Manning 2000).

Example 13-5. The entire PDBAx module

package PDBAx;
  
our $VERSION=1.00;
  
use PDBA;
our @ISA = qw{PDBA};
sub usage {
   my ($exitVal,$helpStrRef) = @_;
   use File::Basename;
   my $basename = basename($0);
   print qq{
usage: $basename
  
${$helpStrRef}
  
};
   exit $exitVal;
};
1;

You may be surprised at how little code there is in Example 13-5. Yet all the features of the PDBA module are available through PDBAx. That's because of the magic of the @ISA array. The methods and attributes of modules placed in @ISA are inherited by calling modules — in this case, PDBAx. Try running the code shown in Example 13-6. The PDBAx osname method is inherited directly from the PDBA module.

Example 13-6. Testing PDBAx

#!/usr/bin/perl -w
  
use warnings;
use strict;
  
use PDBAx;
print "$PDBA::VERSION\n";
  
print PDBAx->osname, "\n";
  
my $help = q{
  -database   database to connect to
  -username   user to connect as
  -password   password for user
};
PDBAx::usage(1,\$help);

The benefit of extending PDBA in this way is that when your intrepid authors release the latest version of the PDBA Toolkit, your usage method will be safely encapsulated within its own PDBAx module. Even if we come up with similarly named methods, yours will override them. It's a kind of magic.

13.2.3 Modifying the PDBA::GQ Module to Deal with NULL Columns

In the following sections we'll describe how you can modify the PDBA::GQ module to deal with Oracle NULL values. First, though, let's take a look at the problems involved in using NULLs.

13.2.3.1 Oracle and NULL values

When you first start using the Oracle database, NULL values may take a little getting used to. A NULL is never equivalent to any other value, including another NULL. The truth table in Table 13-1 sums up the results of comparing NULL to NULL, with various SQL operators. Note that only one True is returned with the special IS NULL comparison.

Table 13-1. Null truth table

Option

Description

NULL = NULL

False

NULL <> NULL

False

NULL < NULL

False

NULL > NULL

False

NULL IS NOT NULL

False

NULL BETWEEN NULL AND NULL

False

NULL = ''

False

NULL IS NULL

True

One problem you discover when dealing with NULLs in an Oracle database is that Oracle treats empty strings and NULL values the same way. This is different from other databases, and is readily apparent when you use NULLs with Perl. We'll see this in the next section.

13.2.3.2 Testing the use of NULLs

The null_test.pl script in Example 13-7 builds a table, NULL_TEST, and populates it with two rows of data.

Example 13-7. null_test.pl

01: #!/usr/bin/perl
02: 
03: use warnings;
04: use strict;
05: use PDBA::CM;
06: use PDBA::GQ;
07: use PDBA::DBA;
08: 
09: my ( $database, $username, $password ) = qw{ts01 scott tiger};
10: 
11: my $dbh = new PDBA::CM(
12:   DATABASE => $database,
13:   USERNAME => $username,
14:   PASSWORD => $password
15: );
16: 
17: eval {
18:   local $dbh->{PrintError} = 0;
19:   $dbh->do(q{drop table null_test});
20: };
21: 
22: $dbh->do(q{create table null_test 
23:   ( 
24:     first_name varchar2(20) not null,
25:     middle_initial varchar2(1) null,
26:     last_name varchar2(20) not null
27:   )
28: });
29: 
30: my $insHandle = $dbh->prepare(q{insert into null_test values(?,?,?)});
31: 
32: $insHandle->execute('Alfred','E','Neuman');
33: $insHandle->execute('Peter',undef,'Parker');
34: $insHandle->execute('Clark','','Kent');
35: $dbh->commit;
36: 
37: my $gq = new PDBA::GQ ( $dbh, 'null_test');
38: 
39: while ( my $row = $gq->next ) {
40:   printf("Last: %-20s First: %-20s MI: %1s\n", 
41:     $row->{LAST_NAME},
42:     $row->{FIRST_NAME},
43:     $row->{MIDDLE_INITIAL}
44:   );
45: }
46: $dbh->disconnect;

Note the following about this example:

Lines 22 to 28: The CREATE TABLE statement defines two columns, FIRST_NAME and LAST_NAME, which are both required, while the MIDDLE_INITIAL column is nullable.
Line 32: The first row insert places values in all three columns.
Line 33: Only the FIRST_NAME and LAST_NAME columns are populated. The value for the MIDDLE_INITIAL column is defined as undef. This causes the MIDDLE_INITIAL value to be NULL.
Line 35: Inserts another row, but this time the MIDDLE_INITIAL column is populated with an empty string.

Using the following SQL, we can now prove that Oracle treats both the empty string and undef as NULL values:

SQL> select * from null_test where middle_initial is null;
  
FIRST_NAME           M LAST_NAME
-------------------- - --------------------
Peter                  Parker
Clark                  Kent
  
2 rows selected.

This problem becomes apparent in Perl when a NULL column is retrieved from a database and an attempt is made to reference the value in a statement. You can see this in lines 40 to 44 of Example 13-7, which we reproduce here:

40:   printf("Last: %-20s First: %-20s MI: %1s\n", 
41:     $row->{LAST_NAME},
42:     $row->{FIRST_NAME},
43:     $row->{MIDDLE_INITIAL}
44:   );

Line 40 to 44

Line 40 is the first line of a multi-line printf statement. Line 41 references the LAST_NAME, line 42 the FIRST_NAME, and line 43 the MIDDLE_INITIAL. Printing the first row presents no problem, because all three columns are populated. That changes with the second row when the script is executed, as seen here:

Last: Neuman               First: Alfred               MI: E
Use of uninitialized value in printf at ./null_test.pl line 40 (#1)
 (W uninitialized) An undefined value was used as if it were already
 defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
 To suppress this warning assign a defined value to your variables.
...    
Last: Parker               First: Peter                MI:

Whoa! The second row returned a NULL for MIDDLE_INITIAL, which is treated by Perl as an undef. This threw out a warning because of the use warnings pragma at the top of the script.

13.2.3.3 Considering changes to the script

The warning output we saw in the previous section could be eliminated by turning off the warning mechanism temporarily as follows:

38: no warnings;
39: while ( my $row = $gq->next ) {
40:   printf("Last: %-20s First: %-20s MI: %1s\n", 
41:     $row->{LAST_NAME},
42:     $row->{FIRST_NAME},
43:     $row->{MIDDLE_INITIAL}
44:   );
45: }
46: use warnings

However, this sweeps potentially difficult problems under the rug, which will almost always re-surface to bite us later. A better solution would ensure that all of the row elements get a guaranteed value before they get pumped into printf:

39: while ( my $row = $gq->next ) {
  
     foreach my $key ( keys %$row )
        { $row->{$key} = '' unless defined $row->{$key} }
40:   printf("Last: %-20s First: %-20s MI: %1s\n", 
41:     $row->{LAST_NAME},
42:     $row->{FIRST_NAME},
43:     $row->{MIDDLE_INITIAL}
44:   );
45: }

The foreach loop added between lines 39 and 40 assigns empty strings to any undefined values in the $row hash reference, thereby preventing warnings.

13.2.3.4 Modifying the PDBA::GQ module

An even better solution would extend the PDBA::GQ (Generic Query) module so your scripts would automatically deal with NULL column data. However, as we learned earlier, modifying a module presents its own problems. What we need is a modified version of the PDBA::GQ module. More specifically, we should modify the PDBA::GQ->next method. Example 13-8 shows one way to do this.

Example 13-8. PDBAx::GQ

01: package PDBAx::GQ;
02: 
03: our $VERSION=1.0;
04: 
05: use Carp;
06: use warnings;
07: use strict;
08: 
09: use PDBA::GQ;
10: our @ISA = qw{PDBA::GQ};
11: 
12: sub next {
13:    my $self = shift;
14:    my ( $ref ) = @_;
15:    $ref ||= [];
16: 
17:    my $refType = ref $ref;
18: 
19:    my $data;
20: 
21:    $data = $self->SUPER::next($ref);
22:    return unless $data;
23: 
24:    # transform NULL columns to a defined value
25:    # to avoid problems with undefined values
26:    if ( 'ARRAY' eq $refType ) {
27:       foreach(@$data){ $_ = '' unless defined }
28:    } elsif ( 'HASH' eq $refType ) {
29:       foreach my $key ( keys %$data ) 
30:       { $data->{$key} = '' unless defined $data->{$key} }
31:    } else { 
32:       croak "invalid ref type of $refType " .
33:             "used to call PDBAx::GQ->next\n" 
34:    }
35:    return $data;
36: }
37: 1;

Lines 1 to 10: Set up the new module as a subclass of the original PDBA::GQ module.
Line 13: Picks up the object's reference in the $self variable.
Lines 14 to 17: Here we set up our subclassed next method to use an array reference as its default datatype. Line 14 assigns the next argument (if it exists) to the $ref variable. Line 15 assigns an empty array to $ref if the assignment failed in line 14. (We've also changed this default to an ordinary array reference in this extension, rather than a hash array reference, as with the older PDBA::GQ.) Line 17 sets the $refType scalar to the type of reference in use. This will be used later when assigning a value to NULL columns. (See Appendix A for more information about references and the related ref operator.)
Line 21: Calls the next method in the parent class via the SUPER pseudoclass. This accesses methods in parent classes and lets you modify the behavior of the base class without rewriting all of its code.
Line 22: Returns if no data was found.
Lines 26 to 31: Here the contents of the returned data are checked for NULL values. We first need to determine if the data is in a hash reference or an array reference. This is done via the $refType variable created earlier. If the data is in an array reference, each element is checked to see if it is defined. If it is undefined, an empty string is assigned. This is done in line 27. The same is done for data returned as a hash reference. Lines 29 and 30 assign empty strings to keys with undefined values. Because an empty string is a valid defined value in Perl, we have eliminated our warnings.

We need to change just two of the lines in the null_test.pl script we first encountered in Example 13-8:

...
06: use PDBAx::GQ;
...
37: my $gq = new PDBAx::GQ ( $dbh, 'null_test');
...

All of the functionality of the PDBA::GQ module is still available, but your modifications allow you to stop thinking about referencing NULL values.

13.2.3.5 Taking one more step

The PDBAx::GQ extension still has a shortcoming. It assigns an empty string to numeric columns that are NULL, and this may be unsuitable for some purposes. Financial reporting may require that these NULL columns be assigned a numeric zero. While this can be done, the complexity of the code required to do it increases significantly, as we'll see in the code supplied via the downloaded PDBAx::GQ module in Example 13-9.

Example 13-9. Assigning zero to NULL numeric columns in PDBAx::GQ

01: package PDBAx::GQ;
02: 
03: our $VERSION=1.00;
04: 
05: use Carp;
06: use warnings;
07: use strict;
08: 
09: use PDBA::GQ;
10: our @ISA = qw{PDBA::GQ};
11: 
12: my @columnTypes;
13: my %columnTypes;
14: 
15: sub new {
16:   my $self = shift;
17:   my ( $dbh ) = $_[0];
18:   my $qobj = $self->SUPER::new(@_);
19: 
20:   # get column types for array refs
21:   @columnTypes = 
         map {scalar $dbh->type_info($_)->{TYPE_NAME}} @{$qobj->{TYPE}};
22: 
23:   # get column types for hash refs
24: 
25:   # get an array of data type numbers
26:   my @types = @{$qobj->{TYPE}};
27: 
28:   # get a hash ref of column names and position
29:   my $nameHash = $qobj->{NAME_uc_hash};
30: 
31:   # create a reverse hash with the column number as the key and
32:   # the column name as the value
33:   my %colnumHash = map { $nameHash->{$_} => $_  } keys %$nameHash;
34: 
35:   # create an array of the type names ( VARCHAR2, DATE, etc ) from the
36:   # type info method
37:   my @columnTypeNames = 
         map { scalar $dbh->type_info($_)->{TYPE_NAME} } @types;
38: 
39:   # create a hash with column name as the key and 
        data type as the value
40:   %columnTypes = 
         map { $colnumHash{$_} => 
               $columnTypeNames[$_] } 0..$#columnTypeNames;
41: 
42:   $qobj->{private_PDBA_DATA_TYPES_ARRAY} = \@columnTypes;
43:   $qobj->{private_PDBA_DATA_TYPES_HASH} = \%columnTypes;
44:   return $qobj;
45: }
46: 
47: sub next {
48:   my $self = shift;
49:   my ( $ref ) = @_;
50:   $ref ||= [];
51: 
52:   my $refType = ref $ref;
53: 
54:   my $data;
55: 
56:   $data = $self->SUPER::next($ref);
57:   return unless $data;
58: 
59:   # transform NULL columns to a defined value
60:   # to avoid problems with undefined values
61:   if ( 'ARRAY' eq $refType ) {
62:     foreach my $el ( 0..$#{$data} ) {
63:       unless ( defined $data->[$el] ) {
64:         if ( $self->{private_PDBA_DATA_TYPES_ARRAY}[$el] =~ /CHAR/ ) {
65:           $data->[$el] = '';
66:         } elsif ( $self->{private_PDBA_DATA_TYPES_ARRAY}[$el] =~ 
                                 /DOUBLE|NUMBER/ ) {
67:           $data->[$el] = 0;
68:         } else { $data->[$el] = '' }
69:       }
70:     }
71:   } elsif ( 'HASH' eq $refType ) {
72:     foreach my $key ( keys %$data ) {
73:       unless (defined $data->{$key} ) {
74:         if ( $self->{private_PDBA_DATA_TYPES_HASH}{$key} =~ /CHAR/ ) {
75:           $data->{$key} = '';
76:         } elsif ( $self->{private_PDBA_DATA_TYPES_HASH}{$key} =~ 
                                 /DOUBLE|NUMBER/ ) {
77:           $data->{$key} = 0;
78:         } else { $data->{$key} = '' }
79:       }
80:     }
81:   } else { 
         croak 
         "invalid ref type of $refType used to call PDBAx::GQ->next\n"}
82: 
83:   return $data;
84: }
85: 
86: 1;

In this example we subclass the new method of PDBA::GQ. We do this so we can determine the datatypes for each column selected in a query.

Line 21: Uses DBI's type_info method to retrieve query column datatypes.
Lines 26 to 40: Store the datatypes for each column in both an array and a hash, so we're prepared for whatever the next method throws at us.
Lines 42 to 43: Take advantage of a seldom-used Perl DBI feature, the private_ attributes that may be assigned to a database handle. The DBI documentation states that we're allowed to assign new attributes to a statement handle as long as they begin with the private_ prefix. These private attributes are used in the PDBAx::GQ->next method to determine if the value returned for a column is undefined. If so, it determines the datatype of each of those columns. Once known, a zero is assigned to returned columns with a numeric type, and an empty string to all other undefined columns. This is admittedly more complex than the situation we had before, because of our new requirement to assign zeroes to unassigned numeric columns. However, the added effort is worth it for the convenience of remaining unconcerned about the side effects of NULL columns.

The script in Example 13-10 uses the all-new PDBAx::GQ module. Both numeric and character columns are inserted into a test table with NULL values, and then later printed out without any need to check to see if they're undefined. The script is stored in the PDBAx distribution as pdba_ext2.pl.

Example 13-10. Using the PDBAx:GQ module with numeric and character values

#!/usr/bin/perl
  
use warnings;
use strict;
use PDBA::CM;
use PDBAx::GQ;
use PDBA::OPT;
use Getopt::Long;
use PDBAx;
  
my %optctl=(  );
  
my $help=q{
  -machine   database server
  -database  database SID
  -username  account name
  -password  password for account
};
  
# passthrough allows additional command line options
# to be passed to PDBA::OPT if needed
Getopt::Long::Configure(qw{pass_through});
  
GetOptions( \%optctl, "help!",      "machine=s",
                      "database=s", "username=s",
                      "password=s", );
  
if ( $optctl{help} ) { PDBAx::usage(1,\$help) }
  
# lookup the password if not on the command line
my $password = '';
if ( defined( $optctl{password} ) ) {
   $password = $optctl{password};
} else {
  
   if (
      ! defined($optctl{machine})
      || ! defined($optctl{database})
      || ! defined($optctl{username})
   ) { PDBAx::usage(1,\$help) }
  
   $password = PDBA::OPT->pwcOptions (
      INSTANCE => $optctl{database},
      MACHINE => $optctl{machine},
      USERNAME => $optctl{username}
   );
}
my $dbh = new PDBA::CM( DATABASE => $optctl{database},
                        USERNAME => $optctl{username},
                        PASSWORD => $password, );
# drop test table
eval { $dbh->do(q{drop table star_trek}); };
  
$dbh->do(q{create table star_trek( title varchar2(50)
                                   , year_released varchar2(4)
                                   , viewings number(4) )} );
  
my $insHandle = $dbh->prepare(q{ insert into star_trek values(?,?,?) });
  
$insHandle->execute('Star Trek - The Motion Picture','1979',1);
$insHandle->execute('Star Trek II - The Wrath of Khan','1982',4);
$insHandle->execute('Star Trek III - The Search for Spock','1984',undef);
$insHandle->execute('Star Trek IV - The Voyage Home','1986',8);
$insHandle->execute('Star Trek V - The Final Frontier','1989',1);
$insHandle->execute('Star Trek VI - The Undiscovered Country','1991',3);
$insHandle->execute('Star Trek Generations','1994',1);
$insHandle->execute('Star Trek - First Contact','1996',4);
$insHandle->execute('Star Trek - Insurrection','1998',2);
$insHandle->execute('Star Trek: Nemesis',undef,undef);
  
$dbh->commit;
  
my $gq = new PDBAx::GQ($dbh, 'star_trek', {ORDER_BY=>'year_released'});
  
my $colHash = $gq->getColumns;
  
while ( my $row = $gq->next ) {
   print "TITLE: $row->[$colHash->{TITLE}]\n";
   print "\tYEAR: $row->[$colHash->{YEAR_RELEASED}]\n";
   print "\tVIEWINGS: $row->[$colHash->{VIEWINGS}]\n";
}
$dbh->disconnect;

Previously, printing values returned from NULL columns would have required checking the return values within each script; now, we can safely ignore them:

$ pdba_ext2.pl -machine sherlock -database ts01 -username jkstill
  
TITLE: Star Trek - The Motion Picture
YEAR: 1979
VIEWINGS: 1
TITLE: Star Trek II - The Wrath of Khan
YEAR: 1982
VIEWINGS: 4
TITLE: Star Trek III - The Search for Spock
YEAR: 1984
VIEWINGS: 0
TITLE: Star Trek IV - The Voyage Home
YEAR: 1986
VIEWINGS: 8
TITLE: Star Trek V - The Final Frontier
YEAR: 1989
VIEWINGS: 1
TITLE: Star Trek VI - The Undiscovered Country
YEAR: 1991
VIEWINGS: 3
TITLE: Star Trek Generations
YEAR: 1994
VIEWINGS: 1
TITLE: Star Trek - First Contact
YEAR: 1996
VIEWINGS: 4
TITLE: Star Trek - Insurrection
YEAR: 1998
VIEWINGS: 2
TITLE: Star Trek: Nemesis
YEAR: 
VIEWINGS: 0

Notice that one of the entries has a blank year of release, and two of them have zero viewings. These values are actually stored as NULL in the database, but now, because of our implementation of PDBAx::GQ, they may be referenced with impunity — without invoking the wrath of Perl and use warnings. Revenge is a dish best eaten without undefined values!