CGI Programming with Perl

10.2. DBM Files

DBM files provide many advantages over text files for database purposes, and because Perl provides such a simple, transparent interface to working with DBM files, they are a popular choice for programming tasks that don't require a full RDBMS. DBM files are simply on-disk hash tables. You can quickly look up values by key and efficiently update and delete values in place.
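Here is a minimal sketch of the "on-disk hash table" idea: tie, write, untie, re-tie, then look up, update, and delete in place. SDBM_File, which is built with every Perl, stands in for DB_File so the sketch runs without Berkeley DB. The file name and e-mail addresses are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                          # exports O_CREAT, O_RDWR, O_RDONLY
use SDBM_File;                      # core DBM module, standing in for DB_File
use File::Temp qw( tempdir );

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/user_email";       # hypothetical file; SDBM adds .pag/.dir

# Tie, write, and untie: the data now lives on disk, not only in memory.
my %email;
tie %email, 'SDBM_File', $file, O_CREAT | O_RDWR, 0644
    or die "Cannot tie $file: $!";
$email{mary} = 'mary@example.com';
$email{bob}  = 'bob@example.com';
untie %email;

# Re-tie the same file: the keys persist, and updates/deletes happen in place.
tie %email, 'SDBM_File', $file, O_RDWR, 0644
    or die "Cannot re-tie $file: $!";
print "mary => $email{mary}\n";     # lookup by key
$email{bob} = 'robert@example.com'; # update in place
delete $email{mary};                # delete in place
my @keys = sort keys %email;
print "remaining: @keys\n";
untie %email;
```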

To use a DBM file, you must tie a Perl hash to the file using one of the DBM modules. Example 10-3 shows some code that uses the DB_File module to tie a hash to the file user_email.db.

Example 10-3. email_lookup.cgi

#!/usr/bin/perl -wT

use strict;
use DB_File;
use Fcntl;
use CGI;

my $q        = new CGI;
my $username = $q->param( "user" );
$username    = "" unless defined $username;    # avoid undef warnings
my $dbm_file = "/usr/local/apache/data/user_email.db";
my %dbm_hash;
my $email;

tie %dbm_hash, "DB_File", $dbm_file, O_RDONLY or
    die "Unable to open dbm file $dbm_file: $!";

if ( exists $dbm_hash{$username} ) {
    $email = $q->a( { href => "mailto:$dbm_hash{$username}" },
                    $q->escapeHTML( $dbm_hash{$username} ) );
}
else {
    $email = "Username not found";
}

untie %dbm_hash;

# Escape the user-supplied name before echoing it back into the HTML page
print $q->header( "text/html" ),
      $q->start_html( "Email Lookup Results" ),
      $q->h2( "Email Lookup Results" ),
      $q->hr,
      $q->p( "Here is the email address for the username you requested: " ),
      $q->p( "Username: " . $q->escapeHTML( $username ), $q->br,
             "Email: $email" ),
      $q->end_html;

There are many different DBM file formats, and correspondingly many DBM modules. Berkeley DB and GDBM are the most powerful. For web development, however, Berkeley DB and its corresponding DB_File module are the most popular choice: unlike GDBM, DB_File provides a simple way to lock the database so that concurrent writes do not truncate and corrupt your file.

10.2.1. DB_File

DB_File supports Version 1.xx functionality for Berkeley DB; Berkeley DB Versions 2.xx and 3.xx add numerous enhancements. DB_File is compatible with these later versions, but it supports only the 1.xx API. Perl support for version 2.xx and later is provided by the BerkeleyDB module. DB_File is much simpler and easier to use, however, and continues to be the more popular option. If Berkeley DB is not installed on your system, you can get it from http://www.sleepycat.com/. The DB_File and BerkeleyDB modules are on CPAN. DB_File is also included in the standard Perl distribution (although it is installed only if Berkeley DB is present).

Using DB_File is quite simple, as we saw earlier: you tie a hash to the DBM file you want to use, and then you treat it like a regular hash. The tie function takes at least two arguments: the hash you want to tie and the name of the DBM module you are using. Typically, you also provide the name of the DBM file and access flags from Fcntl. If you are creating a new file, you can also specify its file permissions.

Often you need read/write access to DBM files. This complicates the code somewhat, because you must also lock the file:

use Fcntl qw( :DEFAULT :flock );    # O_* flags plus LOCK_EX
use DB_File;

my $dbm_file = "/usr/local/apache/data/user_email.db";    # as in Example 10-3
my %hash;
local *DBM;

my $db = tie %hash, "DB_File", $dbm_file, O_CREAT | O_RDWR, 0644 or
    die "Could not tie to $dbm_file: $!";
my $fd = $db->fd;                                            # Get file descriptor
open DBM, "+<&=$fd" or die "Could not dup DBM for lock: $!"; # Filehandle on that fd
flock DBM, LOCK_EX or die "Could not lock $dbm_file: $!";    # Lock exclusively
undef $db;                                                   # Avoid untie probs

# ...
# All your code goes here; treat %hash like a normal, basic hash
# ...

untie %hash;        # Clears buffers, then saves, closes, and unlocks file

We use the O_CREAT and O_RDWR flags imported by Fcntl to indicate that we want to open the DBM file for read/write access and to create the file if it does not exist. If a new file is created on a Unix system, it is given 0644 as its file permissions (although your umask may restrict this further). If tie succeeds, we store the resulting DB_File object in $db.
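To see what the flags and the permission argument do, this sketch ties a file that does not exist yet, first without O_CREAT and then with it. SDBM_File stands in for DB_File so the sketch runs on any Perl. The flags and mode arguments follow the same positions in both modules. The file name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                           # O_RDONLY, O_RDWR, O_CREAT
use SDBM_File;                       # stand-in for DB_File
use File::Temp qw( tempdir );

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/new_db";            # hypothetical file that does not exist yet
my %hash;

# Without O_CREAT, tie fails on a missing file and returns false; $! says why.
my $ok = tie %hash, 'SDBM_File', $file, O_RDONLY, 0644;
print( $ok ? "opened\n" : "read-only tie failed: $!\n" );

# With O_CREAT | O_RDWR the file is created; the mode is filtered by umask.
tie %hash, 'SDBM_File', $file, O_CREAT | O_RDWR, 0644
    or die "Cannot create $file: $!";
untie %hash;

my $perms = ( stat "$file.pag" )[2] & 07777;
printf "created with mode %04o (umask %04o)\n", $perms, umask;
```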

We need $db only to get the file descriptor of DB_File's underlying DBM file. With that descriptor we can open a read/write Perl filehandle on the same file, and that filehandle is what we lock with flock. We then undefine $db.

The reason we clear $db is not just to conserve RAM. Typically, when you are done working with a tied hash, you untie it, just as you would close a file. If you do not untie it explicitly, Perl does so automatically as soon as all references to the DB_File object go out of scope. The catch is that untie clears only the variable that it is untying. The DBM file isn't actually written and freed until DB_File's DESTROY method is called, which happens only when all references to the object have gone out of scope. In our code earlier, we have two references to this object, %hash and $db. Both of these references must be cleared before the DBM file is written and saved.
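Perl itself can point out the leftover reference. With warnings enabled, untie on a hash whose tied object is still referenced elsewhere emits "untie attempted while N inner references still exist". This sketch counts those warnings with and without undef $db. SDBM_File stands in for DB_File, and the helper tie_and_untie is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;                       # enables the "untie" warning category
use Fcntl;
use SDBM_File;                      # stand-in for DB_File; same tie/untie rules
use File::Temp qw( tempdir );

my $dir = tempdir( CLEANUP => 1 );
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical helper: tie, write one key, optionally drop $db, then untie.
sub tie_and_untie {
    my ( $file, $keep_object ) = @_;
    my %hash;
    my $db = tie %hash, 'SDBM_File', $file, O_CREAT | O_RDWR, 0644
        or die "Cannot tie $file: $!";
    $hash{key} = 'value';
    undef $db unless $keep_object;  # drop the extra reference first
    untie %hash;                    # warns if $db still holds the object
}

tie_and_untie( "$dir/kept",    1 );
my $with_db = @warnings;
tie_and_untie( "$dir/dropped", 0 );
my $without_db = @warnings - $with_db;

print "warnings while \$db was live: $with_db\n";
print "warnings after undef \$db:    $without_db\n";
```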

If this is confusing, then don't worry about the specifics. Just remember that whenever you get a DB_File object (such as $db above) in order to do file locking, undefine it as soon as you have locked the filehandle. Then untie will act like close and always be the command that frees your DBM file.
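Later releases of DB_File document a weakness in this fd-based technique, in the section "Locking: The Trouble with fd". Opening the database reads and caches part of the file before flock is called, so the cache may be stale once the lock is granted. That documentation points to modules such as DB_File::Lock instead. One common alternative is to lock a separate sentinel file before tying the DBM file at all. The sketch below shows that idea under a few assumptions: the helper with_locked_dbm and the .lock suffix are hypothetical, and SDBM_File stands in for DB_File so it runs on a stock Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :DEFAULT :flock );    # O_* constants plus LOCK_EX
use SDBM_File;                      # stand-in for DB_File
use File::Temp qw( tempdir );

# Hypothetical helper: take the lock *before* the DBM file is opened, so
# nothing is read or cached while another process might still be writing.
sub with_locked_dbm {
    my ( $dbm_file, $code ) = @_;
    open my $lock, '>>', "$dbm_file.lock"
        or die "Cannot open lock file: $!";
    flock $lock, LOCK_EX or die "Cannot lock: $!";

    my %hash;
    tie %hash, 'SDBM_File', $dbm_file, O_CREAT | O_RDWR, 0644
        or die "Cannot tie $dbm_file: $!";
    my @result = $code->( \%hash );    # caller works on the tied hash
    untie %hash;                       # flush and close the DBM first...
    close $lock;                       # ...then release the lock
    return @result;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/counter";
with_locked_dbm( $file, sub { $_[0]{hits}++ } ) for 1 .. 3;
my ($hits) = with_locked_dbm( $file, sub { $_[0]{hits} } );
print "hits: $hits\n";
```

The lock is released only after untie, so the next process never opens the DBM file while buffered writes are still pending.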

DB_File provides a very simple, efficient solution when you need to store name-value pairs. Unfortunately, if you need to store more complex data structures, you must still encode and decode them so that they can be stored as scalars. Fortunately, another module addresses this issue.
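The following sketch shows the hand encoding this paragraph refers to. Storing a reference directly saves only its string form. A list has to be flattened into one scalar and split apart again. SDBM_File stands in for DB_File, and the "|" delimiter is an assumption that holds only if it never appears in the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;                      # stand-in for DB_File
use File::Temp qw( tempdir );

my $dir = tempdir( CLEANUP => 1 );
tie my %db, 'SDBM_File', "$dir/staff", O_CREAT | O_RDWR, 0644
    or die "Cannot tie: $!";

my @phones = ( '650-555-1234', '800-555-4321' );

# Storing a reference saves only its string form, such as "ARRAY(0x...)".
$db{broken} = \@phones;
my $broken  = $db{broken};
print "stored: $broken\n";

# Hand encoding: join on a delimiter assumed never to occur in the data,
# then split it again on the way out.
$db{mary} = join '|', @phones;
my @decoded = split /\|/, $db{mary};
print "second phone: $decoded[1]\n";

untie %db;
```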

10.2.2. MLDBM

If you look at the bottom of the Perl manpage, you will see that the three great virtues of a programmer are laziness, impatience, and hubris. MLDBM is all about laziness, but in a virtuous way. With MLDBM, you don't have to worry about encoding and decoding your Perl data in order to fit the confines of your storage medium. You can just save and retrieve it as Perl.

MLDBM turns another DBM module, such as DB_File, into a multilevel DBM that is not restricted to simple key-value pairs. It uses a serializer to convert complex Perl structures into a string that can be stored and later deserialized back into Perl data. Thus, you can do things like this:
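The serializer step is roughly what this sketch does by hand with Storable's freeze and thaw on top of SDBM_File. It approximates the work MLDBM does on every store and fetch. It is not MLDBM's actual code, and SDBM_File stands in for DB_File.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;                       # stand-in for DB_File
use Storable qw( freeze thaw );      # the serializer
use File::Temp qw( tempdir );

my $dir = tempdir( CLEANUP => 1 );
tie my %db, 'SDBM_File', "$dir/staff", O_CREAT | O_RDWR, 0644
    or die "Cannot tie: $!";

my $record = {
    name     => 'Mary Smith',
    position => 'Vice President',
    phone    => [ '650-555-1234', '800-555-4321' ],
};

$db{mary} = freeze( $record );       # complex structure -> byte string
my $copy  = thaw( $db{mary} );       # byte string -> new Perl structure
print "$copy->{name}: $copy->{phone}[1]\n";

untie %db;
```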

# File locking omitted for brevity
tie %hash, "MLDBM", $dbm_file, O_CREAT | O_RDWR, 0644;
$hash{mary} = {
    name     => "Mary Smith",
    position => "Vice President",
    phone    => [ "650-555-1234", "800-555-4321" ],
    email    => '[email protected]',
};

Later, you can retrieve this information directly:

my $mary = $hash{mary};
my $position = $mary->{position};

Because MLDBM is so transparent, it lets you ignore the fact that the data is stored as name-value pairs when you read it:

my $work_phone = $hash{mary}{phone}[1];

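Be careful, however: this works only when you are reading, not when you are writing. Each time you access $hash{mary}, MLDBM deserializes a fresh copy of the record. A nested assignment therefore modifies only that temporary copy, which is never written back. You must still write the data as a complete key-value pair. This will silently fail: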

$hash{mary}{email} = '[email protected]';

You should do this instead:

my $mary = $hash{mary};                      # Get a copy of Mary's record
$mary->{email} = '[email protected]';   # Modify the copy
$hash{mary} = $mary;                         # Write the copy to the hash
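The reason the nested write is lost can be reproduced without MLDBM. The sketch uses a hypothetical tie class, FrozenHash, which is not part of MLDBM. It imitates MLDBM's behavior by freezing each value with Storable on STORE and thawing a new copy on FETCH. It is built on Tie::ExtraHash from the core Tie::Hash module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tie class (not part of MLDBM) that mimics its behavior:
# each value is frozen with Storable on STORE and thawed on FETCH.
package FrozenHash;
use Storable qw( freeze thaw );
use Tie::Hash;                        # provides Tie::ExtraHash
our @ISA = ( 'Tie::ExtraHash' );      # object is [ \%storage ]

sub STORE {
    my ( $self, $key, $value ) = @_;
    $self->[0]{$key} = freeze( [ $value ] );   # wrap so plain scalars work too
}

sub FETCH {
    my ( $self, $key ) = @_;
    return undef unless exists $self->[0]{$key};
    return thaw( $self->[0]{$key} )->[0];      # a brand-new copy every time
}

package main;

tie my %hash, 'FrozenHash';
$hash{mary} = { name => 'Mary Smith', email => 'old@example.com' };

# Nested write: FETCH hands back a temporary copy, the assignment changes
# only that copy, and STORE is never called.
$hash{mary}{email} = 'new@example.com';
my $lost = $hash{mary}{email};
print "after nested write: $lost\n";

# Read-modify-write: fetch a copy, change it, and store the whole record.
my $mary = $hash{mary};
$mary->{email} = 'new@example.com';
$hash{mary} = $mary;
print "after write-back:   $hash{mary}{email}\n";
```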

MLDBM keeps track of blessed objects, so it works exceptionally well for storing objects in Perl:

use Employee;

my $mary = new Employee( "Mary Smith" );
$mary->position( "Vice President" );
$mary->phone( "650-555-1234", "800-555-4321" );
$mary->email( '[email protected]' );
$hash{mary} = $mary;

and for retrieving them:

use Employee;

my $mary = $hash{mary};
print $mary->email;

When retrieving objects, be sure you use the corresponding module (in this case, a fictional module called Employee) before you try to access the data.
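This sketch shows why the class must be loaded. A Storable round trip, the same kind of serialization MLDBM performs, records the package name with the data, so the copy comes back blessed. Its methods still have to come from code that is loaded in the program. The inline Employee package is a hypothetical minimal stand-in for the fictional module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical minimal Employee class standing in for the fictional module.
package Employee;
sub new   { my ( $class, $name ) = @_; return bless { name => $name }, $class }
sub email { my $self = shift; $self->{email} = shift if @_; return $self->{email} }

package main;

my $mary = Employee->new( 'Mary Smith' );
$mary->email( 'mary@example.com' );

# The serialized form records the package name along with the data...
my $frozen = freeze( $mary );
my $copy   = thaw( $frozen );

# ...so the copy comes back blessed, but its methods work only because
# the Employee package is loaded in this program.
print ref( $copy ), ': ', $copy->email, "\n";
```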

MLDBM does have limitations. It cannot store and retrieve filehandles or code references (at least not across multiple CGI requests).
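You can see the limitation directly in Storable, one of MLDBM's serializers. By default, Storable refuses to serialize code references and globs (filehandles) and dies instead. The sketch below traps those errors with eval. The record contents are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw( freeze );

my %record = (
    name     => 'Mary Smith',
    callback => sub { return 'hello' },    # a code reference
);

# Storable dies on code references by default...
my $frozen   = eval { freeze( \%record ) };
my $code_err = $@;

# ...and on filehandles (globs) as well.
my $fh_frozen = eval { freeze( { out => \*STDOUT } ) };
my $glob_err  = $@;

print "code ref:   $code_err";
print "filehandle: $glob_err";
```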

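When you use MLDBM, you should tell it which DBM module to use as well as which module to use for serializing and deserializing the data. If you don't, it falls back to SDBM_File and Data::Dumper. The serializer options include Storable, Data::Dumper, and FreezeThaw. Storable is the fastest, but Data::Dumper is included with Perl. Storable has also been part of the standard distribution since Perl 5.8.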
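To show what the Data::Dumper option amounts to, this sketch serializes a record to Perl source text and evaluates that text to get the data back. The particular Dumper settings are assumptions chosen for compact, deterministic output. Evaluating the text is reasonable only for data your own program wrote.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $record = { name => 'Mary Smith', phone => [ '650-555-1234', '800-555-4321' ] };

# Terse drops the "$VAR1 =" prefix, Sortkeys makes the output deterministic,
# and Indent = 0 keeps it on one line.
local $Data::Dumper::Terse    = 1;
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 0;
my $text = Dumper( $record );
print "$text\n";

# Deserializing means evaluating that Perl source text.
my $copy = eval $text;
die "Could not deserialize: $@" if $@;
print "second phone: $copy->{phone}[1]\n";
```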

When you use MLDBM with DB_File, you can lock the underlying DBM file just like you would with DB_File:

use Fcntl qw( :DEFAULT :flock );    # O_* flags plus LOCK_EX
use MLDBM qw( DB_File Storable );

my $dbm_file = "/usr/local/apache/data/staff.db";    # path to your DBM file
my %hash;
local *DBM;

my $db = tie %hash, "MLDBM", $dbm_file, O_CREAT | O_RDWR, 0644 or
    die "Could not tie to $dbm_file: $!";
my $fd = $db->fd;                                            # Get file descriptor
open DBM, "+<&=$fd" or die "Could not dup DBM for lock: $!"; # Filehandle on that fd
flock DBM, LOCK_EX or die "Could not lock $dbm_file: $!";    # Lock exclusively
undef $db;                                                   # Avoid untie probs

# ...
# All your code goes here; treat %hash like a normal, complex hash
# ...

untie %hash;        # Clears buffers, then saves, closes, and unlocks file



Copyright © 2001 O'Reilly & Associates. All rights reserved.