
Recipe 14.19 Program: ggh—Grep Netscape Global History

This program divulges the contents of Netscape's history.db file. It can be called with full URLs or with a (single) pattern. If called without arguments, it displays every entry in the history file. The ~/.netscape/history.db file is used unless the -database option is given.

Each output line shows the URL and its access time. The time is converted into localtime representation with -localtime (the default) or gmtime representation with -gmtime—or left in raw form with -epochtime, which is useful for sorting by date.
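The three time formats can be sketched in a few lines of Perl. This is only an illustration, not code from ggh: format_stamp is a hypothetical helper, and the timestamp is an arbitrary value chosen for the example.

```perl
use strict;
use warnings;

# format_stamp is a hypothetical helper, not part of ggh itself.
# $mode is assumed to be "epochtime", "gmtime", or "localtime".
sub format_stamp {
    my ($epoch_secs, $mode) = @_;
    return $epoch_secs                if $mode eq "epochtime";  # raw seconds
    return scalar gmtime($epoch_secs) if $mode eq "gmtime";     # UTC string
    return scalar localtime($epoch_secs);                       # local string
}

my $when = 86_400;   # arbitrary sample: one day after the epoch
printf "%-10s %s\n", $_, format_stamp($when, $_)
    for qw(epochtime gmtime localtime);
```

The raw form is a plain integer, which is why it sorts correctly with sort -n, while the two string forms do not.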

To specify a pattern to match against, give one single argument without a ://.
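The rule above can be sketched as a small classifier. classify_args is a hypothetical name used only for this illustration; the real program makes the same decision inline with the test $ARGV[0] !~ m(://).

```perl
use strict;
use warnings;

# classify_args is a hypothetical helper that mirrors how ggh decides
# what its arguments mean; it is not part of the program itself.
sub classify_args {
    my @args = @_;
    return "all" unless @args;             # no arguments: dump everything
    if ($args[0] !~ m{://}) {              # no "://" means it is a pattern
        return @args == 1 ? "pattern"      # exactly one pattern is allowed
                          : "error";       # a pattern plus more arguments
    }
    return "urls";                         # otherwise look up each URL
}

for my $try ( ["perl"], ["mailto:"], ["http://www.perl.com/index.html"] ) {
    printf "%-35s %s\n", "@$try", classify_args(@$try);
}
```

Note that mailto: counts as a pattern, not a URL, because it contains no ://.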

To look up one or more URLs, supply them as arguments:

% ggh http://www.perl.com/index.html

To find a link you don't quite recall, use a regular expression (a single argument without a :// is a pattern):

% ggh perl

To find out everyone you've mailed:

% ggh mailto:

To find out the FAQ sites you've visited, use a snazzy Perl pattern with an embedded /i modifier:

% ggh -regexp '(?i)\bfaq\b'

If you don't want the internal date converted to localtime, use -epochtime, which Getopt::Long lets you abbreviate to -epoch:

% ggh -epoch http://www.perl.com/perl/

If you prefer gmtime to localtime, use -gmtime:

% ggh -gmtime http://www.perl.com/perl/

To look at the whole file, give no arguments (but perhaps redirect to a pager):

% ggh | less

If you want the output sorted by date, use the -epoch flag:

% ggh -epoch | sort -rn | less

If you want it sorted by date into your local time zone format, use a more sophisticated pipeline:

% ggh -epoch | sort -rn | perl -pe 's/\d+/localtime $&/e' | less
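The same sort-then-convert idea can be sketched entirely in Perl. This is a minimal illustration under stated assumptions: newest_first is a hypothetical helper, the input lines are assumed to look like the output of ggh -epoch (epoch seconds, a space, then the URL), and the sample timestamps are made up.

```perl
use strict;
use warnings;

# newest_first is a hypothetical helper; it expects lines shaped like
# "EPOCH_SECONDS URL" and returns them with the newest entry first.
sub newest_first {
    my @lines = @_;
    return map  { $_->[1] }                        # recover the line
           sort { $b->[0] <=> $a->[0] }            # numeric, descending
           map  { [ (/^(\d+)/ ? $1 : 0), $_ ] }    # pair each line with its time
           @lines;
}

# Made-up sample data standing in for real ggh -epoch output.
my @sample = (
    "86400 http://www.perl.com/index.html\n",
    "172800 http://www.perl.com/perl/\n",
);

for my $line (newest_first(@sample)) {
    (my $shown = $line) =~ s/^(\d+)/scalar localtime($1)/e;  # convert after sorting
    print $shown;
}
```

Sorting happens on the raw number and the conversion to a readable date comes last, just as in the shell pipeline.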

The Netscape release notes claim that they're using NDBM format. This is misleading: they're actually using Berkeley DB format, which is why we require DB_File (not supplied standard with all systems Perl runs on) instead of NDBM_File (which is). The program is shown in Example 14-7.

Example 14-7. ggh
  #!/usr/bin/perl -w
  # ggh -- grovel global history in netscape logs
  $USAGE = <<EO_COMPLAINT;
  usage: $0 [-database dbfilename] [-help]
             [-epochtime | -localtime | -gmtime]
             [ [-regexp] pattern | href ... ]
  EO_COMPLAINT
  
  use Getopt::Long;
  
  ($opt_database, $opt_epochtime, $opt_localtime,
   $opt_gmtime,   $opt_regexp,    $opt_help,
   $pattern,                                  )      = (0) x 7;
  
  usage( ) unless GetOptions qw{ database=s
                                regexp=s
                                epochtime localtime gmtime
                                help
                              };
  
  if ($opt_help) { print $USAGE; exit; }
  
  usage("only one of localtime, gmtime, and epochtime allowed")
      if $opt_localtime + $opt_gmtime + $opt_epochtime > 1;
  
  if ( $opt_regexp ) {
      $pattern = $opt_regexp;
  } elsif (@ARGV && $ARGV[0] !~ m(://)) {
      $pattern = shift;
  }
  
  usage("can't mix URLs and explicit patterns")
      if $pattern && @ARGV;
  
  if ($pattern && !eval { '' =~ /$pattern/; 1 } ) {
      $@ =~ s/ at \w+ line \d+\.//;
      die "$0: bad pattern $@";
  }
  
  require DB_File; DB_File->import( );  # delay loading until runtime
  $| = 1;                              # feed the hungry PAGERs
  
  $dotdir  = $ENV{HOME}    || $ENV{LOGNAME};
  $HISTORY = $opt_database || "$dotdir/.netscape/history.db";
  
  die "no netscape history dbase in $HISTORY: $!" unless -e $HISTORY;
  die "can't dbmopen $HISTORY: $!" unless dbmopen %hist_db, $HISTORY, 0666;
  
  # the next line is a hack because the C programmers who did this
  # didn't understand strlen vs strlen+1.  jwz told me so. :-)
  $add_nulls   = (ord(substr(each %hist_db, -1)) == 0);
  
  # XXX: should now do scalar keys to reset but don't 
  #      want cost of full traverse, required on tied hashes.
  #   better to close and reopen?
  
  $nulled_href="";  
  $byte_order  = "V";         # PC people don't grok "N" (network order)
      
  if (@ARGV) {
      foreach $href (@ARGV) {
          $nulled_href = $href . ($add_nulls && "\0");
          unless ($binary_time = $hist_db{$nulled_href}) {
              warn "$0: No history entry for HREF $href\n";
              next;
          }
          $epoch_secs = unpack($byte_order, $binary_time);
          $stardate   = $opt_epochtime ? $epoch_secs
                                       : $opt_gmtime ? gmtime    $epoch_secs
                                                     : localtime $epoch_secs;
          print "$stardate $href\n";
      }
  } else {
      while ( ($href, $binary_time) = each %hist_db ) {
          chop $href if $add_nulls;
          next unless defined $href && defined $binary_time;
          # gnat reports some binary times are missing
          $binary_time = pack($byte_order, 0) unless $binary_time;
          $epoch_secs = unpack($byte_order, $binary_time);
          $stardate   = $opt_epochtime ? $epoch_secs
                                       : $opt_gmtime ? gmtime    $epoch_secs
                                                     : localtime $epoch_secs;
          print "$stardate $href\n" unless $pattern && $href !~ /$pattern/o;
      }
  }
  
  sub usage {
      print STDERR "@_\n" if @_;
      die $USAGE;
  }
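
The two record-format quirks the program works around, a trailing NUL on each key and a little-endian binary timestamp, can be seen in isolation in the sketch below. decode_entry is a hypothetical helper and the record is fabricated for illustration; it is not taken from a real history.db.

```perl
use strict;
use warnings;

# decode_entry is a hypothetical helper, not part of ggh. It assumes the
# value holds a 32-bit little-endian integer, as ggh does with "V".
sub decode_entry {
    my ($raw_key, $raw_value) = @_;
    (my $href = $raw_key) =~ s/\0\z//;         # drop one trailing NUL, if any
    my $epoch_secs = unpack("V", $raw_value);  # 32-bit unsigned, little-endian
    return ($href, $epoch_secs);
}

# A fabricated record: key with trailing NUL, value packed the same way.
my $raw_key   = "http://www.perl.com/index.html\0";
my $raw_value = pack("V", 86_400);

my ($href, $epoch_secs) = decode_entry($raw_key, $raw_value);
print "$epoch_secs $href\n";   # prints "86400 http://www.perl.com/index.html"
```

Unlike the chop in the program, the substitution here removes a final byte only when it really is a NUL, so the sketch handles keys stored either way.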

14.19.1 See Also

The Introduction to this chapter; Recipe 6.18
