
12.3. Throw Core

Programs frequently leave core dumps behind. A core file contains all the process information pertinent to debugging. It usually gets written when a program dies abnormally. While there are ways to limit the size of a dump or prevent core dumps entirely, there are still times when they're needed temporarily. Therefore, most Unix systems have some sort of cron script that automatically searches for core files and deletes them. Let's add some intelligence to these scripts to let us track what files are found, their sizes, and the names of the processes that created them.

The following Perl program is divided into four parts: it searches for a file with a given name (defaults to the name core), gets the file's statistics, deletes the file,[66] and then sends a trap. Most of the processing is performed natively by Perl, but we use the command ls -l $FILENAME to include the pertinent core file information within the SNMP trap. This command allows our operators to see information about the file in a format that's easy to recognize. We also use the file command, which determines a file's type and its creator. Unless you know who created the file, you won't have the chance to fix the real problem.

[66]Before you start deleting core files, you should figure out who or what is dropping them and see if the owner wants these files. In some cases this core file may be their only means of debugging.
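
Because the program hands filenames to ls and file, it is worth seeing how those commands can be run without involving the shell. The following is a minimal sketch, not part of the original script: capture_line() is a hypothetical helper, and it assumes that ls and file are available on the search path. It uses Perl's list-form pipe open, so a filename containing spaces or shell metacharacters is passed to the command untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command without going through the shell and return its output
# as a single line with runs of whitespace collapsed.
# capture_line() is a hypothetical helper, not part of the original script.
sub capture_line {
    my @cmd = @_;
    # List-form pipe open: the arguments reach the program as-is, so
    # spaces or shell metacharacters in a filename are harmless.
    open(my $fh, '-|', @cmd) or return "could not run $cmd[0]: $!";
    my $out = do { local $/; <$fh> };   # slurp everything the command printed
    close($fh);
    $out = '' unless defined $out;
    $out =~ s/\s+/ /g;                  # collapse whitespace, as the script does
    $out =~ s/^ | $//g;                 # trim a leading or trailing space
    return $out;
}

# Gather the same two pieces of information the script reports.
# The path is only an illustration; pass a real file as the first argument.
my $path = shift(@ARGV) || $0;
print "STATS: ",      capture_line('ls', '-l', $path), "\n";
print "FILE_STATS: ", capture_line('file', $path), "\n";
```

The backtick form used in the program is shorter, and it is adequate as long as the files being examined have ordinary names.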

#!/usr/local/bin/perl

# Finds and deletes core files. It sends traps upon completion and 
# errors. Arguments are:
# -path directory   : search directory (and subdirectories); default /
# -lookfor filename : filename to search for; default core
# -debug value      : debug level

while ($ARGV[0] =~ /^-/)
{
    if    ($ARGV[0] eq "-path")    { shift; $PATH    = $ARGV[0]; }
    elsif ($ARGV[0] eq "-lookfor") { shift; $LOOKFOR = $ARGV[0]; }
    elsif ($ARGV[0] eq "-debug")   { shift; $DEBUG   = $ARGV[0]; }
    shift;
}


#################################################################
##########################  Begin Main  #########################
#################################################################

require "find.pl";     # This gives us the find function.

$LOOKFOR = "core" unless ($LOOKFOR); # If we don't have something 
                                     # in $LOOKFOR, default to core

$PATH    = "/"    unless ($PATH);    # Let's use / if we don't get 
                                     # one on the command line

(-d $PATH) || die "$PATH is NOT a valid dir!";    # We can search
                                                  # only valid 
                                                  # directories

&find("$PATH");

#################################################################
######################  Begin SubRoutines  ######################
#################################################################

sub wanted
{
    if (/^$LOOKFOR$/)
        {
            if (!(-d $name)) # Skip the directories named core
            {
               &get_stats;
               &can_file;
               &send_trap;
            }
        }
}

sub can_file
{
    print "Deleting :$_: :$name:\n" unless (!($DEBUG));
    $RES = unlink "$name";
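    # Set $ERROR on every call so that one failed delete does not
    # cause every later trap to report a failure
    $ERROR = ($RES == 1) ? 0 : 1;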
}

sub get_stats
{
    chomp ($STATS = `ls -l $name`);          # chomp removes only a trailing newline
    chomp ($FILE_STATS = `/bin/file $name`);

    $STATS =~ s/\s+/ /g;
    $FILE_STATS =~ s/\s+/ /g;
}

sub send_trap
{
    if ($ERROR == 0) { $SPEC = 1535; }
    else             { $SPEC = 1536; }
    print "STATS: $STATS\n" unless (!($DEBUG));
    print "FILE_STATS: $FILE_STATS\n" unless (!($DEBUG));

# Sending a trap using Net-SNMP
#
#system "/usr/local/bin/snmptrap nms public .1.3.6.1.4.1.2789.2500 '' 6 $SPEC '' 
#.1.3.6.1.4.1.2789.2500.1535.1 s \"$name\" 
#.1.3.6.1.4.1.2789.2500.1535.2 s \"$STATS\" 
#.1.3.6.1.4.1.2789.2500.1535.3 s \"$FILE_STATS\"";

# Sending a trap using Perl
#
use SNMP_util "0.54";  # This will load the BER and SNMP_Session for us
snmptrap("public\@nms:162", ".1.3.6.1.4.1.2789.2500", mylocalhostname, 6, $SPEC, 
".1.3.6.1.4.1.2789.2500.1535.1", "string", "$name", 
".1.3.6.1.4.1.2789.2500.1535.2", "string", "$STATS", 
".1.3.6.1.4.1.2789.2500.1535.3", "string", "$FILE_STATS");

# Sending a trap using OpenView's snmptrap
#
#system "/opt/OV/bin/snmptrap -c public nms 
#.1.3.6.1.4.1.2789.2500 \"\" 6 $SPEC \"\" 
#.1.3.6.1.4.1.2789.2500.1535.1 octetstringascii \"$name\" 
#.1.3.6.1.4.1.2789.2500.1535.2 octetstringascii \"$STATS\" 
#.1.3.6.1.4.1.2789.2500.1535.3 octetstringascii \"$FILE_STATS\"";
}

The logic is simple, though it's somewhat hard to see since most of it happens implicitly. The key is the call to find( ), which sets up lots of things. It descends into every directory underneath the directory specified by $PATH, calls the wanted( ) subroutine for each entry it finds, and automatically sets $_ to the entry's name (so the if statement at the beginning of wanted( ) works). Furthermore, it defines the variable $name to be the full pathname to the current file; this allows us to test whether or not the current file is really a directory, which we wouldn't want to delete.
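
To make the implicit parts visible, here is a minimal sketch of the same search written against the File::Find module, which is the module interface to the same traversal. find.pl is a Perl 4-era library that newer Perl distributions may not include, so this form may be needed on a recent system. find_named( ) is a hypothetical helper introduced only for illustration; it collects matching paths instead of deleting anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;   # core module that provides find()

# Collect the full paths of non-directory entries whose name equals
# $lookfor. find_named() is a hypothetical helper for illustration.
sub find_named {
    my ($path, $lookfor) = @_;
    my @hits;
    find(sub {
        # Inside the callback, $_ is the name of the current entry and
        # $File::Find::name is its full path (find.pl exposes the
        # latter as $name).
        return unless $_ eq $lookfor;   # exact name match
        return if -d $_;                # skip directories named core
        push @hits, $File::Find::name;
    }, $path);
    return @hits;
}

# Print matches only when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for find_named($ARGV[0], 'core');
}
```

Passing the callback explicitly, rather than relying on a subroutine that happens to be named wanted, makes it clearer which code runs for each file.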

Therefore, we loop through all the files, looking for files with the name specified on the command line (or named core, if no -lookfor option is specified). When we find one, we store its statistics, delete the file, and send a trap to the NMS reporting the file's name and other information. We use the variable $SPEC to store the specific trap ID. We use two specific IDs: 1535 if the file was deleted successfully and 1536 if we tried to delete the file but couldn't. As before, we wrote the trap code three ways: native Perl, Net-SNMP, and OpenView. The native Perl version is active in the listing; to use one of the others, comment it out and uncomment the version of your choice. We pack the trap with three variable bindings, which contain the name of the file, the results of ls -l on the file, and the results of running /bin/file. Together, these give us a fair amount of information about the file we deleted. Note that we had to define object IDs for all three of these variables; furthermore, although we placed these object IDs under 1535, nothing prevents us from using the same objects when we send specific trap 1536.
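
The choice of specific trap ID and the layout of the three variable bindings can be separated from the code that actually sends the trap. The sketch below does that; build_trap( ) is a hypothetical helper, and the sample filename and strings are made up. It returns a list in the order (oid, type, value) that the native Perl snmptrap( ) call in the listing takes after the specific trap ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Enterprise OID and specific trap IDs taken from the throwcore script.
my $ENTERPRISE   = '.1.3.6.1.4.1.2789.2500';
my $SPEC_DELETED = 1535;
my $SPEC_FAILED  = 1536;

# Return the specific trap ID followed by a flat (oid, type, value)
# list for the three variable bindings.
# build_trap() is a hypothetical helper for illustration.
sub build_trap {
    my ($deleted, $name, $stats, $file_stats) = @_;
    my $spec = $deleted ? $SPEC_DELETED : $SPEC_FAILED;
    # Both traps reuse the objects defined under 1535.
    my $base = "$ENTERPRISE.$SPEC_DELETED";
    my @varbinds = (
        "$base.1", 'string', $name,
        "$base.2", 'string', $stats,
        "$base.3", 'string', $file_stats,
    );
    return ($spec, @varbinds);
}

# Made-up values standing in for a file that could not be deleted.
my ($spec, @vb) = build_trap(0, '/tmp/core', 'ls output', 'file output');
print "specific trap: $spec\n";
print "varbind: ", join(' ', @vb[$_ .. $_ + 2]), "\n" for (0, 3, 6);
```

Because the first argument is 0 here, the sketch reports specific trap 1536, with the bindings still numbered under 1535.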

Now we have a program to delete core files and send traps telling us about what was deleted; the next step is to tell our trap receiver what to do with these incoming traps. Let's assume that we're using OpenView. To inform it about these traps, we have to add two entries to trapd.conf, mapping these traps to events. Here they are:

EVENT foundNDelCore .1.3.6.1.4.1.2789.2500.0.1535 "Status Alarms" Warning
FORMAT Core File Found :$1: File Has Been Deleted - LS :$2: FILE :$3:
SDESC
This event is called when a server using cronjob looks for core 
files and deletes them.

$1 - octetstringascii   - Name of file
$2 - octetstringascii   - ls -l listing on the file
$3 - octetstringascii   - file $name
EDESC
#
#
#
EVENT foundNNotDelCore .1.3.6.1.4.1.2789.2500.0.1536 "Status Alarms" Minor
FORMAT Core File Found :$1: 
File Has Not Been Deleted For Some Reason - LS :$2: FILE :$3:
SDESC
This event is called when a server using cronjob looks for core 
files and then CANNOT delete them for some reason.

$1 - octetstringascii   - Name of file
$2 - octetstringascii   - ls -l listing on the file
$3 - octetstringascii   - file $name
EDESC
#
#
#

For each trap, we have an EVENT statement specifying an event name, the trap's specific ID, the category into which the event will be sorted, and the severity. The FORMAT statement defines a message to be used when we receive the trap; it can be spread over several lines and can use the parameters $1, $2, etc. to refer to the variable bindings that are included in the trap.
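
To see how the numbered parameters line up with the variable bindings, here is a small Perl imitation of the substitution. It is only an illustration: expand_format( ) is a hypothetical helper, it is not OpenView code, and it handles nothing but the numeric parameters $1, $2, and so on. The sample values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand $1, $2, ... in a FORMAT-style template with the values of a
# trap's variable bindings, in order. A parameter with no matching
# binding is left untouched.
# expand_format() is a hypothetical helper, not OpenView code.
sub expand_format {
    my ($template, @values) = @_;
    $template =~ s{\$(\d+)}{
        my $i = $1 - 1;
        ($i >= 0 && $i < @values) ? $values[$i] : "\$$1"
    }ge;
    return $template;
}

# Single quotes keep Perl from interpolating $1, $2, $3 itself.
my $format = 'Core File Found :$1: File Has Been Deleted - LS :$2: FILE :$3:';
print expand_format($format,
    '/tmp/core',
    '-rw------- 1 root root 1024 Jan 1 00:00 /tmp/core',
    '/tmp/core: data'), "\n";
```

The order of the bindings in the trap is what matters: $1 is always the first binding sent, so the script and trapd.conf have to agree on that order.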

Although it would be a good idea, we don't need to add our variable bindings to our private MIB file; trapd.conf contains enough information for OpenView to interpret the contents of the trap.

Here are some sample traps[67] generated by the throwcore script:

[67]We've removed most of the host and date/time information.

Core File Found :/usr/sap/HQD/DVEBMGS00/work/core: File Has Been \
Deleted - LS :-rw-rw---- 1 hqdadm sapsys 355042304 Apr 27 17:04 \ 
/usr/sap/HQD/DVEBMGS00/work/core: \
FILE :/usr/sap/HQD/DVEBMGS00/work/core: ELF 32-bit MSB core file \
SPARC Version 1, from 'disp+work':

Core File Found :/usr/sap/HQI/DVEBMGS10/work/core: File Has Been \
Deleted - LS :-rw-r--r-- 1 hqiadm sapsys 421499988 Apr 28 14:29 \ 
/usr/sap/HQI/DVEBMGS10/work/core: \
FILE :/usr/sap/HQI/DVEBMGS10/work/core: ELF 32-bit MSB core file \
SPARC Version 1, from 'disp+work':
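
Here is root's crontab, which runs the throwcore script at specific intervals. The first entry uses the -path switch to check the development area (/usr/sap) every hour, at 27 minutes past the hour; the second runs with the defaults at 2:23 every night, searching the entire filesystem for files named core: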

# Check for core files every night and every hour on special dirs
27 * * * * /opt/local/mib_programs/scripts/throwcore.pl -path /usr/sap
23 2 * * * /opt/local/mib_programs/scripts/throwcore.pl 



Copyright © 2002 O'Reilly & Associates. All rights reserved.