
12.4. Veritas Disk Check

The Veritas Volume Manager is a package that allows you to manipulate disks and their partitions. It gives you the ability to add and remove mirrors, work with RAID arrays, and resize partitions, to name a few things. Although Veritas is a specialized and expensive package that is usually found at large data centers, don't assume that you can skip this section. The point isn't to show you how to monitor Veritas, but to show you how you can provide meaningful traps using a typical status program. You should be able to extract the ideas from the script we present here and use them within your own context.

Veritas Volume Manager (vxvm) comes with a utility called vxprint. This program displays records from the Volume Manager configuration and shows the status of each of your local disks. If there is an error, such as a bad disk or broken mirror, this command will report it. For a healthy root volume (rootvol, mounted on /), the output of vxprint looks like this:

$ vxprint -h rootvol
Disk group: rootdg

TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
v  rootvol      root         ENABLED  922320   -        ACTIVE   -       -
pl rootvol-01   rootvol      ENABLED  922320   -        ACTIVE   -       -
sd rootdisk-B0  rootvol-01   ENABLED  1        0        -        -       Block0
sd rootdisk-02  rootvol-01   ENABLED  922319   1        -        -       -
pl rootvol-02   rootvol      ENABLED  922320   -        ACTIVE   -       -
sd disk01-01    rootvol-02   ENABLED  922320   0        -        -       -
The KSTATE (kernel state) and STATE columns give us a behind-the-scenes look at our disks, mirrors, etc. Without explaining the output in detail, a KSTATE of ENABLED is a good sign; a STATE of ACTIVE or - indicates that there are no problems. We can take this output and pipe it into a script that sends SNMP traps when errors are encountered. We can send different traps of an appropriate severity, based on the type of error that vxprint reported. Here's a script that runs vxprint and analyzes the results:

#!/usr/local/bin/perl -w

$VXPRINT_LOC    = "/usr/sbin/vxprint";
$HOSTNAME       = `/bin/uname -n`; chomp $HOSTNAME;

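$DEBUG             = 0;  # 0 = quiet, 1 = progress messages, 2 = also echo raw vxprint lines
$SHOW_STATE_ACTIVE = 0;

# Stop when the arguments run out so -w doesn't warn about an undefined $ARGV[0]
while (@ARGV && $ARGV[0] =~ /^-/)
{
    if    ($ARGV[0] eq "-debug")        { shift; $DEBUG = $ARGV[0]; }
    elsif ($ARGV[0] eq "-state_active") { $SHOW_STATE_ACTIVE = 1; }
    shift;
}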

####################################################################
###########################  Begin Main  ###########################
####################################################################

&get_vxprint;  # Get it, process it, and send traps if errors found!

####################################################################
########################  Begin SubRoutines  #######################
####################################################################

sub get_vxprint
{

    open(VXPRINT,"$VXPRINT_LOC |") || die "Can't run $VXPRINT_LOC: $!";
    while($VXLINE=<VXPRINT>)
    {
        print $VXLINE unless ($DEBUG < 2);
        if ($VXLINE ne "\n")
        {
            &is_a_disk_group_name;
            &split_vxprint_output;

            if (($TY ne "TY")   &&
                ($TY ne "Disk") &&
                ($TY ne "dg")   &&
                ($TY ne "dm"))
            {
                if (($SHOW_STATE_ACTIVE) && ($STATE eq "ACTIVE"))
                {
                    print "ACTIVE: $VXLINE";
                }
                if (($STATE ne "ACTIVE") &&
                    ($STATE ne "DISABLED") &&
                    ($STATE ne "SYNC") &&
                    ($STATE ne "CLEAN") &&
                    ($STATE ne "SPARE") &&
                    ($STATE ne "-")      &&
                    ($STATE ne ""))
                {
                    &send_error_msgs;
                }
                elsif (($KSTATE ne "ENABLED") &&
                       ($KSTATE ne "DISABLED") &&
                       ($KSTATE ne "-")       &&
                       ($KSTATE ne ""))
                {
                    &send_error_msgs;
                }
            } # end if (($TY
        }     # end if ($VXLINE
    }         # end while($VXLINE
}             # end sub get_vxprint

sub is_a_disk_group_name
{
    if ($VXLINE =~ /^Disk\sgroup\:\s(\w+)\n/)
    {
        $DISK_GROUP = $1;
        print "Found Disk Group :$1:\n" unless (!($DEBUG));
        return 1;
    }
}

sub split_vxprint_output
{
    ($TY, $NAME, $ASSOC, $KSTATE,
     $LENGTH, $PLOFFS, $STATE, $TUTIL0,
     $PUTIL0) = split(/\s+/,$VXLINE);

    if ($DEBUG) {
        print "SPLIT: $TY $NAME $ASSOC $KSTATE ";
        print "$LENGTH $PLOFFS $STATE $TUTIL0 $PUTIL0:\n";
    }
}

sub send_snmp_trap
{
    $SNMP_TRAP_LOC          = "/opt/OV/bin/snmptrap";
    $SNMP_COMM_NAME         = "public";
    $SNMP_TRAP_HOST         = "nms";

    $SNMP_ENTERPRISE_ID     = ".1.3.6.1.4.1.2789.2500";
    $SNMP_GEN_TRAP          = "6";
    $SNMP_SPECIFIC_TRAP     = "1000";

    chop($SNMP_TIME_STAMP        = "1" . `date +%H%S`); 
    $SNMP_EVENT_IDENT_ONE   = ".1.3.6.1.4.1.2789.2500.1000.1";
    $SNMP_EVENT_VTYPE_ONE   = "octetstringascii";
    $SNMP_EVENT_VAR_ONE     = "$HOSTNAME";

    $SNMP_EVENT_IDENT_TWO   = ".1.3.6.1.4.1.2789.2500.1000.2";
    $SNMP_EVENT_VTYPE_TWO   = "octetstringascii";
    $SNMP_EVENT_VAR_TWO     = "$NAME";

    $SNMP_EVENT_IDENT_THREE = ".1.3.6.1.4.1.2789.2500.1000.3";
    $SNMP_EVENT_VTYPE_THREE = "octetstringascii";
    $SNMP_EVENT_VAR_THREE   = "$STATE";

    $SNMP_EVENT_IDENT_FOUR  = ".1.3.6.1.4.1.2789.2500.1000.4";
    $SNMP_EVENT_VTYPE_FOUR  = "octetstringascii";
    $SNMP_EVENT_VAR_FOUR    = "$DISK_GROUP";

    # Build the command as a single line; a newline inside the string
    # would make the shell treat the remainder as a separate command.
    $SNMP_TRAP = join(" ",
        $SNMP_TRAP_LOC, "-c", $SNMP_COMM_NAME, $SNMP_TRAP_HOST,
        $SNMP_ENTERPRISE_ID, "\"\"", $SNMP_GEN_TRAP, $SNMP_SPECIFIC_TRAP,
        $SNMP_TIME_STAMP,
        $SNMP_EVENT_IDENT_ONE,   $SNMP_EVENT_VTYPE_ONE,   "\"$SNMP_EVENT_VAR_ONE\"",
        $SNMP_EVENT_IDENT_TWO,   $SNMP_EVENT_VTYPE_TWO,   "\"$SNMP_EVENT_VAR_TWO\"",
        $SNMP_EVENT_IDENT_THREE, $SNMP_EVENT_VTYPE_THREE, "\"$SNMP_EVENT_VAR_THREE\"",
        $SNMP_EVENT_IDENT_FOUR,  $SNMP_EVENT_VTYPE_FOUR,  "\"$SNMP_EVENT_VAR_FOUR\"");

    # Sending a trap using Net-SNMP
    #
    #system "/usr/local/bin/snmptrap $SNMP_TRAP_HOST $SNMP_COMM_NAME 
    #$SNMP_ENTERPRISE_ID '' $SNMP_GEN_TRAP $SNMP_SPECIFIC_TRAP ''
    #$SNMP_EVENT_IDENT_ONE s \"$SNMP_EVENT_VAR_ONE\" 
    #$SNMP_EVENT_IDENT_TWO s \"$SNMP_EVENT_VAR_TWO\"
    #$SNMP_EVENT_IDENT_THREE s \"$SNMP_EVENT_VAR_THREE\"
    #$SNMP_EVENT_IDENT_FOUR s \"$SNMP_EVENT_VAR_FOUR\"";

    # Sending a trap using Perl
    #
    #use SNMP_util "0.54";  # This will load the BER and SNMP_Session for us
    #snmptrap("$SNMP_COMM_NAME\@$SNMP_TRAP_HOST:162", "$SNMP_ENTERPRISE_ID",
    #mylocalhostname, $SNMP_GEN_TRAP, $SNMP_SPECIFIC_TRAP, 
    #"$SNMP_EVENT_IDENT_ONE", "string", "$SNMP_EVENT_VAR_ONE",
    #"$SNMP_EVENT_IDENT_TWO", "string", "$SNMP_EVENT_VAR_TWO",
    #"$SNMP_EVENT_IDENT_THREE", "string", "$SNMP_EVENT_VAR_THREE",
    #"$SNMP_EVENT_IDENT_FOUR", "string", "$SNMP_EVENT_VAR_FOUR");

    # Sending a trap using OpenView's snmptrap (using VARs from above)
    #
    $SEND_SNMP_TRAP = system($SNMP_TRAP);  # nonzero return means the command failed
    if($SEND_SNMP_TRAP) {
         print "Problem Running SnmpTrap with Result ";
         print ":$SEND_SNMP_TRAP: :$SNMP_TRAP:\n";
    }
}

sub send_error_msgs
{
    $TY =~ s/^v/Volume/;
    $TY =~ s/^pl/Plex/;
    $TY =~ s/^sd/SubDisk/;

    # Keep the debug message on one output line
    print "VXfs Problem: Host:[$HOSTNAME] State:[$STATE] DiskGroup:[$DISK_GROUP] " .
          "Type:[$TY] FileSystem:[$NAME] Assoc:[$ASSOC] Kstate:[$KSTATE]\n"
        unless (!($DEBUG));

    &send_snmp_trap;
}
Knowing what the output of vxprint should look like, we can formulate Perl statements that figure out when to generate a trap. That task makes up most of the get_vxprint subroutine. We also know what types of error messages will be produced. Our script tries to ignore all the information from the healthy disks and pick out the error messages. For example, if the STATE field contains NEEDSYNC, the disk mirrors are probably not synchronized and the volume needs some sort of attention. The script doesn't handle this particular case explicitly, but it is caught by the default test, which flags any STATE that is not in the list of known-good values.
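
The same decision logic can be written as a lookup table, which also leaves room for the per-error severities mentioned earlier. What follows is a minimal sketch rather than part of the original script: the sub name classify_record, the severity labels, and the NEEDSYNC sample line are assumptions made for illustration, while the lists of healthy values are copied from the tests in get_vxprint.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values that get_vxprint accepts as healthy (copied from its tests).
my %OK_STATE  = map { $_ => 1 } ("ACTIVE", "DISABLED", "SYNC", "CLEAN", "SPARE", "-", "");
my %OK_KSTATE = map { $_ => 1 } ("ENABLED", "DISABLED", "-", "");

# Hypothetical severity labels; any state not listed here falls through
# to "critical", much as NEEDSYNC falls through to the default test.
my %SEVERITY = ("NEEDSYNC" => "warning");

# Returns nothing for headers and healthy records, or a hash reference
# describing the problem record.
sub classify_record {
    my ($line) = @_;
    my ($ty, $name, $assoc, $kstate, $length, $ploffs, $state) = split(' ', $line);

    return unless defined $ty;                       # blank line
    return if $ty =~ /^(?:TY|Disk|dg|dm)$/;          # headers, disk group, disk media

    $kstate = "" unless defined $kstate;
    $state  = "" unless defined $state;
    return if $OK_STATE{$state} && $OK_KSTATE{$kstate};

    # Report whichever column is out of range, STATE first.
    my $bad = $OK_STATE{$state} ? $kstate : $state;
    return { name     => $name,
             state    => $state,
             kstate   => $kstate,
             severity => $SEVERITY{$bad} || "critical" };
}

# The first line comes from the sample output; the second is a made-up
# record showing an unsynchronized plex.
my @samples = (
    "pl rootvol-02   rootvol      ENABLED  922320   -        ACTIVE   -       -",
    "pl rootvol-02   rootvol      ENABLED  922320   -        NEEDSYNC -       -",
);

for my $line (@samples) {
    my $problem = classify_record($line);
    my $msg = $problem
        ? "$problem->{name}: $problem->{state} ($problem->{severity})"
        : "healthy";
    print "$msg\n";
}
```

A table like %SEVERITY could just as easily map each state to a different specific trap number, so the management station can assign its own severity to each event.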

The actual mechanism for sending the trap is tied up in a large number of variables. Basically, though, we can use any of the trap utilities we've discussed; the enterprise ID is .1.3.6.1.4.1.2789.2500; the specific trap ID is 1000; and we include four variable bindings, which report the hostname, the volume name, the volume's state, and the disk group.
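
To make the layout of the trap concrete, here is a minimal sketch that assembles the same enterprise ID, specific trap, and four variable bindings as an argument list. It assumes the Net-SNMP 5.x form of snmptrap for SNMPv1 (-v 1 -c community host enterprise agent generic specific uptime, followed by OID/type/value triples), which differs from the older argument order shown in the script's comments. The sub name build_trap_args and the sample values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ENTERPRISE = ".1.3.6.1.4.1.2789.2500";
my $SPECIFIC   = 1000;

# Returns the argument list for an SNMPv1 trap carrying four string
# variable bindings: hostname, volume name, state, and disk group.
sub build_trap_args {
    my ($nms, $community, $host, $name, $state, $disk_group) = @_;

    my @args = ("-v", "1", "-c", $community, $nms,
                $ENTERPRISE,
                "",          # agent address: empty means use the local address
                6,           # generic trap 6 = enterpriseSpecific
                $SPECIFIC,
                "");         # uptime: empty lets snmptrap supply it

    my $i = 1;
    for my $value ($host, $name, $state, $disk_group) {
        push @args, "$ENTERPRISE.$SPECIFIC.$i", "s", $value;
        $i++;
    }
    return @args;
}

my @args = build_trap_args("nms", "public", "myhost", "rootvol-02", "NEEDSYNC", "rootdg");

# Show the command for inspection; empty arguments are printed as ''.
print join(" ", "snmptrap", map { length($_) ? $_ : "''" } @args), "\n";

# To send the trap for real (assumes snmptrap is on the PATH):
# system("snmptrap", @args) == 0 or warn "snmptrap exited with status $?\n";
```

Passing a list to system bypasses the shell, so a volume name or state containing unusual characters needs no quoting.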

As with the previous script, it's a simple matter to run this script periodically and watch the results on whatever network-management software you're using. It's also easy to see how you could develop similar scripts that generate reports from other status programs.




Copyright © 2002 O'Reilly & Associates. All rights reserved.