Book HomeEssential SNMPSearch this book

9.2. External Polling

It is often impossible to poll a device internally, for technical, security, or political reasons. For example, the System Administration group may not be in the habit of giving out the root password, making it difficult for you to install and maintain internal polling scripts. However, they may have no problem with installing and maintaining an SNMP agent such as Concord's SystemEDGE or Net-SNMP. It's also possible that you will find yourself in an environment in which you lack the knowledge to build the tools necessary to poll internally. Despite the situation, if an SNMP agent is present on a machine that has objects worth polling, you can use an external device to poll the machine and read the objects' values.[38] This external device can be one or more NMSs or other machines or devices. For instance, when you have a decent-sized network it is sometimes convenient, and possibly necessary, to distribute polling among several NMSs.

[38]Many devices say they are SNMP-compatible but support only a few MIBs. This makes polling nearly impossible. If you don't have the object(s) to poll there is nothing you can do, unless there are hooks for an extensible agent. Even with extensible agents, unless you know how to program, the Simple in SNMP goes away fast.

Each of the external polling engines we will look at uses the same polling methods, although some NMSs implement external polling differently. We'll start with the OpenView xnmgraph program, which can be used to collect and display data graphically. You can even use OpenView to save the data for later retrieval and analysis. We'll include some examples that show how you can collect data and store it automatically and how you can retrieve that data for display. Castle Rock's SNMPc also has an excellent data-collection facility that we will use to collect and graph data.

9.2.1. Collecting and Displaying Data with OpenView

One of the easiest ways to get some interesting graphs with OpenView is to use the xnmgraph program. You can run xnmgraph from the command line and from some of NNM's menus. One practical way to graph is to use OpenView's xnmbrowser to collect some data and then click "Graph." It's as easy as that. If the node you are polling has more than one instance (say, multiple interfaces), OpenView will graph all known instances. When an NMS queries a device such as a router, it determines how many instances are in the ifTable and retrieves management data for each entry in the table.

9.2.2. OpenView Graphing

Figure 9-3 shows the sort of graph you can create with NNM. To create this graph, we started the browser (Figure 8-2) and clicked down through the MIB tree until we found the .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry list. Once there, we clicked on ifInOctets; then, while holding down the Ctrl key, we clicked on ifOutOctets. After both were selected and we verified that the "Name or IP Address" field displayed the node we wanted to poll, we clicked on the "Graph" button.

Figure 9-3

Figure 9-3. OpenView xnmgraph of octets in/out

Once the graph has started, you can change the polling interval and the colors used to display different objects. You can also turn off the display of some or all of the object instances. The menu item "View Figure 9.2.2 Line Configuration" lets you specify which objects you would like to display; it can also set multipliers for different items. For example, to display everything in K, multiply the data by .001. There is also an option ("View Figure 9.2.2 Statistics") that shows a statistical summary of your graph. Figure 9-4 shows some statistics from the graph in Figure 9-3. While the statistics menu is up, you can left-click on the graph; the statistics window will display the values for the specific date and time to which you are pointing with the mouse.

Figure 9-4

Figure 9-4. xnmgraph statistics

WARNING: Starting xnmgraph from the command line allows you to start the grapher at a specific polling period and gives you several other options. By default, OpenView polls at 10-second intervals. In most cases this is fine, but if you are polling a multiport router to check if some ports are congested, a 10-second polling interval may be too quick and could cause operational problems. For example, if the CPU is busy answering SNMP queries every 10 seconds, the router might get bogged down and become very slow, especially if the router is responsible for OSPF or other CPU-intensive tasks. You may also see messages from OpenView complaining that another poll has come along while it is still waiting for the previous poll to return. Increasing the polling interval usually gets rid of these messages.

Some of NNM's default menus let you use the grapher to poll devices depending on their type. For example, you can select the object type "router" on the NNM and generate a graph that includes all your routers. Whether you start from the command line or from the menu, there are times when you will get a message back that reads "Requesting more lines than number of colors (25). Reducing number of lines." This message means that there aren't enough colors available to display the objects you are trying to graph. The only good ways to avoid this problem are to break up your graphs so that they poll fewer objects or to eliminate object instances you don't want. For example, you probably don't want to graph router interfaces that are down (for whatever reason) and other "dead" objects. We will soon see how you can use a regular expression as one of the arguments to the xnmgraph command to graph only those interfaces that are up and running.

Although the graphical interface is very convenient, the command-line interface gives you much more flexibility. The following script displays the graph in Figure 9-3 (i.e., the graph we generated through the browser):

#!/bin/sh
# filename: /opt/OV/local/scripts/graphOctets
# syntax: graphOctets <hostname>
/opt/OV/bin/xnmgraph -c public -mib \
".iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets::::::::,\
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifOutOctets::::::::" \
$1
You can run this script with the command:

$ /opt/OV/local/scripts/graphOctets orarouter1
The worst part of writing the script is figuring out what command-line options you want -- particularly the long strings of nine colon-separated options. All these options give you the ability to refine what you want to graph, how often you want to poll the objects, and how you want to display the data. (We'll discuss the syntax of these options as we go along, but for the complete story, see the xnmgraph(1) manpage.) In this script, we're graphing the values of two MIB objects, ifInOctets and ifOutOctets. Each OID we want to graph is the first (and in this case, the only) option in the string of colon-separated options. On our network, this command produces eight traces: input and output octets for each of our four interfaces. You can add other OIDs to the graph by adding sets of options, but at some point the graph will become too confusing to be useful. It will take some experimenting to use the xnmgraph command efficiently, but once you learn how to generate useful graphs you'll wonder how you ever got along without it.

WARNING: Keeping your scripts neat is not only good practice, but also aesthetically pleasing. Using a "\" at the end of a line indicates that the next line is a continuation of the current line. Breaking your lines intelligently makes your scripts more readable. Be warned that the Unix shells do not like extra whitespace after the "\". The only character after each "\" should be one carriage return.

Now, let's modify the script to include more reasonable labels -- in particular, we'd like the graph to show which interface is which, rather than just showing the index number. In our modified script, we've used numerical object IDs, mostly for formatting convenience, and we've added a sixth option to the ugly sequence of colon-separated options: .1.3.6.1.2.1.2.2.1.2 (this is the ifDescr, or interface description, object in the interface table). This option says to poll each instance and use the return value of snmpget .1.3.6.1.2.1.2.2.1.2.INSTANCE as the label. This should give us meaningful labels. Here's the new script:

#!/bin/sh
# filename: /opt/OV/local/scripts/graphOctets
# syntax: graphOctets <hostname>
/opt/OV/bin/xnmgraph -c public -title Bits_In_n_Out -mib \
".1.3.6.1.4.1.9.2.2.1.1.6:::::.1.3.6.1.2.1.2.2.1.2:::,\
.1.3.6.1.4.1.9.2.2.1.1.8:::::.1.3.6.1.2.1.2.2.1.2:::" $1
To see what we'll get for labels, here's the result of walking .1.3.6.1.2.1.2.2.1.2:

$ snmpwalk orarouter1 .1.3.6.1.2.1.2.2.1.2
interfaces.ifTable.ifEntry.ifDescr.1 : DISPLAY STRING- (ascii):  Ethernet0
interfaces.ifTable.ifEntry.ifDescr.2 : DISPLAY STRING- (ascii):  Serial0
interfaces.ifTable.ifEntry.ifDescr.3 : DISPLAY STRING- (ascii):  Serial1
Figure 9-5 shows our new graph. With the addition of this sixth option, the names and labels are much easier to read.

Figure 9-5

Figure 9-5. OpenView xnmgraph with new labels

Meaningful labels and titles are important, especially if management is interested in seeing the graphs. A label that contains an OID and not a textual description is of no use. Some objects that are useful in building labels are ifType (.1.3.6.1.2.1.2.2.1.3) and ifOperStatus (.1.3.6.1.2.1.2.2.1.8). Be careful when using ifOperStatus; if the status of the interface changes during a poll, the label does not change. The label is evaluated only once.

One of the most wasteful things you can do is poll a useless object. This often happens when an interface is administratively down or not configured. Imagine that you have 20 serial interfaces, but only one is actually in use. If you are looking for octets in and out of your serial interfaces, you'll be polling 40 times and 38 of the polls will always read 0. OpenView's xnmgraph allows you to specify an OID and regular expression to select what should be graphed. To put this feature to use, let's walk the MIB to see what information is available:

$ snmpwalk orarouter1 .1.3.6.1.2.1.2.2.1.8
interfaces.ifTable.ifEntry.ifOperStatus.1 : INTEGER: up
interfaces.ifTable.ifEntry.ifOperStatus.2 : INTEGER: up
interfaces.ifTable.ifEntry.ifOperStatus.3 : INTEGER: down
This tells us that only two interfaces are currently up. By looking at ifDescr, we see that the live interfaces are Ethernet0 and Serial0; Serial1 is down. Notice that the type of ifOperStatus is INTEGER, but the return value looks like a string. How is this? RFC 1213 defines string values for each possible return value:

ifOperStatus OBJECT-TYPE
    SYNTAX  INTEGER {
                up(1),       -- ready to pass packets
                down(2),
                testing(3)   -- in some test mode
            }
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "The current operational state of the interface. The testing(3)
         state indicates that no operational packets can be passed."
    ::= { ifEntry 8 }
It's fairly obvious how to read this: the integer value 1 is converted to the string up. We can therefore use the value 1 in a regular expression that tests ifOperStatus. For every instance we will check the value of ifOperStatus; we will poll that instance and graph the result only if the status returns 1. In pseudocode, the operation would look something like this:

if (ifOperStatus == 1) {
    pollForMIBData;
    graphOctets;
}
Here's the next version of our graphing script. To put this logic into a graph, we use the OID for ifOperStatus as the fourth colon option, and the regular expression (1) as the fifth option:

#!/bin/sh
# filename: /opt/OV/local/scripts/graphOctets
# syntax: graphOctets <hostname>
/opt/OV/bin/xnmgraph -c public \
-title Octets_In_and_Out_For_All_Up_Interfaces \
-mib ".1.3.6.1.2.1.2.2.1.10:::.1.3.6.1.2.1.2.2.1.8:1::::, \
.1.3.6.1.2.1.2.2.1.16:::.1.3.6.1.2.1.2.2.1.8:1::::" $1
This command graphs the ifInOctets and ifOutOctets of any interface that has a current operational state equal to 1, or up. It therefore polls and graphs only the ports that are important, saving on network bandwidth and simplifying the graph. Furthermore, we're less likely to run out of colors while making the graph because we won't assign them to useless objects. Note, however, that this selection happens only during the first poll and stays effective throughout the entire life of the graphing process. If the status of any interface changes after the graph has been started, nothing in the graph will change. The only way to discover any changes in interface status is to restart xnmgraph.

Finally, let's look at:

The cropped graph in
Figure 9-6 shows how the labels change when we run the following script:

#!/bin/sh
# filename: /opt/OV/local/scripts/graphOctets
# syntax: graphOctets <hostname>
/opt/OV/bin/xnmgraph -c public -title Internet_Traffic_In_K -poll 68 -mib \
".1.3.6.1.4.1.9.2.2.1.1.6:Incoming_Traffic::::.1.3.6.1.2.1.2.2.1.2::.001:,\
.1.3.6.1.4.1.9.2.2.1.1.8:Outgoing_Traffic::::.1.3.6.1.2.1.2.2.1.2::.001:" \
$1
The labels are given by the second and sixth fields in the colon-separated options (the second field provides a textual label to identify the objects we're graphing and the sixth uses the ifDescr field to identify the particular interface); the constant multiplier (.001) is given by the eighth field; and the polling interval (in seconds) is given by the -poll option.

Figure 9-6

Figure 9-6. xnmgraph with labels and multipliers

By now it should be apparent how flexible OpenView's xnmgraph program really is. These graphs can be important tools for troubleshooting your network. When a network manager receives complaints from customers regarding slow connections, he can look at the graph of ifInOctets generated by xnmgraph to see if any router interfaces have unusually high traffic spikes.

Graphs like these are also useful when you're setting thresholds for alarms and other kinds of traps. The last thing you want is a threshold that is too triggery (one that goes off too many times) or a threshold that won't go off until the entire building burns to the ground. It's often useful to look at a few graphs to get a feel for your network's behavior before you start setting any thresholds. These graphs will give you a baseline from which to work. For example, say you want to be notified when the battery on your UPS is low (which means it is being used) and when it is back to normal (fully charged). The obvious way to implement this is to generate an alarm when the battery falls below some percentage of full charge, and another alarm when it returns to full charge. So the question is: what value can we set for the threshold? Should we use 10% to indicate that the battery is being used and 100% to indicate that it's back to normal? We can find the baseline by graphing the device's MIBs.[39] For example, with a few days' worth of graphs, we can see that our UPS's battery stays right around 94-97% when it is not in use. There was a brief period when the battery dropped down to 89%, when it was performing a self-test. Based on these numbers, we may want to set the "in use" threshold at 85% and the "back to normal" threshold at 94%. This pair of thresholds gives us plenty of notification when the battery's in use, but won't generate useless alarms when the device is in self-test mode. The appropriate thresholds depend on the type of devices you are polling, as well as the MIB data that is gathered. Doing some initial testing and polling to get a baseline (normal numbers) will help you set thresholds that are meaningful and useful.

[39]Different vendors have different UPS MIBs. Refer to your particular vendor's MIB to find out which object represents low battery power.

Before leaving xnmgraph, we'll take a final look at the nastiest aspect of this program: the sequence of nine colon-separated options. In the examples, we've demonstrated the most useful combinations of options. In more detail, here's the syntax of the graph specification:

object:label:instances:match:expression:instance-label:truncator:multiplier:nodes
The parameters are:

object
The OID of the object whose values you want to graph. This can be in either numeric or human-readable form, but it should not have an instance number at the end. It can also be the name of an expression (expressions are discussed in Appendix A, "Using Input and Output Octets").

label
A string to use in making the label for all instances of this object. This can be a literal string or the OID of some object with a string value. The label used on the graph is made by combining this label (for all instances of the object) with instance-label, which identifies individual instances of an object in a table. For example, in Figure 9-6, the labels are Incoming_Traffic and Outgoing_Traffic; instance-label is 1.3.6.1.2.1.2.2.1.2, or the ifDescr field for each object being graphed.

instances
A regular expression that specifies which instances of object to graph. If this is omitted, all instances are graphed. For example, the regular expression 1 limits the graph to instance 1 of object; the regular expression [4-7] limits the graph to instances 4 through 7. You can use the match and expression fields to further specify which objects to match.

match
The OID of an object (not including the instance ID) to match against a regular expression (the match-expression), to determine which instances of the object to display in the graph.

expression
A regular expression; for each instance, the object given by match is compared to this regular expression. If the two match, the instance is graphed.

instance-label
A label to use to identify particular instances of the object you are graphing. This is used in combination with the label and truncator fields to create a label for each instance of each object being graphed.

truncator
A string that will be removed from the initial portion of the instance label, to make it shorter.

multiplier
A number that's used to scale the values being graphed.

nodes
The nodes to poll to create the graph. You can list any number of nodes, separated by spaces. The wildcard "*" polls all the nodes in OpenView's database. If you omit this field, xnmgraph takes the list of nodes from the final argument on the command line.

The only required field is object; however, as we've seen, you must have all eight colons even if you leave some (or most) of the fields empty.

9.2.3. OpenView Data Collection and Thresholds

Once you close the OpenView graphs, the data in them is lost forever. OpenView provides a way to fix this problem with data collection. Data collection allows the user to poll and record data continuously. It can also look at these results and trigger events. One benefit of data collection is that it can watch the network for you while you're not there; you can start collecting data on Friday then leave for the weekend knowing that any important events will be recorded in your absence.

You can start OpenView's Data Collection and Thresholds function from the command line, using the command $OV_BIN/xnmcollect, or from NNM under the Options menu. This brings you to the "Data Collection and Thresholds" window, shown in
Figure 9-7, which displays a list of all the collections you have configured and a summary of the collection parameters.

Figure 9-7

Figure 9-7. OpenView's Data Collection and Thresholds window

Configured collections that are in "Suspended" mode appear in a dark or bold font. This indicates that OpenView is not collecting any data for these objects. A "Collecting" status indicates that OpenView is polling the selected nodes for the given object and saving the data. To change the status of a collection, select the object, click on "Actions," and then click on either "Suspend Collection" or "Resume Collection." (Note that you must save your changes before they will take effect.)

9.2.3.1. Designing collections

To design a new collection, click on "Edit Figure 9.2.3.1 Add MIB Object." This takes you to a new screen. At the top, click on "MIB Object"[40] and click down through the tree until you find the object you would like to poll. To look at the status of our printer's paper tray, for example, we need to navigate down to .iso.org.dod.internet.private.enterprises.hp.nm.system.net-peripheral.net-printer.generalDeviceStatus.gdStatusEntry.gdStatusPaperOut (.1.3.6.1.4.1.11.2.3.9.1.1.2.8).[41] The object's description suggests that this is the item we want: it reads "This indicates that the peripheral is out of paper." (If you already know what you're looking for, you can enter the name or OID directly.) Once there, you can change the name of the collection to something that is easier to read. Click "OK" to move forward. This brings you to the menu shown in Figure 9-8.

[40]You can collect the value of an expression instead of a single MIB object. The topic of expressions is out of the scope of this book but is explained in the mibExpr.conf (4) manpage.

[41]This object is in HP's private MIB, so it won't be available unless you have HP printers and have installed the appropriate MIBs. Note that there is a standard printer MIB, RFC 1759, but HP's MIB has more useful information.

Figure 9-8

Figure 9-8. OpenView poll configuration menu

The "Source" field is where you specify the nodes from which you would like to collect data. Enter the hostnames or IP addresses you want to poll. You can use wildcards like 198.27.6.* in your IP addresses; you can also click "Add Map" to add any nodes currently selected. We suggest that you start with one node for testing purposes. Adding more nodes to a collection is easy once you have everything set up correctly; you just return to the window in Figure 9-8 and add the nodes to the Source list.

"Collection Mode" lets you specify what to do with the data NNM collects. There are four collection modes: "Exclude Collection," "Store, Check Thresholds," "Store, No Thresholds," and "Don't Store, Check Thresholds." Except for "Exclude Collection," which allows us to turn off individual collections for each device, the collection modes are fairly self-explanatory. ("Exclude Collection" may sound odd, but it is very useful if you want to exclude some devices from collection without stopping the entire process; for example, you may have a router with a hardware problem that is bombarding you with meaningless data.) Data collection without a threshold is easier than collection with a threshold, so we'll start there. Set the Collection Mode to "Store, No Thresholds." This disable (grays out) the bottom part of the menu, which is used for threshold parameters. (Select "Store, Check Thresholds" if you want both data collection and threshold monitoring.) Then click "OK" and save the new collection. You can now watch your collection grow in the $OV_DB/snmpCollect directory. Each collection consists of a binary datafile, plus a file with the same name preceded by an exclamation mark (!); this file stores the collection information. The data-collection files will grow without bounds. To trim these files without disturbing the collector, delete all files that do not contain an "!" mark.

Clicking on "Only Collect on Nodes with sysObjectID:" allows you to enter a value for sysObjectID. sysObjectID (iso.org.dod.internet.mgmt.mib-2.system.sysObjectID) lets you limit polling to devices made by a specific manufacturer. Its value is the enterprise number the device's manufacturer has registered with IANA. For example, Cisco's enterprise number is 9, and HP's is 11 (the complete list is available at http://www.isi.edu/in-notes/iana/assignments/enterprise-numbers); therefore, to restrict polling to devices manufactured by HP, set the sysObjectID to 11. RFC 1213 formally defines sysObjectID (1.3.6.1.2.1.1.2) as follows:

sysObjectID OBJECT-TYPE
    SYNTAX  OBJECT IDENTIFIER
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "The vendor's authoritative identification of the network
         management subsystem contained in the entity. This value
         is allocated within the SMI enterprises subtree (1.3.6.1.4.1) 
         and provides an easy and unambiguous means for determining
         what kind of box' is being managed. For example, if vendor
         'Flintstones, Inc.' was assigned the subtree 1.3.6.1.4.1.4242,
         it could assign the identifier 1.3.6.1.4.1.4242.1.1 to its
         'Fred Router'."
    ::= { system 2 }
The polling interval is the period at which polling occurs. You can use one-letter abbreviations to specify units: "s" for seconds, "m" for minutes, "h" for hours, "d" for days. For example, 32s indicates 32 seconds; 1.5d indicates one and a half days. When I'm designing a data collection, I usually start with a very short polling interval -- typically 7s (7 seconds between each poll). You probably wouldn't want to use a polling interval this short in practice (all the data you collect is going to have to be stored somewhere), but when you're setting up a collection, it's often convenient to use a short polling interval. You don't want to wait a long time to find out whether you're collecting the right data.

The next option is a drop-down menu that specifies what instances should be polled. The options are "All," "From List," and "From Regular Expression." In this case we're polling a scalar item, so we don't have to worry about instances; we can leave the setting to "All" or select "From List" and specify instance "0" (the instance number for all scalar objects). If you're polling a tabular object, you can either specify a comma-separated list of instances or choose the "From Regular Expression" option and write a regular expression that selects the instances you want. Save your changes ("File Figure 9.2.3.1 Save"), and you're done.

9.2.3.2. Creating a threshold

Once you've set all this up, you've configured NNM to periodically collect the status of your printer's paper tray. Now for something more interesting: let's use thresholds to generate some sort of notification when the traffic coming in through one of our network interfaces exceeds a certain level. To do this, we'll look at a Cisco-specific object, locIfInBitsSec (more formally iso.org.dod.internet.private.enterprises.cisco.local.linterfaces.lifTable.lifEntry.locIfInBitsSec), whose value is the five-minute average of the rate at which data arrives at the interface, in bits per second. (There's a corresponding object called locIfOutBitsSec, which measures the data leaving the interface.) The first part of the process should be familiar: start Data Collection and Thresholds by going to the Options menu of NNM; then click on "Edit Figure 9.2.3.2 Add MIB Object." Navigate through the object tree until you get to locIfInBitsSec; click "OK" to get back to the screen shown in Figure 9-8. Specify the IP addresses of the interfaces you want to monitor and set the collection mode to "Store, Check Thresholds"; this allows you to retrieve and view the data at a later time. (I typically turn on the "Store" function so I can verify that the collector is actually working and view any data that has accumulated.) Pick a reasonable polling interval -- again, when you're testing it's reasonable to use a short interval -- then choose which instances you'd like to poll, and you're ready to set thresholds.

The "Threshold" field lets you specify the point at which the value you're monitoring becomes interesting. What "interesting" means is up to you. In this case, let's assume that we're monitoring a T1 connection, with a capacity of 1.544 Mbits/second. Let's say somewhat arbitrarily that we'll start worrying when the incoming traffic exceeds 75% of our capacity. So, after multiplying, we set the threshold to "> 1158000". Of course, network traffic is fundamentally bursty, so we won't worry about a single peak -- but if we have two or three consecutive readings that exceed the threshold, we want to be notified. So let's set "consecutive samples" to 3: that shields us from getting unwanted notifications, while providing ample notification if something goes wrong.

Setting an appropriate consecutive samples value will make your life much more pleasant, though picking the right value is something of an art. Another example is monitoring the /tmp partition of a Unix system. In this case, you may want to set the threshold to ">= 85", the number of consecutive samples to 2, and the poll interval to 5m. This will generate an event when the usage on /tmp exceeds 85% for two consecutive polls. This choice of settings means that you won't get a false alarm if a user copies a large file to /tmp and then deletes the file a few minutes later. If you set consecutive samples to 1, NNM will generate a Threshold event as soon as it notices that /tmp is filling up, even if the condition is only temporary and nothing to be concerned about. It will then generate a Rearm event after the user deletes the file. Since we are really only worried about /tmp filling up and staying full, setting the consecutive threshold to 2 can help reduce the number of false alarms. This is generally a good starting value for consecutive samples, unless your polling interval is very high.

The rearm parameters let us specify when everything is back to normal or is, at the very least, starting to return to normal. This state must occur before another threshold is met. You can specify either an absolute value or a percentage. When monitoring the packets arriving at an interface, you might want to set the rearm threshold to something like 926,400 bits per second (an absolute value that happens to be 60% of the total capacity) or 80% of the threshold (also 60% of capacity). Likewise, if you're generating an alarm when /tmp exceeds 85% of capacity, you might want to rearm when the free space returns to 80% of your 85% threshold (68% of capacity). You can also specify the number of consecutive samples that need to fall below the rearm point before NNM will consider the rearm condition met.

The final option, "Configure Threshold Event," asks what OpenView events you would like to execute for each state. You can leave the default event, or you can refer to Chapter 10, "Traps" for more on how to configure events. The "Threshold" state needs a specific event number that must reside in the HP enterprise. The default Threshold event is OV_DataCollectThresh - 58720263. Note that the Threshold event is always an odd number. The Rearm event is the next number after the Threshold event: in this case, 58720264. To configure events other than the default, click on "Configure Threshold Event" and, when the new menu comes up, add one event (with an odd number) to the HP section and a second event for the corresponding Rearm. After making the additions, save and return to the Collection windows to enter the new number.

When you finish configuring the data collection, click "OK." This brings you back to the Data Collection and Thresholds menu. Click "File Figure 9.2.3.2 Save" to make your current additions active. On the bottom half of the "MIB Object Collection Summary" window, click on your new object and then on "Actions Figure 9.2.3.2 Test SNMP." This brings up a window showing the results of an SNMP test on that collection. After the test, wait long enough for your polling interval to have expired once or twice. Then click on the object collection again, but this time click on "Actions Figure 9.2.3.2 Show Data." This window shows the data that has been gathered so far. Try blasting data through the interface to see if you can trigger a Threshold event. If the Threshold events are not occurring, verify that your threshold and polling intervals are set correctly. After you've seen a Threshold event occur, watch how the Rearm event gets executed. When you're finished testing, go back and set up realistic polling periods, add any additional nodes you would like to poll, and turn off storing if you don't want to collect data for trend analysis. Refer to the $OV_LOG/snmpCol.trace file if you are having any problems getting your data collection rolling. Your HP OpenView manual should describe how to use this trace file to troubleshoot most problems.

Once you have collected some data, you can use xnmgraph to display it. The xnmgraph command to use is similar to the ones we saw earlier; it's an awkward command that you'll want to save in a script. In the following script, the -browse option points the grapher at the stored data:

#!/bin/sh
# filename: /opt/OV/local/scripts/graphSavedData
# syntax: graphSavedData <hostname>
/opt/OV/bin/xnmgraph -c public -title Bits_In_n_Out_For_All_Up_Interfaces \
-browse -mib \
 ".1.3.6.1.4.1.9.2.2.1.1.6:::.1.3.6.1.2.1.2.2.1.8:1:.1.3.6.1.2.1.2.2.1.2:::,\
.1.3.6.1.4.1.9.2.2.1.1.8:::.1.3.6.1.2.1.2.2.1.8:1:.1.3.6.1.2.1.2.2.1.2:::" \ 
$1
Once the graph has started, no real (live) data will be graphed; the display is limited to the data that has been collected. You can click on "File Figure 9.2.3.2 Update Data" to check for and insert any data that has been gathered since the start of the graph. Another option is to leave off -browse, which allows the graph to continue collecting and displaying the live data along with the collected data.

Finally, to graph all the data that has been collected for a specific node, go to NNM and select the node you would like to investigate. Then select "Performance Figure 9.2.3.2 Graph SNMP Data Figure 9.2.3.2 Select Nodes" from the menus. You will get a graph of all the data that has been collected for the node you selected. Alternately, select the "All" option in "Performance Figure 9.2.3.2 Graph SNMP Data." With the number of colors limited to 25, you will usually find that you can't fit everything into one graph.

9.2.4. Castle Rock's SNMPc

The workgroup edition of Castle Rock's SNMPc program has similar capabilities to the OpenView package. It uses the term "trend reporting" for its data collection and threshold facilities. The enterprise edition of SNMPc even allows you to export data to a web page. In all our examples we use the workgroup edition of SNMPc.

To see how SNMPc works, let's graph the snmpOutPkts object. This object's OID is 1.3.6.1.2.1.11.2 (iso.org.dod.internet.mgmt.mib-2.snmp.snmpOutPkts). It is defined in RFC 1213 as follows:

snmpOutPkts OBJECT-TYPE
    SYNTAX  Counter
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "The total number of SNMP messages which were passed from
         the SNMP protocol entity to the transport service."
    ::= { snmp 2 }
We'll use the orahub device for this example. Start by clicking on the MIB Database selection tab shown in
Figure 9-9; this is the tab at the bottom of the screen that looks something like a spreadsheet -- it's the second from the left. Click down the tree until you come to iso.org.dod.internet.mgmt.mib-2.snmp. Click on the object you would like to graph (for this example, snmpOutPkts). You can select multiple objects with the Ctrl key.

Figure 9-9

Figure 9-9. SNMPc MIB Database view

TIP: SNMPc has a nonstandard way of organizing MIB information. To get to the snmpOutPkts object, you need to click down through the following: "Snmp MIBs Figure 9.2.4 mgmt Figure 9.2.4 snmp Figure 9.2.4 snmpInfo." Though this is quicker than the RFC-based organization used by most products, it does get a little confusing, particularly if you work with several products.

Once you have selected the appropriate MIB object, return to the top level of your map by either selecting the house icon or clicking on the Root Subnet tab (at the far left) to select the device you would like to poll. Instead of finding and clicking on the device, you can enter in the device's name by hand. If you have previously polled the device, you can select it from the drop-down box.
Figure 9-10 shows what a completed menu bar should look like.

Figure 9-10

Figure 9-10. SNMPc menu bar graph section

To begin graphing, click the button with the small jagged graph (the third from the right). Another window will appear displaying the graph (Figure 9-11). The controls at the top change the type of graph (line, bar, pie, distribution, etc.) and the polling interval and allow you to view historical data (the horizontal slider bar). Review the documentation on how each of these work or, better yet, play around to learn these menus even faster.

Figure 9-11

Figure 9-11. SNMPc snmpOutPkts graph section

Once you have a collection of frequently used graphs, you can insert them into the custom menus. Let's insert a menu item in the Tools menu that displays all the information in the snmpInfo table as a pie chart. Click on the Custom Menus tab (the last one), right-click on the Tools folder, and then left-click on "Insert Menu". This gets you to the "Add Custom Menu" window (Figure 9-12). Enter a menu name and select "Pie" for the display type. Use the browse button (>>) to click down the tree of MIB objects until you reach the snmpInfo table; then click "OK." Back at "Add Custom Menu," use the checkboxes in the "Use Selected Object" section to specify the types of nodes that will be able to respond to this custom menu item. For example, to chart snmpInfo a device obviously needs to support SNMP, so we've checked the "Has SNMP" box. This information is used when you (or some other user) try to generate this chart for a given device. If the device doesn't support the necessary protocols, the menu entry for the pie chart will be disabled.

Figure 9-12

Figure 9-12. SNMPc Add Custom Menu window

Click "OK" and proceed to your map to find a device to test. Any SNMP-compatible device should suffice. Once you have selected a device, click on "Tools" and then "Show Pie Chart of snmpInfo." You should see a pie chart displaying the data collected from the MIB objects you have configured. (If the device doesn't support SNMP, this option will be disabled.) Alternately, you could have double-clicked your new menu item in the Custom Menu tab.

SNMPc has a threshold system called Automatic Alarms that can track the value of an object over time to determine its highs and lows (peaks and troughs) and get a baseline. After it obtains the baseline, it alerts you if something strays out of bounds. In the main menu, clicking on "Config Figure 9.2.4 Trend Reports" brings up the menu shown in Figure 9-13.

Figure 9-13

Figure 9-13. SNMPc Trend Reports Global Settings menu

Check the "Enable Automatic Alarms" box to enable this feature. The "Limit Alarms For" box lets you specify how much time must pass before you can receive another alarm of the same nature. This prevents you from being flooded by the same message over and over again. The next section, "Baseline Creation," lets you configure how the baseline will be learned. The learning period is how long SNMPc should take to figure out what the baseline really is. The "Expand After" option, if checked, states how many alarms you can get in one day before SNMPc increases the baseline parameters. In Figure 9-13, if we were to get four alarms in one day, SNMPc would increase the threshold to prevent these messages from being generated so frequently. Checking the "Reduce On No Alarms In One Week" box tells SNMPc to reduce the baseline if we don't receive any alarms in one week. This option prevents the baseline from being set so high that we never receive any alarms. If you check the last option and click "OK," SNMPc will restart the learning process. This gives you a way to wipe the slate clean and start over.

9.2.5. Open Source Tools for Data Collection and Graphing

One of the most powerful tools for data collection and graphing is MRTG, familiar to many in the open source community. It collects statistics and generates graphical reports in the form of web pages. In many respects, it's a different kind of animal than the tools discussed in this chapter. We cover MRTG in Chapter 13, "MRTG".



Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.