Background
In a network with NFS, a hung mount can cause the entire snmpd to hang. If you use SNMP for service monitoring, this can result in potentially misleading or excessive alerts.
To limit the scope of the damage, it can be useful to run multiple instances of snmpd: one as the master, which provides most of the MIBs, and one or more AgentX subagents, which handle the potentially hang-prone branches of the MIBs. This is done by enabling and disabling the various modules snmpd has available. As an example, let’s move the dskTable and diskIOTable under the ucdavis private enterprise subtree to a subagent.
Baseline
To get started, ensure snmpd.conf has at least one disk directive, restart snmpd, and view the dskTable:
$ snmptable -Of localhost dskTable
SNMP table: .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
dskIndex dskPath dskDevice dskMinimum dskMinPercent dskTotal dskAvail dskUsed dskPercent dskPercentNode dskTotalLow dskTotalHigh dskAvailLow dskAvailHigh dskUsedLow dskUsedHigh dskErrorFlag dskErrorMsg
1 / /dev/sda3 10000 -1 7611636 206856 7018052 97 17 7611636 0 206820 0 7018052 0 0
Setting up the master
First, AgentX support needs to be enabled in the master. Add this to snmpd.conf:
master agentx
Next, the master snmpd’s command line needs to be changed so that it does not load the disk and diskio modules. On RHEL5, for example, edit /etc/sysconfig/snmpd.options, add -I -disk,diskio to OPTIONS, and restart snmpd:
OPTIONS="-Lsd -Lf /dev/null -p /var/run/snmpd.pid -a -I -disk,diskio"
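If you manage several machines, this edit can be scripted. The sketch below assumes the file contains a single uncommented line of the form OPTIONS="..." as shown above (on a stock install the line may still be commented out, in which case uncomment it first). add_snmpd_options is a hypothetical name, not part of any package; it does nothing if the flags are already present, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: append flags inside the quotes of the OPTIONS="..."
# line of an options file such as /etc/sysconfig/snmpd.options.
# Assumes exactly one uncommented line of the form OPTIONS="...".
add_snmpd_options() {
    file=$1    # options file to edit
    extra=$2   # flags to append, e.g. "-I -disk,diskio"
    # Skip if the flags are already there, so repeated runs are harmless.
    if grep -F -q -- "$extra" "$file"; then
        return 0
    fi
    tmp="$file.tmp.$$"
    # | is the sed delimiter, so $extra must not contain |, & or backslashes.
    sed "s|^\(OPTIONS=\"[^\"]*\)\"|\1 $extra\"|" "$file" > "$tmp" || return 1
    # Copy back with cat so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, add_snmpd_options /etc/sysconfig/snmpd.options "-I -disk,diskio" turns the stock line into the one shown above; snmpd still has to be restarted afterwards.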
Now check that the table has been turned off:
$ snmptable -Of localhost dskTable
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable: No entries
Setting up the subagent
To start the AgentX subagent, run snmpd with -X so that it acts as an AgentX subagent rather than a master, and load the modules that were disabled in the master with nearly the same -I option, just without the leading dash: -I disk,diskio.
To confirm that it is working, we will keep snmpd attached to the terminal with -f and have it symbolically dump SNMP transactions with -V:
snmpd -V -f -Le -Lf /dev/null -p /var/run/snmpdiskd.pid -a -X -I disk,diskio
Now let’s look at the table (on a different terminal, of course):
$ snmptable -Of localhost dskTable
SNMP table: .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
dskIndex dskPath dskDevice dskMinimum dskMinPercent dskTotal dskAvail dskUsed dskPercent dskPercentNode dskTotalLow dskTotalHigh dskAvailLow dskAvailHigh dskUsedLow dskUsedHigh dskErrorFlag dskErrorMsg
1 / /dev/sda3 10000 -1 7611636 206856 7018052 97 17 7611636 0 206820 0 7018052 0 0
Back on the terminal where the snmpdiskd subagent is running, we should see output like:
NET-SNMP version 5.3.2.2 AgentX subagent connected
NET-SNMP version 5.3.2.2
Received SNMP packet(s) from callback: 1 on fd 4
GETNEXT message
-- .iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskIndex
Received SNMP packet(s) from callback: 1 on fd 4
GETNEXT message
-- .iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskIndex.1
…
Confirming the failure case
This all works under normal circumstances, but our goal is to make snmpd more robust in the face of failures: how do we know that the master won’t simply hang if the subagent does? We need to test a running but unresponsive subagent. Fortunately, this is easy with snmpd still attached to the terminal: hitting CTRL-Z suspends the process (the terminal sends it SIGTSTP, which stops it just as SIGSTOP would).
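CTRL-Z only works while snmpd is attached to a terminal. If the subagent has been started in the background, the same failure can be simulated by signalling the PID that snmpd writes to the file named with -p (here /var/run/snmpdiskd.pid). A minimal sketch follows; signal_pidfile is a hypothetical helper name. It is meant to be used with SIGSTOP rather than SIGTSTP, because SIGSTOP cannot be caught or ignored.

```shell
#!/bin/sh
# Hypothetical helper: send a signal to the process whose PID is stored in a
# pid file, such as the /var/run/snmpdiskd.pid written by snmpd's -p option.
signal_pidfile() {
    sig=$1       # signal name without the SIG prefix, e.g. STOP or CONT
    pidfile=$2
    if [ ! -r "$pidfile" ]; then
        echo "signal_pidfile: cannot read $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill -s "$sig" "$pid"
}

# Usage (as root):
#   signal_pidfile STOP /var/run/snmpdiskd.pid   # freeze the subagent
#   signal_pidfile CONT /var/run/snmpdiskd.pid   # let it run again
```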
Before we break the subagent, however, let’s first walk the ucdavis subtree and confirm that both our disk-related information and the other information appear together as expected:
$ snmpwalk -Of localhost ucdavis |less
.iso.org.dod.internet.private.enterprises.ucdavis.memory.memIndex.0 = INTEGER: 0
…
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskIndex.1 = INTEGER: 1
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskPath.1 = STRING: /
…
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskDevice.1 = STRING: /dev/sda3
…
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskErrorMsg.1 = STRING:
.iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laIndex.1 = INTEGER: 1
Let’s also fetch a single entry:
$ snmpget -Of localhost ucdavis.dskTable.dskEntry.dskPath.1
.iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskPath.1 = STRING: /
Now we suspend the snmpdiskd subagent with CTRL-Z:
[1]+ Stopped snmpd -n snmpdiskd -V -f -Le -Lf /dev/null -p /var/run/snmpdiskd.pid -a -X -I disk,diskio
Now let’s try fetching the same entry again:
$ snmpget localhost ucdavis.dskTable.dskEntry.dskPath.1
Timeout: No Response from localhost.
That returns after a brief timeout instead of hanging. Now let’s walk the ucdavis subtree again:
$ snmpwalk -Of localhost ucdavis |less
.iso.org.dod.internet.private.enterprises.ucdavis.memory.memIndex.0 = INTEGER: 0
...
.iso.org.dod.internet.private.enterprises.ucdavis.memory.memSwapErrorMsg.0 = STRING:
.iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laIndex.1 = INTEGER: 1
.iso.org.dod.internet.private.enterprises.ucdavis.laTable.laEntry.laIndex.2 = INTEGER: 2
…
Notice that the dskTable is now missing, while the rest of the ucdavis subtree is returned just fine!
Splitting up configuration
So far, so good; from here on out it’s just finishing touches.
First, you might notice messages like the following in your syslog:
/etc/snmp/snmpd.conf: line xxx: Warning: Unknown token: disk.
This is because without the disk module, the disk directive is unknown. The subagent will print many more complaints, since it knows even less than the master. The solution is to move the disk directives to a new config file, /etc/snmp/snmpdiskd.conf, and set “snmpdiskd” as the alternative application name with -n. This causes snmpd to look for snmpdiskd.conf instead of snmpd.conf in the 8 or so places it searches for configuration files (see snmp_config(5) for details):
# snmpd -n snmpdiskd -V -f -Le -Lf /dev/null -p /var/run/snmpdiskd.pid -a -X -I disk,diskio
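Moving the directives can be done by hand or with a small script. The sketch below assumes each directive sits on its own line starting with disk (optionally indented); only those lines are moved. split_disk_directives is a hypothetical helper, not part of Net-SNMP. It keeps a backup of the original file and appends to the subagent’s config, so a line such as disk / 10000 (consistent with the dskMinimum of 10000 in the baseline output) ends up in /etc/snmp/snmpdiskd.conf.

```shell
#!/bin/sh
# Hypothetical helper: move "disk" directives from the master's config into
# the subagent's config. A backup of the original is kept as SRC.bak.
split_disk_directives() {
    src=$1   # e.g. /etc/snmp/snmpd.conf
    dst=$2   # e.g. /etc/snmp/snmpdiskd.conf
    # Match the "disk" token followed by whitespace; leading blanks allowed.
    pattern='^[[:space:]]*disk[[:space:]]'
    cp -p -- "$src" "$src.bak" || return 1
    # Append the matching lines to the subagent's config (created if missing).
    grep -E -- "$pattern" "$src.bak" >> "$dst"
    # Rewrite the master's config in place, without the disk lines.
    grep -Ev -- "$pattern" "$src.bak" > "$src"
    return 0
}
```

After running it, restart the master so that it rereads the trimmed snmpd.conf.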
Init script
The final task, left as an exercise for the reader, is to create an init script to start up the subagent with the desired parameters.
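As a starting point for that exercise, here is a minimal sketch of the core logic. It is not a finished RHEL init script: it has no chkconfig header and does not use the distribution’s helper functions. The snmpd path and the snmpdiskd_ctl name are assumptions; the flags are those used above, minus -V and -f so that snmpd detaches, with syslog logging copied from the master’s OPTIONS.

```shell
#!/bin/sh
# Sketch of the core of an init script for the snmpdiskd subagent.
# SNMPD and PIDFILE are assumptions; adjust them to your installation.
SNMPD=${SNMPD:-/usr/sbin/snmpd}
PIDFILE=${PIDFILE:-/var/run/snmpdiskd.pid}

snmpdiskd_ctl() {
    case "$1" in
    start)
        # Same flags as the interactive run, minus -V and -f so that snmpd
        # daemonizes, and logging to syslog like the master (-Lsd).
        "$SNMPD" -n snmpdiskd -Lsd -Lf /dev/null -p "$PIDFILE" -a -X -I disk,diskio
        ;;
    stop)
        [ -r "$PIDFILE" ] || return 0   # nothing to stop
        kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
        ;;
    restart)
        snmpdiskd_ctl stop
        snmpdiskd_ctl start
        ;;
    *)
        echo "Usage: snmpdiskd_ctl {start|stop|restart}" >&2
        return 2
        ;;
    esac
}
```

A real script would call snmpdiskd_ctl "$1" from a file under /etc/init.d and, for restart, wait for the old process to exit before starting a new one.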
Other MIBs
It is also possible to move data from the Host Resources MIB to the subagent. It does not, however, appear to be possible to move only the storage and filesystem tables, so all of the host resources modules have to move. Doing so is simply a matter of finding the right modules: run snmpd -Dmib_init -H 2>&1 |grep ^mib_init:|sort and add all of the hr_*, hrh_* and hw_* modules to the exclusion and inclusion lists.
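Collecting those names by hand is tedious, so here is a sketch that filters the debug output into a single comma-separated list. It assumes each relevant debug line ends with the module name, as in mib_init: initializing: hr_system; check this against your version’s output. The function name hr_modules is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "snmpd -Dmib_init -H" debug output on stdin and
# print the hr_*, hrh_* and hw_* module names as one comma-separated list.
# Assumption: the module name is the last field of each mib_init line.
hr_modules() {
    grep '^mib_init:' |
        awk '{ print $NF }' |
        grep -E '^(hr_|hrh_|hw_)' |
        LC_ALL=C sort -u |
        paste -s -d, -
}
```

With mods=$(snmpd -Dmib_init -H 2>&1 | hr_modules), the lists would become -I -disk,diskio,$mods for the master and -I disk,diskio,$mods for the subagent, since the leading dash excludes every module in the list.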
Other good candidates for moving to a subagent are the modules providing directives for running external commands for output, such as extend, pass, and exec.
Posted: 28 Sep 2013