Setting up distributed monitoring in mission-critical production environments is a complex task: configuration can be challenging and mistakes costly. Opsview Enterprise edition and the Opsview Syncmaster module make deploying an enterprise monitoring system easy and reduce the risks of migrating configuration objects from development to production environments.
So you followed the steps in the previous post about enabling SNMP traps on ESX4. Now you probably want something useful to pick them up. Opsview can be configured to handle the traps quite easily: just follow the steps below and your server will be listening for those pesky traps. After that, you’ll need to write a couple of service check handlers in Opsview to make sense of the traps. More on that later; this post is just about picking them up.
This post outlines how to get SNMP traps from ESX hosts and monitor them in Opsview. Part 1 deals with configuring SNMP traps so that they work correctly with ESX hosts; part 2 explains how to monitor them with Opsview.
The following steps worked on ESX 4.1; other versions may give different results. For simplicity, I used 10.0.0.1 as the IP of my ESX host and 10.0.0.99 for my SNMP trap handler.
Over the last couple of years we have seen an increase in port density on server hardware, and the quad NIC (a 4-port network interface card) now seems to be the standard. These cards allow for some great features like bonding (on *nix) or NIC teaming (on Windows), where multiple interfaces are bundled together or set up for failover. They also let you neatly split your networks into multiple segments, such as management and production, with each network connected over dedicated NICs, as shown below…
Freeware IT monitoring tools are used by thousands of organisations worldwide; however, using them to monitor complex network, server and application installations can be quite a challenge. This blog post takes the basic capabilities of one such tool, Nagios® Core, and shows how you can scale it with Opsview for use in enterprise environments.
Many freeware IT monitoring tools are great, but using them to manage complex systems can be a real challenge. They can also be unforgiving to anyone less than expert at configuring them, with mistakes punished by a complete halt in monitoring activity.
It has been some time since we last talked about SNMP trap handling, but there have been some major developments. Recall that we use the Perl module SNMP::Trapinfo to process an incoming trap. We think this works really well, but there was a major piece of functionality our customer wanted:
Complex calculation of whether a trap passes a test
And by complex, we mean complex. Here’s an example trap:
dastardly.altinity.net
10.243.196.251
SNMPv2-MIB::sysUpTime.0 119:2:04:40.34
SNMPv2-MIB::snmpTrapOID.0 CERENT-454-MIB::remoteAlarmIndication
CERENT-454-MIB::cerent454NodeTime.0 20060814114937D
CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication notAlarmedNonServiceAffecting
CERENT-454-MIB::cerent454AlarmObjectType.9216.remoteAlarmIndication ds1
CERENT-454-MIB::cerent454AlarmObjectIndex.9216.remoteAlarmIndication 9216
CERENT-454-MIB::cerent454AlarmSlotNumber.9216.remoteAlarmIndication 2
CERENT-454-MIB::cerent454AlarmPortNumber.9216.remoteAlarmIndication port36
CERENT-454-MIB::cerent454AlarmLineNumber.9216.remoteAlarmIndication 0
CERENT-454-MIB::cerent454AlarmObjectName.9216.remoteAlarmIndication DS1-2-36-7
SNMP-COMMUNITY-MIB::snmpTrapAddress.0 216.243.196.251
Our customer wanted to be able to say: “Give me a critical alert if cerent454AlarmState.9216.remoteAlarmIndication is not ‘cleared’ and the cerent454AlarmSlotNumber is greater than 5”. Well, this was impossible with our previous setup. I still don’t know why it is called the Simple Network Management Protocol…
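To make the requirement concrete, here is a minimal Python sketch (not SNMP::Trapinfo itself) that parses a trap dump like the one above into a dictionary of variables and hard-codes the customer’s test. It assumes the layout shown: hostname, then IP address, then one “OID value” pair per line; `parse_trap`, `customer_rule` and the trimmed `EXAMPLE` are hypothetical. Writing one rule by hand like this is easy; what the previous setup lacked was a way to express arbitrary rules as configuration.

```python
# Minimal sketch: parse a trap dump and hard-code the customer's rule.
# All names here are hypothetical, for illustration only.

STATE = "CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication"
SLOT = "CERENT-454-MIB::cerent454AlarmSlotNumber.9216.remoteAlarmIndication"


def parse_trap(text):
    """Return (hostname, ip, {oid: value}) from a trap dump."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    hostname, ip = lines[0], lines[1]
    varbinds = {}
    for line in lines[2:]:
        # Split on the first run of whitespace: OID on the left, value on the right.
        parts = line.split(None, 1)
        varbinds[parts[0]] = parts[1] if len(parts) > 1 else ""
    return hostname, ip, varbinds


def customer_rule(varbinds):
    """Critical if the alarm state is not 'cleared' and the slot number is above 5."""
    return varbinds[STATE] != "cleared" and int(varbinds[SLOT]) > 5


EXAMPLE = """\
dastardly.altinity.net
10.243.196.251
SNMPv2-MIB::sysUpTime.0 119:2:04:40.34
SNMPv2-MIB::snmpTrapOID.0 CERENT-454-MIB::remoteAlarmIndication
CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication notAlarmedNonServiceAffecting
CERENT-454-MIB::cerent454AlarmSlotNumber.9216.remoteAlarmIndication 2
"""

if __name__ == "__main__":
    host, ip, varbinds = parse_trap(EXAMPLE)
    # Slot 2 is not greater than 5, so the rule does not fire here.
    print(host, ip, customer_rule(varbinds))
```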
We sat down to think about this and realised we probably needed an arbitrary way of evaluating an SNMP trap, but the last thing we wanted to do was write a syntax parser. That would mean a whole new language, plus all the parsing work involved. This would take months of work!
Looking for inspiration, we realised OpenNMS claims this type of functionality. We downloaded a copy and tried to install it, but hit loads of prerequisites. We’re very lazy – we should evaluate other technologies properly, but if something is too much of a pain to install, we give up right away!
Undeterred, we went for the next best thing: their documentation! Searching around, we found the section on evaluating traps. It appears that OpenNMS has a table called events, which is a list of all the things that have happened. Various filters then evaluate those events to work out whether something needs to be alerted on. SNMP traps are converted into this event format and dropped into that table.
(As an aside, Nagios holds no such processing logic. All that complicated processing is handled by the plugins; Nagios only cares about the result. This is a feature.)
Then the beauty of OpenNMS’ design dawned on us: rules are expressed as SQL statements.
Let me repeat that: rules are just SQL statements. If the SQL evaluates to 1, an alert is raised; otherwise the event is ignored. Fantastic! This does away with all the “design your own syntax” work and uses a clear, recognised language instead. No duplication of work!
So the above requirement could be met with a rule in OpenNMS (we think! We haven’t actually tried this!) that says:
(cerent454AlarmState != 'cleared') & (cerent454AlarmSlotNumber > 5)
which equates to a SQL statement like:
SELECT ipaddr
FROM ipinterface
WHERE ipaddr in (SELECT ipaddr FROM ipinterface, node
WHERE cerent454AlarmState != 'cleared'
AND ipinterface.nodeid =node.nodeid)
AND ipaddr in (SELECT ipaddr FROM ipinterface, snmpInterface
WHERE cerent454AlarmSlotNumber > 5
AND ipinterface.ipaddr = snmpInterface.ipaddr);
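A toy version of the “rules are SQL” idea can be tried with Python’s built-in sqlite3 module. The schema below is hypothetical and far simpler than OpenNMS’s (one row per trap variable, keyed by short variable names), and the rule is rewritten to suit it; the only point is that the rule is a SELECT returning 1 (alert) or 0 (ignore).

```python
import sqlite3


def load_event(conn, varbinds):
    """Store one trap's variables in a toy table (hypothetical schema, not OpenNMS's)."""
    conn.execute("CREATE TABLE IF NOT EXISTS varbinds (name TEXT, value TEXT)")
    conn.execute("DELETE FROM varbinds")  # keep only the current trap
    conn.executemany("INSERT INTO varbinds (name, value) VALUES (?, ?)", varbinds.items())


# The rule is just SQL: it selects a single 0/1 value.
RULE = """
SELECT
  EXISTS (SELECT 1 FROM varbinds
          WHERE name = 'cerent454AlarmState' AND value != 'cleared')
  AND
  EXISTS (SELECT 1 FROM varbinds
          WHERE name = 'cerent454AlarmSlotNumber' AND CAST(value AS INTEGER) > 5)
"""


def rule_fires(conn, rule_sql):
    """Run a rule and report whether it evaluated to 1."""
    (result,) = conn.execute(rule_sql).fetchone()
    return result == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_event(conn, {"cerent454AlarmState": "notAlarmedNonServiceAffecting",
                      "cerent454AlarmSlotNumber": "2"})
    print(rule_fires(conn, RULE))  # slot 2 is not greater than 5
```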
But we couldn’t do that with SNMP::Trapinfo: there’s no SQL database, and tacking on DBI.pm support would be terrible. But then it hit us: why not use Perl? Most sysadmins know Perl syntax, and it would allow useful functionality like regular expressions, which are not as powerful in SQL.
How do we express the SNMP trap variables? We already have that in SNMP::Trapinfo: macros. ${CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication} evaluates to notAlarmedNonServiceAffecting in the example trap, but instead of using it in a line to display, we wrap it up in some Perl code:
"${CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication}" eq "cleared"
(These Cerent devices also make it difficult to find a specific variable because they encode the object index number, 9216, into the OID name. Sigh: no one said SNMP had to be Simple or consistent. To overcome this, we introduced the idea of a wildcard for an OID tuple, so the above could be written as "${CERENT-454-MIB::cerent454AlarmState.*.remoteAlarmIndication}" eq "cleared". There are some issues if multiple OIDs match this name, but we assume that only one matches…)
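Macro expansion with a wildcard might look like the following minimal Python sketch. It is an illustration, not SNMP::Trapinfo’s code, and it assumes that each `*` matches exactly one dot-separated OID component, that the first matching OID wins, and that unknown variables expand to an empty string; `lookup` and `expand` are hypothetical names.

```python
import re

# A ${...} macro as used in the rule strings above.
MACRO = re.compile(r"\$\{([^}]+)\}")


def lookup(varbinds, name):
    """Exact match first; otherwise treat each '*' as one dot-separated OID component."""
    if name in varbinds:
        return varbinds[name]
    if "*" in name:
        pieces = (re.escape(piece) for piece in name.split("*"))
        pattern = re.compile("^" + r"[^.]+".join(pieces) + "$")
        for oid, value in varbinds.items():
            if pattern.match(oid):
                return value  # assume only one OID matches; the first one wins
    return ""  # unknown variables expand to nothing


def expand(template, varbinds):
    """Replace every ${...} macro in the template with its trap value."""
    return MACRO.sub(lambda m: lookup(varbinds, m.group(1)), template)
```

With the example trap loaded, the wildcard form expands to `"notAlarmedNonServiceAffecting" eq "cleared"`, which is then ready to be evaluated.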
There’s a new method in SNMP::Trapinfo called eval. It evaluates the string as a snippet of Perl code and returns the result. There are three possible results from the eval:
- 1 = true – the Perl snippet runs and evaluates as true
- 0 = false – the Perl snippet evaluates as false
- undef = error – the Perl code did not run correctly (most likely a syntax error)
This last case can happen if the variable name does not exist. For instance, if the incoming trap did not contain the desired variable, '${CERENT-454-MIB::cerent454AlarmSlotNumber.*.remoteAlarmIndication} > 5' would expand to ' > 5', which is not valid Perl code.
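In Python terms, the three outcomes map naturally onto True, False and None. The sketch below assumes the rule has already been expanded and is written in Python syntax (`!=` and `and` rather than Perl’s `ne` and `&&`); `evaluate` is a hypothetical helper. Plain `eval` is not a sandbox, so this version must never see untrusted input.

```python
def evaluate(expression):
    """Return True or False for an expanded rule, or None if it fails to run.

    WARNING: plain eval() is not a sandbox; never pass it untrusted input.
    """
    try:
        return bool(eval(expression, {"__builtins__": {}}, {}))
    except Exception:  # SyntaxError, NameError, TypeError, ...
        return None
```

For example, `evaluate(" > 5")` returns None, which corresponds to the missing-variable case above.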
So our way of expressing the required rule is:
"${CERENT-454-MIB::cerent454AlarmState.9216.remoteAlarmIndication}" ne "cleared" && ${CERENT-454-MIB::cerent454AlarmSlotNumber.9216.remoteAlarmIndication} > 5
We have a basic wrapper script that sends a passive check result to Nagios if this code returns true.
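A wrapper like this could feed Nagios through its external command file using the standard PROCESS_SERVICE_CHECK_RESULT command. This Python sketch assumes the rule result from the previous step (True, False or None), a hypothetical service name, and that a matching rule should be reported as CRITICAL. Only a True result produces a check result; the command file path is whatever `command_file` your nagios.cfg points to.

```python
import time

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_check_line(host, service, return_code, output, now=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


def handle_trap(host, rule_result, service="SNMP trap rule", now=None):
    """Only a rule that evaluated to True yields a result; False/None send nothing."""
    if rule_result is True:
        return passive_check_line(host, service, CRITICAL, "CRITICAL - trap rule matched", now)
    return None


def submit(command_file, line):
    """Write one command to the Nagios command file (usually a named pipe).

    Opening the pipe blocks until Nagios is reading from it.
    """
    with open(command_file, "a") as pipe:
        pipe.write(line)
```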
One final thing: we have a front-end application for configuring the Perl snippet, so the snippet is obviously tainted. We don’t necessarily know what the code contains, so it could do things like system('rm -fr $HOME'). We added the Safe module, so the code is now restricted to running specific operators, like comparisons, regexps and mathematical functions. Good security lets us sleep at night.
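Python has no direct equivalent of Perl’s Safe module, but a similar whitelist can be sketched by parsing the rule and rejecting any syntax outside a small set of operators. In this hypothetical `safe_evaluate`, only literals, comparisons, boolean logic, basic arithmetic and one helper, `matches` (a stand-in for a Perl regexp match), are allowed; anything else, including a call like `system(...)`, yields None. It assumes Python 3.8 or later and is a sketch, not a hardened sandbox: it does not guard against resource exhaustion such as very large string multiplications or slow regexps.

```python
import ast
import re

# Syntax a rule may use: literals, comparisons, boolean logic, arithmetic,
# and calls to the single helper below. Everything else is rejected.
ALLOWED = (
    ast.Expression, ast.Constant, ast.Load,
    ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod,
    ast.Call, ast.Name,
)


def matches(pattern, text):
    """Stand-in for a Perl regexp match (text =~ /pattern/)."""
    return re.search(pattern, text) is not None


def safe_evaluate(expression):
    """Return True/False, or None if the rule is invalid or uses forbidden syntax."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return None
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            return None
        # The only name a rule may mention is the matches() helper.
        if isinstance(node, ast.Name) and node.id != "matches":
            return None
    try:
        code = compile(tree, "<rule>", "eval")
        return bool(eval(code, {"__builtins__": {}, "matches": matches}, {}))
    except Exception:  # e.g. TypeError from comparing a string with a number
        return None
```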
SNMP::Trapinfo is now released on CPAN. We use it for our SNMP trap processing and we think it works fantastically well. This continues our aim of making the base portions of Opsview as solid as possible.
Continuing our run of useful SNMP OIDs…
One of the most commonly monitored statistics is filesystem usage. Here is how you do it with SNMP. All OIDs listed are in the hrStorageTable of the HOST-RESOURCES-MIB (RFC 2790), which sits under the MIB-II (mib-2) subtree.
Note: <int> is an integer corresponding to the filesystem number (the hrStorageIndex). Most systems have multiple partitions/filesystems, so check the description OID to find the index you want.
Description (hrStorageDescr): .1.3.6.1.2.1.25.2.3.1.3.<int>
Description of the filesystem. On a Unix system, examples would be / or /home; under Windows expect C:\, D:\ etc.
Capacity (hrStorageSize): .1.3.6.1.2.1.25.2.3.1.5.<int>
Capacity of the filesystem in blocks.
Usage (hrStorageUsed): .1.3.6.1.2.1.25.2.3.1.6.<int>
How many blocks are currently being used to store data.
Blocksize (hrStorageAllocationUnits): .1.3.6.1.2.1.25.2.3.1.4.<int>
Block size in bytes. Important because the other statistics are in blocks.
Maths
So to find the capacity of a filesystem in bytes, multiply the size in blocks by the block size. The same principle applies to calculating how much of the filesystem is in use.
If you want to display values in KB/MB/GB, remember to divide by 1024 for each step up.
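The maths can be sketched in a few lines of Python. The column numbers come from the OID list above; the helper names are hypothetical, and fetching the values over SNMP is left out.

```python
# hrStorageTable column numbers under .1.3.6.1.2.1.25.2.3.1 (HOST-RESOURCES-MIB).
BASE = ".1.3.6.1.2.1.25.2.3.1"
COLUMNS = {"description": 3, "blocksize": 4, "capacity": 5, "usage": 6}


def oid_for(column, index):
    """Build the OID for one column of one filesystem (index is the <int> above)."""
    return f"{BASE}.{COLUMNS[column]}.{index}"


def storage_usage(size_blocks, used_blocks, block_size):
    """Convert block counts to bytes and work out the percentage used."""
    total = size_blocks * block_size
    used = used_blocks * block_size
    percent = 100.0 * used / total if total else 0.0
    return total, used, percent


def human(num_bytes):
    """Divide by 1024 for each step from bytes to KB, MB and GB."""
    value = float(num_bytes)
    for unit in ("bytes", "KB", "MB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GB"
```

For example, a filesystem reporting 1000 blocks of 4096 bytes with 250 in use gives `storage_usage(1000, 250, 4096) == (4096000, 1024000, 25.0)`, and `human(4096000)` returns "3.9 MB".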



Opsview is a leading Open Source application and network monitoring suite. Labs is where our engineers discuss new projects, new approaches and new frameworks they’re using.