The first part of a series of posts on Grails and Hudson leading up to a presentation at the London Groovy & Grails User Group. Subsequent instalments will include testing (unit, integration, functional), test coverage, automatic war deployment and monitoring Hudson with Opsview Enterprise.
The Nagios Plugins project recently released a new version. Amongst the changes is a new feature which we added for a customer. The requirement was to measure the rate of change for SNMP counters. The standard check_snmp plugin is great at getting information, but only at a specific moment in time. For some things, you want to check and alert on the rate of change. There are a lot of interesting metrics that you can get from SNMP which are Counter32 or Counter64 values. An example is IP-MIB::ipInAddrErrors.0. This counts the number of packets that are discarded due to invalid IP addresses – probably due to network errors or an attempt at infiltrating your network.
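To make the idea of a rate of change concrete, here is a minimal sketch of the arithmetic for two samples of a counter. It is not the plugin's implementation: the function name `counter_rate` and its parameters are hypothetical, and it assumes the counter wrapped at most once between the two samples.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time, bits=32):
    """Return the per-second rate between two samples of a wrapping counter.

    Hypothetical helper for illustration only. `bits` is 32 for a
    Counter32 value and 64 for a Counter64 value.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the counter wrapped exactly once past its maximum.
        # A counter reset (for example an agent restart) looks the same
        # and would be misread as a wrap here.
        delta += 2 ** bits

    return delta / elapsed


if __name__ == "__main__":
    # 600 discarded packets over 300 seconds
    print(counter_rate(1000, 0.0, 1600, 300.0))
    # Counter32 wrapping between samples taken 8 seconds apart
    print(counter_rate(4294967290, 0.0, 10, 8.0))
```

Alerting on the returned rate rather than the raw value is what lets a threshold such as "more than N errors per second" make sense for a counter that only ever increases.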
All enterprises depend on reliable servers, network devices and business applications. Any downtime hits your bottom line. To ensure maximum IT performance across your business, you have to identify and resolve problems before they impact the user experience or security of your data.
In a standard Nagios plus database implementation, you use NDOutils to store information in a database. While we think NDOutils is fantastic, there are some major limitations with it as you monitor more hosts. With Opsview, we want to scale. We’ve already done lots of work with NDOutils, including adding view-like helper tables, updating the database asynchronously, improving indices and speeding up the time to load the configuration at a Nagios reload. Now we want to share an amazing improvement we’ve discovered.
In the light of the recent events at a BT network centre in Paddington (London, UK), where a series of compound failures caused a massive outage with huge knock-on effects, I’m sure many businesses are taking another look at their own (and their suppliers’) availability with a view to beefing up business continuity.
We regularly get customer requests to write SNMP checks for their devices, but we don’t necessarily get access to those devices. In the past we have used output from an snmpwalk command, written the check scripts as best we can, and then shipped the new check to the customer for testing.
The great thing about open source software is the unexpected contributions you may suddenly receive. Jonathan Kamens of Advent Software sent a patch to the nagios-devel mailing list that speeds up the status CGIs. Using gprof, he identified that status.cgi was spending most of its time in the sorting routine for downtimes and comments.
This Monday morning, we got lots of calls from users whose Opsview slave systems running Nagios were raising freshness alerts because checks weren’t being run within their specified period.
At the heart of Opsview is the Nagios monitoring engine. One of the policies we have with Opsview is to keep the number of changes we make to the software we depend on as low as possible. We do this by keeping track of all the patches we apply and pushing these back upstream (though recently we haven’t had as much time as we’d like…).

Opsview is a leading Open Source application and network monitoring suite. Labs is where our engineers discuss new projects, new approaches and new frameworks they’re using.


