Dec 31

The great thing about open source software is the unexpected contributions you may suddenly receive. Jonathan Kamens of Advent Software sent a patch to the nagios-devel mailing list that speeds up the status CGIs. Using gprof, he found that status.cgi was spending most of its time in the routine that sorts downtimes and comments.

Jul 07

We had a request to integrate RSS feeds into Opsview. People were getting fed up with the amount of email they received! Fair enough (I get hundreds of emails a day), but the idea was also that you could use a mobile phone to see the trail of alerts.

We looked at the current offerings for RSS at Nagios Exchange, but weren’t too keen on them. Ssugar’s RSS notifications works by using a notification script for a single admin user. This writes a single RSS feed which people can then subscribe to. The main problem with this is that the feed is not personalised: if I were on the networking team, I wouldn’t want to know about the Oracle alerts.

Steve Shipway correctly identified this problem. His software, Nagios RSS, uses a slimmed down version of the Nagios status CGIs. Again, we weren’t keen on this – it only works with Nagios 1.x, the CGIs are going to disappear by 3.x and there’s a continuous poll on the Nagios server.

So we had to design our own. Our main requirement was to get personalised, authenticated feeds. As little performance overhead as possible would be nice too!

We originally thought about having a central store of alerts and a CGI to extract just the alerts required, based on the authenticated user. But it would have been a nightmare to work out what each contact was allowed to see. The key was that Nagios already knows this information, so just let Nagios do it!

Turns out, the trick is to use notifications per contact – each contact that wants RSS feeds has to specify it. This then becomes a direct replacement for email! Superb!

define contact {
    contact_name                     admin
    ...
    service_notification_commands    service-notify-by-email,service-notify-by-rss
}

Even better – because this hooks into Nagios’ notifications, re-notifications will work, as will acknowledgements and escalations.

One possible problem is that notifications only happen with HARD state changes, so you may not see a problem as quickly as you would from a web browser. However, you wouldn’t get your email either.

We store each contact’s RSS feed in a separate file, just as a mail server keeps a separate mailbox for each user. When a user comes in to read their feed, they only get their data. Perfect!
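To make the per-contact storage idea concrete, here is a minimal sketch in Python. It is not the RSS4NAGIOS code: the directory, the JSON file format, the item limit and the function names are all assumptions made for illustration. The notification command would call something like `store_notification` once per contact per alert.

```python
import json
import time
from pathlib import Path

FEED_DIR = Path("/tmp/rss_feeds")  # hypothetical location, not from RSS4NAGIOS
MAX_ITEMS = 50                     # assumed cap so each feed file stays small


def feed_path(contact, feed_dir=FEED_DIR):
    """Return the feed file for one contact, rejecting unsafe names."""
    if not contact or "/" in contact or contact.startswith("."):
        raise ValueError(f"unsafe contact name: {contact!r}")
    return Path(feed_dir) / f"{contact}.json"


def load_items(contact, feed_dir=FEED_DIR):
    """Read only this contact's items; an unknown contact has an empty feed."""
    path = feed_path(contact, feed_dir)
    return json.loads(path.read_text()) if path.exists() else []


def store_notification(contact, title, description, feed_dir=FEED_DIR, now=None):
    """Put one alert at the front of the contact's own feed file."""
    path = feed_path(contact, feed_dir)
    Path(feed_dir).mkdir(parents=True, exist_ok=True)
    items = load_items(contact, feed_dir)
    items.insert(0, {
        "title": title,
        "description": description,
        "time": time.time() if now is None else now,
    })
    path.write_text(json.dumps(items[:MAX_ITEMS]))
```

Because Nagios only runs the notification command for contacts that should hear about an alert, each file ends up holding exactly what that contact is allowed to see.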

But how do we get a user’s authentication? Originally, we were thinking that a user would go to http://nagios.server.com/feeds/username to get their feed, but that wouldn’t be secure. As the CGIs already have security, why not have a single entry point that then reads each user’s own RSS feed?

So now the URL is fixed for everyone: http://nagios.server.com/cgi-bin/rss.cgi. When authenticated, the CGI will read that contact’s feed and return that information. There is a CGI invocation overhead, but I think it is a necessary one. The CGI only reads a single file, and does not try to work out the status of all services.
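As a rough illustration of the single entry point, the sketch below picks the feed from the `REMOTE_USER` variable that the web server sets after authentication, reads that one file and returns RSS 2.0. It is written in Python for illustration only; the feed directory, the JSON file layout and the function names are assumptions, not the real rss.cgi.

```python
import json
import xml.etree.ElementTree as ET
from email.utils import formatdate
from pathlib import Path

FEED_DIR = Path("/tmp/rss_feeds")  # hypothetical location of per-contact files


def build_rss(contact, items):
    """Render a list of stored alerts as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Alerts for {contact}"
    ET.SubElement(channel, "link").text = "http://nagios.server.com/cgi-bin/rss.cgi"
    ET.SubElement(channel, "description").text = "Nagios notifications"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "description").text = item["description"]
        ET.SubElement(node, "pubDate").text = formatdate(item["time"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")


def handle_request(environ, feed_dir=FEED_DIR):
    """Return the full CGI response for whoever the web server authenticated."""
    contact = environ.get("REMOTE_USER", "")
    if not contact or "/" in contact or contact.startswith("."):
        return ("Status: 403 Forbidden\r\n"
                "Content-Type: text/plain\r\n\r\nNot authenticated\n")
    # Only one file is read: no status lookups across all services.
    path = Path(feed_dir) / f"{contact}.json"
    items = json.loads(path.read_text()) if path.exists() else []
    return "Content-Type: application/rss+xml\r\n\r\n" + build_rss(contact, items)

# In an actual CGI script the last step would be something like:
#   sys.stdout.write(handle_request(os.environ))
```

The user never supplies a username in the URL, so there is nothing to guess or tamper with: the feed returned depends only on who the web server says is logged in.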

Because it is a single feed, we can do nice things like having the web browser show an RSS icon. We amend our HTML headers to be:

<link title="Opsview feed" type="application/rss+xml" rel="alternate" href="/nagios/cgi-bin/rss.cgi" />

On Safari, this displays an RSS box which you can click.

[Image: safari bar.png]

Job done!

We call this software RSS4NAGIOS and you can get it here.

We’ve tested on Firefox 1.5+, but we recommend using NetNewsWire on Mac OS X.

By the way, we wanted to use Atom instead of RSS 2.0, but we just couldn’t get the XML::Atom::Syndication Perl module to work nicely. This would be a good enhancement in future (we were thinking of things like: if a recovery happens, then the earlier failure should be marked as read; this would be impossible in email).

Let us know what you think!

[Update: RSS4NAGIOS 1.1 released to support host notifications]

Mar 17

There’s a small bug in Nagios 2.0 where passive checks are marked as stale as soon as Nagios starts up. This patch changes that, so that a passive check is only marked as stale once it has exceeded its freshness_threshold after startup.
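One way to read the intended behaviour is sketched below in Python. This is not the patch itself (Nagios is written in C) and the function and parameter names are made up for illustration: the assumption is simply that the freshness clock starts from whichever is later, the last check result or the Nagios start time.

```python
def is_stale(last_check, program_start, freshness_threshold, now):
    """Return True once a passive check has gone too long without a result.

    All times are Unix timestamps; last_check is 0 if no result has
    ever been received. Names are illustrative, not Nagios internals.
    """
    # Results received before a restart should not count against the
    # check, so measure from startup if that is the later of the two.
    baseline = max(last_check, program_start)
    return now > baseline + freshness_threshold
```

With a threshold of 300 seconds, a check with no result yet is left alone for the first five minutes after startup instead of being flagged immediately.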

We’ve already sent this to the nagios-devel mailing list and Ethan has applied this to the Nagios 2.x branch in CVS, but we thought we’d add this here for completeness.

You can get the patch here.

Mar 13

This is a potentially controversial patch, so it may not get into the Nagios core code.

We’re working on integrating SNMP traps, which are passive checks by nature, into Opsview. However, when a service is initially added into Nagios, the CGIs show it in a PENDING state, which looks like an error. We prefer to have a sea of green when things are OK.

A PENDING state is fine for a distributed monitoring setup, because the active check on the slaves will get through to the master soon, but not with other “irregular” passive checks. I couldn’t find a way to distinguish between a passive, “going to get a result from a distributed slave soon”, and a passive, “don’t know when the next result is going to be”.

So this patch will amend the CGIs so that passive checks (or more precisely, checks that are not scheduled to run) are displayed as OK, rather than PENDING.
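The display rule can be summarised with a small Python sketch. It is only an illustration of the decision described above, not the patched CGI code, and the parameter names are assumptions.

```python
def display_state(has_been_checked, current_state, is_scheduled):
    """Pick the state the status page should show for a service.

    has_been_checked: True once any result (active or passive) has arrived.
    current_state:    the last known state, e.g. "OK" or "CRITICAL".
    is_scheduled:     True if Nagios itself will run a check for this service.
    """
    if has_been_checked:
        return current_state
    # No result yet: a scheduled check really is pending, but an
    # unscheduled (passive-only) one may never report, so show OK.
    return "PENDING" if is_scheduled else "OK"
```

Note the trade-off that makes this controversial: a passive service that has never reported looks exactly like a healthy one.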

Feb 24

One of our customers wanted Nagios to allow an authenticated user to see a subset of the services available, but not allow the ability to run commands for that service (like reschedule the next check, add comments, disable active checks, etc). In Opsview, we call that a “view some, change none” role.

It seems you can fake it with Apache’s access controls in .htpasswd by only allowing certain groups to access sbin/cmd.cgi. However, the interface is lousy: Apache just re-prompts you for a username and password, when you are already logged in!

Besides, the security model would be broken. It should be:

  • authentication – Webserver
  • access – Nagios

So we’ve made a change that puts the access control into Nagios: issue_commands.patch.

It implements a new attribute for contacts called issue_commands. This defaults to 1 (TRUE) for backwards compatibility, but if you specify 0 and this user then tries to submit a command, they will get the usual “Sorry, but you are not authorized to commit the specified command” page, which is much friendlier.
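The check itself is simple. Here is a hedged Python sketch of the logic, assuming contacts are held as a dictionary keyed by username; the real patch is C code inside the Nagios CGIs, and the function name here is hypothetical.

```python
def may_issue_commands(contacts, username):
    """Decide whether the authenticated user may submit commands.

    contacts maps contact_name to a dict of that contact's attributes.
    """
    contact = contacts.get(username)
    if contact is None:
        # Authenticated by the web server but unknown to Nagios: deny.
        return False
    # A missing attribute means 1 (TRUE), for backwards compatibility.
    return bool(contact.get("issue_commands", 1))
```

For example, a contact defined with `issue_commands 0` can still view the services they are a contact for, but any attempt to submit a command is refused.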

The patch also updates the HTML documentation. It applies cleanly onto Nagios 2.0. We’ll let Ethan know to see if he wants to apply it to Nagios 2.1 or Nagios 3.0.

By the way, the new var/objects.cache is fantastic for debugging! I made a mistake in the patch when using a contact template, but I could tell just by restarting Nagios and checking the objects.cache. Without it, it would have taken me ages to work out why the CGI wasn’t working as expected. Good job, Ethan!

Update: Ethan has applied this to the 3.0 branch and has renamed the attribute to can_submit_commands, which sounds better. The patch is updated to reflect this. He also spotted a limitation of mine where the CGI could core dump if the contact authorised by the webserver was not recognised by Nagios. This is fixed in the patch.

Update: If you use NDOUtils with Nagios, make sure you update the included header files.

Opsview © Opsera Limited 2010 All Rights Reserved
Nagios © 1999-2009 Ethan Galstad. Respective copyrights apply to third party source code
Opsview is a registered trademark of Opsera Limited. Nagios is a registered trademark of Nagios Enterprises. All Rights Reserved