Jul 14

In a standard Nagios plus database implementation, you use NDOutils to store information in a database. While we think NDOutils is fantastic, it has some major limitations as you monitor more hosts. With Opsview, we want to scale. We’ve already done lots of work with NDOutils, including adding view-like helper tables, updating the database asynchronously, improving indices and speeding up the time to load the configuration at a Nagios reload. Now we want to share an amazing improvement we’ve discovered.

Mar 13

Michael Prochaska was having trouble compiling NDOutils on Solaris 10. Since we have an interest in getting Opsview working on Solaris (the upcoming 2.12 release will add Solaris 10 as a supported platform), we offered to help. So this is the result of his company, Bacher Systems, sponsoring our work.

Apr 02

We’ve encountered some problems with MySQL detection in NDOUtils – it doesn’t work on one of our Red Hat servers. The specific problem is that the ceil function is not found, because -lm is missing from the list of libraries to add at link time:


utils.o(.text+0x14e): In function `ndo_dbuf_strcat':
: undefined reference to `ceil'
collect2: ld returned 1 exit status

Rather than adding that library in manually (along with the -lz library that we found earlier for Mac OS X), we should use information from mysql_config to construct the compile flags. However, this is a bit tricky because of the various permutations.

Fortunately, the Nagios Plugins have a solution already. They have an m4 file, called np_mysqlclient.m4, that detects mysql_config and passes the data it returns to configure.

So we’ve patched NDOUtils so that it uses this m4 file now. To use it, you have to apply the patch to configure.in, add a new m4/ directory at the top level and copy np_mysqlclient.m4 into m4/. Then run:

aclocal -I m4
autoconf
./configure --with-mysql=DIR

The detection is the same as in the Nagios Plugins: ./configure will try to find mysql_config in DIR/bin/mysql_config, otherwise it will look in the PATH.
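As a rough illustration of that lookup order, here is a small shell sketch. It is not the macro from np_mysqlclient.m4: the function name find_mysql_config and the /usr/local/mysql prefix are made up for the example. The only real interface it relies on is mysql_config's --cflags and --libs options.

```shell
# Hypothetical helper, NOT the code in np_mysqlclient.m4.
# Prints the path of the mysql_config to use, or fails if none is found.
find_mysql_config() {
    dir="$1"
    # Prefer the copy under the directory given to --with-mysql.
    if [ -n "$dir" ] && [ -x "$dir/bin/mysql_config" ]; then
        printf '%s\n' "$dir/bin/mysql_config"
        return 0
    fi
    # Otherwise fall back to the first mysql_config in the PATH.
    command -v mysql_config
}

# /usr/local/mysql is only an example prefix.
if mysql_config_bin=$(find_mysql_config /usr/local/mysql); then
    # Let mysql_config supply the flags instead of hard-coding -lm, -lz, etc.
    MYSQL_CFLAGS=$("$mysql_config_bin" --cflags)
    MYSQL_LIBS=$("$mysql_config_bin" --libs)
    echo "CFLAGS: $MYSQL_CFLAGS"
    echo "LIBS:   $MYSQL_LIBS"
else
    echo "mysql_config not found" >&2
fi
```

The point of going through mysql_config is that libraries such as -lm and -lz come from the MySQL installation itself, so they no longer need to be listed by hand for each platform.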

The nice thing is that if the logic for detection needs to be enhanced, we can update the m4 file and propagate the changes back to the Nagios Plugins as well. So everyone wins!

There’s also a patch for CFLAGS in src/Makefile.in (which were getting overridden – presumably for testing), a small header change in config.h.in and some Makefile.in changes because make errors were getting lost by the cd .. command.

We’ve tested this on a Mac OS X server, a Debian Etch server, and 32-bit and 64-bit Red Hat, and it is looking good.

Unfortunately, it means deprecating the --with-mysql-inc and --with-mysql-lib configure options. Hopefully, you’ll see why this way is so much nicer.

Here’s the patch against CVS HEAD.

Update: Here’s the patch, reworked for NDOutils 1.4b3

Update: You can get the tarball with just this patch here

Jan 23

NDOUtils has recently been updated to 1.4b2, and we thought we’d share our latest patches here so that they can be evaluated upstream.

We’ve always argued that it is best to be as close to the released code as possible – we don’t want the expense of maintaining a fork, so it’s in our interests to inform everyone about our changes. And since NDOUtils is gearing up for the 1.4 release, now is a good time to publicise them.

Of course, the link to our most stable code is updated daily, so the list below will not stay accurate over time, but we’ve also uploaded the patches onto this blog server, so you can still reference them here. All patches will apply cleanly onto NDOUtils 1.4b2.

ndoutils_issue_commands.patch

This is the include header problem, which arises because we’ve changed the data structure for a contact. Long term, it is best if Nagios splits the include files out of NDOUtils and installs them itself, but this is probably not on Ethan’s radar right now.

ndoutils_daemonize.patch

We found that the ndo2db process wasn’t closing stdout, which meant the attaching terminal could not close. It looks like it should be set, but was commented out for debugging purposes. We uncomment those lines.

ndoutils_debug.patch

It looks like some memory debug options are switched on by default. We switch them off here.

ndoutils_memory.patch

And the ifdef doesn’t actually switch them off – we correct that too. (We’re too lazy to combine these last two patches together!)


ndoutils_configure.patch

The configure script wasn’t respecting the --with-mysql-inc option correctly. We also test for the compress lib, which gave us problems on Mac OS X.

ndoutils_notification_level.patch

This is required to support our use of simple escalations. Again, a separate location for the include files for Nagios would remove the need for this.

ndoutils_clear_tables_on_reload.patch

This is the biggie. If the configuration for Nagios has changed and a reload is requested, the ndo_objects table does not reflect the new configuration. We found that the ndo_objects table is only updated on a restart, not a reload. This caused problems for status views that use the database because the new hosts and services weren’t there. This fixes that problem.

We also found that the active flag wasn’t correctly set to inactive when the configuration was dumped. Once we fixed that, we found that hosts and services in ndo_objects were marked inactive, when they should be active. This has also been fixed, along with a SQL typo.

Update: we’ve discovered that the configdumpstart routine gets called twice – once with the original data, and once with retained data. Looking at the ndomod data stream, it looks like a configdumpstart is sent with a huge set of data, then another configdumpstart with more data. The patch above has been reworked so that the table clearing only occurs once, before the original data is sent through. This does raise the question of what the difference is between the original and the retained data – if a table clear was happening between the two and yet all the data was still there, why send all the original data?

Also, we found a bug where configdumpend was not being called. It turned out to be a missing break in the case statement. This is included in the patch above too.

DEFAULT CHARSET mismatches

We also run a perl script when the NDOUtils distribution is unpacked. We strip out all the DEFAULT CHARSET=ascii statements in mysql.src. This is because if the server has a different charset, you can get some collation errors in mysql. We think it is better to remove these altogether and leave the charset to be set by the mysql database. The script is:


perl -pi -e 's/DEFAULT CHARSET=ascii //' db/mysql.src
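To see what that substitution does without touching any file, here is a small sketch. The input line is a made-up example of the kind of table option found in a CREATE TABLE statement, not a line copied from db/mysql.src, and sed is used only as a stand-in for the Perl one-liner with the same pattern.

```shell
# Hypothetical input line; not taken from db/mysql.src.
line=") ENGINE=MyISAM DEFAULT CHARSET=ascii COMMENT='Example table';"

# Same pattern as the Perl one-liner, applied to the sample line only.
stripped=$(printf '%s\n' "$line" | sed 's/DEFAULT CHARSET=ascii //')

printf '%s\n' "$stripped"
# prints: ) ENGINE=MyISAM COMMENT='Example table';
```

After running the real one-liner, grep -n 'DEFAULT CHARSET=ascii' db/mysql.src should print nothing.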

ndoutils_upgradedb.pl

Upgrading database schemas is a terrible pain. NDOUtils includes scripts to update the database, but there’s a manual step required to work out which scripts to apply. We’ve written a perl script (requires DBI.pm) to apply the upgrade scripts automatically, as long as the filename convention is adhered to. There’s also a new table created, called nagios_database_version, which holds a single row with the version of the database schema for subsequent updates.
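The selection step can be sketched in shell as follows. This is not ndoutils_upgradedb.pl: the function name pending_upgrades and the mysql-upgrade-<version>.sql naming are assumptions made for the example, and it relies on the -V (version sort) option of GNU sort. It only works out which scripts are newer than a given schema version; reading and updating nagios_database_version is left out.

```shell
# Hypothetical sketch; the naming convention and function name are assumptions.
# Prints, in version order, the upgrade scripts newer than the current version.
pending_upgrades() {
    current="$1"   # schema version currently recorded in the database
    dir="$2"       # directory holding the upgrade scripts
    for f in "$dir"/mysql-upgrade-*.sql; do
        if [ ! -e "$f" ]; then
            continue                      # the glob matched nothing
        fi
        v=${f##*/mysql-upgrade-}          # strip directory and prefix
        v=${v%.sql}                       # strip suffix, leaving the version
        if [ "$v" = "$current" ]; then
            continue                      # already applied
        fi
        # Keep the script only if its version sorts after the current one.
        newest=$(printf '%s\n%s\n' "$current" "$v" | sort -V | tail -n 1)
        if [ "$newest" = "$v" ]; then
            printf '%s\n' "$f"
        fi
    done | sort -V
}
```

Called as pending_upgrades 1.4b1 db, it would list the scripts to feed to the database in order, after which the stored version would be updated.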

The copyright for this script can be claimed by Ethan if he chooses to include it in the NDOUtils distribution. Otherwise, you are free to use it and distribute it yourselves under the GPL, but the copyright is retained by Altinity.

Hopefully, these patches will get included into the new NDOUtils soon, as we move forward to the next generation of Nagios status viewers.

Update 2: Ethan has applied these changes to CVS, except for the notification_level patch as that is a bit more involved.

Sep 11

We are starting to use Ndoutils, which is the first event broker for Nagios. The idea with the event broker modules is that the functionality of Nagios can be extended without the core code being changed. Ethan has released ndoutils which writes Nagios data to a mysql database.

We managed to get Ndoutils 1.3.1 to compile, but whenever Nagios started up, we kept getting SIGSEGV and Nagios would crash. Nagios.log would say:

[1158012772] Nagios 2.5 starting... (PID=21793)
[1158012772] LOG VERSION: 2.0
[1158012773] ndomod: NDOMOD 1.3.1 Copyright (c) 2005-2006 Ethan Galstad (nagios@nagios.org)
[1158012773] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1158012773] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1158012773] Caught SIGSEGV, shutting down...

It took us a few hours and lots of debugging lines in Ndoutils (printf statements plus starting up Nagios manually without daemonizing) to work it out, but we eventually found that the Nagios 2 header files distributed with Ndoutils did not have our changes for can_submit_commands. The patch is here – obviously, only use this if you are using the can_submit_commands patch.

We’ve been speaking to Ethan because we think it is a good idea for Nagios to install the header files (maybe in /usr/local/nagios/include?), so then any local patches are done there, rather than trying to maintain multiple header files across different projects.

Hope this saves you a few hours!

Feb 24

One of our customers wanted Nagios to allow an authenticated user to see a subset of the services available, but not to run commands for those services (like rescheduling the next check, adding comments, disabling active checks, etc). In Opsview, we call that a “view some, change none” role.

It seems you can fake it with Apache’s access controls in .htaccess, by only allowing certain groups to access sbin/cmd.cgi. However, the interface is lousy – Apache just re-prompts you for a username and password, when you are already logged in!

Besides, the security model would be broken. It should be:

  • authentication – Webserver
  • access – Nagios

So we’ve made a change that puts the access control into Nagios: issue_commands.patch.

It implements a new attribute for contacts called issue_commands. This defaults to 1 (TRUE) for backwards compatibility, but if you specify 0 and this user then tries to submit a command, they will get the usual “Sorry, but you are not authorized to commit the specified command” page, which is much friendlier.

The patch also updates the html documentation. It applies cleanly onto Nagios 2.0. We’ll let Ethan know to see if he wants to apply it to Nagios 2.1 or Nagios 3.0.

By the way, the new var/objects.cache is fantastic for debugging! I made a mistake in the patch when using a contact template, but I could tell just by restarting nagios and checking the objects.cache. Without it, it would have taken me ages to work out why the CGI wasn’t working as expected. Good job, Ethan!

Update: Ethan has applied this to the 3.0 branch and has renamed the attribute to can_submit_commands, which sounds better. The patch is updated to reflect this. He also spotted a limitation of mine where the CGI could coredump if the contact authorised by the webserver was not recognised by Nagios. This is fixed in the patch.
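For illustration, a “view some, change none” contact using the renamed attribute might be defined roughly as follows. The contact name is made up, and the other directives a contact definition needs are left out, so treat this as a fragment rather than a complete definition.

```
# Illustrative fragment only: contact_name is hypothetical, and the other
# directives required in a contact definition are omitted.
define contact{
        contact_name            viewonly
        can_submit_commands     0       ; may view, but not submit commands
        }
```

Leaving the attribute out keeps the default of 1, so existing contacts behave as before.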

Update: If you use Ndoutils with Nagios, make sure you update the included header files.

Opsview © Opsera Limited 2010 All Rights Reserved
Nagios © 1999-2009 Ethan Galstad. Respective copyrights apply to third party source code
Opsview is a registered trademark of Opsera Limited. Nagios is a registered trademark of Nagios Enterprises. All Rights Reserved