Jan 08

As part of our ongoing work to speed up Opsview, we found a bug in NSCA’s handling of aggregate writes when run in --single mode.

The specific failure scenario is this:

  1. NSCA and Nagios are told to start up
  2. A send_nsca request is received by NSCA before Nagios has created the nagios.cmd command pipe
  3. NSCA tries to open the command file for writing, but sees it is not there
  4. NSCA opens the alternate dump file instead

Now when Nagios does create the nagios.cmd file, NSCA uses that … unless aggregate mode is on and the daemon mode is --single. In this case, it continues to use the alternate dump file, so Nagios never sees the results from the slaves.
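To make the intended behaviour concrete, here is a minimal Python sketch of a writer that re-checks for the command file on every write, so an early fallback to the dump file is not permanent. This is a toy model, not NSCA's C code: the `ResultWriter` class, the `alternate.dump` file name and the use of a plain file in place of the real named pipe are all assumptions made for illustration.

```python
import os
import tempfile


class ResultWriter:
    """Toy model of a daemon that prefers a command file but can fall back."""

    def __init__(self, command_file, dump_file):
        self.command_file = command_file
        self.dump_file = dump_file

    def target(self):
        # Re-check on every write, so a fallback chosen early is not sticky.
        if os.path.exists(self.command_file):
            return self.command_file
        return self.dump_file

    def write(self, line):
        path = self.target()
        with open(path, "a") as handle:
            handle.write(line + "\n")
        return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        writer = ResultWriter(
            os.path.join(tmp, "nagios.cmd"),       # hypothetical paths
            os.path.join(tmp, "alternate.dump"),
        )
        first = writer.write("result received before Nagios is ready")
        # Nagios "starts" and creates its command file (a plain file here, not a FIFO).
        open(writer.command_file, "w").close()
        second = writer.write("result received after Nagios is ready")
        return os.path.basename(first), os.path.basename(second)
```

In this model the first result lands in the dump file and the second in the command file. The bug described above corresponds to the case where the second write still goes to the dump file.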

Here’s the patch, which we’ve also added into our source for Opsview.

As we are very keen on good testing, we’ve managed to recreate the failing behaviour in a test script. You also need a test configuration file and a patch to the test framework. Run the test against unpatched NSCA and it shows the error; once the patch is applied, the test should pass.

Jan 26

We ran across a problem with NSCA 2.6 yesterday. It turned out that running the nsca daemon in single mode only works for the first packet of data from send_nsca and hangs on subsequent calls.

This was actually first discovered by Rudolf van der Leeden, and it looks like it has been with us since April 2006 when NSCA 2.6 was first released, through to the current NSCA 2.7. We never picked it up until we ran it on a customer site which was tuned to use --single.

The fix is as Rudolf suggests – uncommenting the if statement that was removed. Our patch is here.

How do we know it works? Well, we’ve written a series of test scripts for NSCA.

We’ve always been big fans of testing. We love using the Test Anything Protocol (TAP) in Perl. CPAN encourages you to write good tests to make sure your Perl modules run, which is why we know that the modules we’ve uploaded to CPAN continue to work while we’ve been updating them. And we’ve provided quite a few fixes to CPAN modules where the tests fail (and some just suggest that we have a broken version of perl).

Here are the test scripts for NSCA. They are more like functional tests: they check that the daemon can start up and accept messages, and compare the output in the dummy nagios.cmd file with the sent data. Unit testing is a bit more tricky to do for C code, though libtap is being used for the Nagios Plugins.
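The comparison at the heart of such a functional test can be sketched in a few lines of Python. The real scripts are written in Perl and may well compare things differently; this sketch only assumes the tab-separated service check input that send_nsca reads and the PROCESS_SERVICE_CHECK_RESULT external command format that Nagios expects. The function names are hypothetical, and ignoring the timestamp is our own simplification.

```python
def expected_command(timestamp, send_nsca_line):
    """Build the external command line expected for one service check result."""
    # send_nsca service input: host <TAB> service <TAB> return code <TAB> output
    host, service, return_code, output = send_nsca_line.rstrip("\n").split("\t", 3)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s" % (
        timestamp, host, service, return_code, output)


def strip_timestamp(command_line):
    # The daemon stamps each line with its own clock, so compare what follows "] ".
    return command_line.split("] ", 1)[1]


def results_match(sent_lines, command_file_lines):
    """True if every sent result appears in the command file, in any order."""
    expected = [strip_timestamp(expected_command(0, line)) for line in sent_lines]
    actual = [strip_timestamp(line.rstrip("\n")) for line in command_file_lines]
    return sorted(expected) == sorted(actual)
```

Sorting both sides means the check does not depend on the order in which results arrive, which matters once several senders run at the same time.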

To use the test scripts, drop them into the top level of the NSCA directory after you’ve compiled NSCA and cd into nsca_tests. Run prove *.t. You will require several CPAN modules: Test::More, Class::Struct, Clone and Parallel::Forker, though most will come with your perl distribution.

There are 3 tests at the moment:

  • basic – just sends a few passive checks and makes sure that the nagios.cmd file receives them
  • multiple – runs the same as basic, but several times to check the daemon can handle multiple requests
  • simultaneous – runs lots of send_nscas at the same time (well, nearly). Uses Parallel::Forker to set up all the sends, then executes them all at once. Expect about 200 extra processes to hit your server!
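The "set everything up, then release it all at once" idea in the simultaneous test can be sketched in Python as follows. Note the swap: this uses threads and a `threading.Barrier` rather than forked processes as Parallel::Forker does, and the `send` callable is a hypothetical stand-in for invoking send_nsca.

```python
import threading


def run_simultaneously(send, payloads):
    """Prepare one worker per payload, then release them all together."""
    if not payloads:
        return []
    barrier = threading.Barrier(len(payloads))
    results = []
    lock = threading.Lock()

    def worker(payload):
        barrier.wait()              # block until every worker is ready
        outcome = send(payload)     # stand-in for running send_nsca
        with lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker, args=(p,)) for p in payloads]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Results come back in completion order, so a caller should compare them as a set or after sorting rather than position by position.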

You’ll find that the multiple and simultaneous tests fail with the stock NSCA 2.6 and 2.7. But when the patch is applied, all the tests pass.

The tests can obviously be extended, but this is a start and covers this basic functionality.

We hope Ethan will look into adding this to the NSCA distribution.

We’re upset that something like this got to one of our customers, but we’re more upset with ourselves for not catching this much earlier. This should be a good step towards better QA of future NSCA releases.

Update: Ethan has updated NSCA to 2.7.1 to fix this problem. And the tests are included as well!

Nagios © 1999-2011 Nagios Enterprises LLC. Nagios, the Nagios logo, and Nagios graphics are the servicemarks,
trademarks, or registered trademarks owned by Nagios Enterprises, LLC. All Rights Reserved.
Opsview © 2008-2011 Opsera Ltd. Opsview, the Opsview Logo, and Opsview graphics are the
trademarks or registered trademarks owned by Opsera Limited. All Rights Reserved.