<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Opsview Labs &#187; bug</title>
	<atom:link href="http://labs.opsview.com/tag/bug/feed/" rel="self" type="application/rss+xml" />
	<link>http://labs.opsview.com</link>
	<description>Opsview&#039;s Engineering Blog</description>
	<lastBuildDate>Fri, 20 Jan 2012 09:32:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Nagios bugs and how to fix them permanently</title>
		<link>http://labs.opsview.com/2011/01/nagios-bugs-and-how-to-fix-them-permanently/</link>
		<comments>http://labs.opsview.com/2011/01/nagios-bugs-and-how-to-fix-them-permanently/#comments</comments>
		<pubDate>Tue, 11 Jan 2011 16:53:25 +0000</pubDate>
		<dc:creator>tonvoon</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Nagios]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[fix]]></category>
		<category><![CDATA[Hudson]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=707</guid>
		<description><![CDATA[
			
				
			
		We&#8217;ve just fixed a bug in Nagios® which an Opsview user had raised to us. A change made to Nagios in version 3.2.2 caused an issue where service alerts were being raised in the nagios.log file for every result that came back from a host that was down. This had the impact of adding lots [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just fixed a bug in Nagios® that an <a href="http://opsview.com">Opsview</a> user reported to us. A <a href="http://tracker.nagios.org/view.php?id=128">change made to Nagios</a> in version 3.2.2 caused a service alert to be written to the <em>nagios.log</em> file for every result that came back from a host that was down. The extra alerts were overwhelming <a href="http://www.opsview.com/learn/screenshots">Opsview&#8217;s event views</a>.</p>
<p><span id="more-707"></span>To reproduce the problem in Nagios 3.2.3:</p>
<ol>
<li>Create a host with 2 service checks</li>
<li>Let this run normally</li>
<li>Shut down the host</li>
<li>The first service check will notice the state change and set the host to be checked. It will go into a SOFT state and the service will go into a check attempt of 2 and continue into a hard state correctly</li>
<li>The second service check will see that the host is DOWN and force a hard state failure at check attempt 1 of a maximum of 4. However, this hard state change did not set the last_hard_state flag correctly, so every subsequent check was treated as a new hard state failure and a SERVICE ALERT was written to <em>nagios.log</em> each time</li>
</ol>
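<p>The failure in the last step can be illustrated with a small sketch. This is a toy model written for this post, not the Nagios source: the <em>Service</em> class, <em>process_result</em>, <em>alerts_after</em> and the <em>record_hard_state</em> switch are all hypothetical names, and only <em>last_hard_state</em> comes from the description above. It assumes the alert decision is a plain comparison between the current state and the last recorded hard state.</p>
<pre>
```python
from dataclasses import dataclass

OK, CRITICAL = 0, 2  # toy state codes chosen for this sketch


@dataclass
class Service:
    """Hypothetical, heavily simplified stand-in for a service record."""
    current_state: int = OK
    last_hard_state: int = OK
    alerts_logged: int = 0


def process_result(svc: Service, new_state: int, host_is_down: bool,
                   record_hard_state: bool) -> None:
    """Handle one check result for a service.

    record_hard_state=False models the buggy behaviour described above
    (the hard state is never remembered); True models the intended one.
    """
    svc.current_state = new_state
    if host_is_down and new_state != OK:
        # The host is down, so the failure is forced straight to a hard state.
        hard_state_changed = svc.current_state != svc.last_hard_state
        if hard_state_changed:
            # Stands in for writing a SERVICE ALERT line to nagios.log.
            svc.alerts_logged += 1
        if record_hard_state:
            svc.last_hard_state = svc.current_state


def alerts_after(results: int, record_hard_state: bool) -> int:
    """Feed the same failing result repeatedly and count the alerts logged."""
    svc = Service()
    for _ in range(results):
        process_result(svc, CRITICAL, host_is_down=True,
                       record_hard_state=record_hard_state)
    return svc.alerts_logged


print(alerts_after(5, record_hard_state=False))  # buggy path: 5
print(alerts_after(5, record_hard_state=True))   # corrected path: 1
```
</pre>
<p>In this model, five results from a down host produce five alerts when the hard state is never recorded, and a single alert once it is.</p>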
<p>This took a long time to track down, but we&#8217;ve found the problem and fixed it. Our fix has <a href="http://article.gmane.org/gmane.network.nagios.cvs/3045">already</a> been pushed to Nagios.</p>
<p>While this bug is annoying, we&#8217;re upset that this had an impact on a customer system. We make it our principle to keep as up to date with Nagios as possible because Opsview is a <em>shallow fork</em> of Nagios &#8211; we make only the changes that are necessary to support our customers and we push our changes back upstream where we can.</p>
<p>We&#8217;ve developed a lot of trust with our users &#8211; we make the upgrade process for Opsview as easy as possible because we want all our users to get to the latest version (in fact, we&#8217;ve just had one user upgrade an Opsview installation from 4 years ago right up to the latest version, going through over a hundred database changes!).</p>
<p>One thing we do to make sure our systems work as expected is to continuously test our latest versions of Opsview. We use <a href="http://hudson-ci.org/">Hudson</a> to test Opsview on every change &#8211; currently this runs 5269 individual tests, taking 1 hour 46 minutes.</p>
<p>We want to bring this level of quality assurance to Nagios &#8211; included in our fix is a <a href="http://article.gmane.org/gmane.network.nagios.cvs/3046">test case</a> that checks exactly this issue. Running the Nagios tests will now catch this problem if it ever returns, and our nightly builds of Opsview include these tests too.</p>
<p>So now everyone can sleep easier knowing that this problem is never going to happen again.</p>
<p><small>Please note: Nagios, the Nagios logo, and Nagios graphics are the servicemarks, trademarks, or registered trademarks owned by Nagios Enterprises</small></p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2011/01/nagios-bugs-and-how-to-fix-them-permanently/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

