<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Opsview Labs</title>
	<atom:link href="http://labs.opsview.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://labs.opsview.com</link>
	<description>Opsview&#039;s Engineering Blog</description>
	<lastBuildDate>Thu, 26 Aug 2010 13:31:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Grails &amp; Hudson part 1: CodeNarc</title>
		<link>http://labs.opsview.com/2010/08/grails-hudson-part-1-codenarc/</link>
		<comments>http://labs.opsview.com/2010/08/grails-hudson-part-1-codenarc/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 13:25:43 +0000</pubDate>
		<dc:creator>rbramley</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frameworks]]></category>
		<category><![CDATA[Grails]]></category>
		<category><![CDATA[Hudson]]></category>
		<category><![CDATA[Opsview]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[CodeNarc]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=483</guid>
		<description><![CDATA[
			
				
			
		The first part of a series of posts on Grails and Hudson leading up to a presentation at the London Groovy &#38; Grails User Group. Subsequent instalments will include testing (unit, integration, functional), test coverage, automatic war deployment and monitoring Hudson with Opsview Enterprise.

Grails has a rich plugin eco-system with over 400 plugins – so it’s [...]]]></description>
			<content:encoded><![CDATA[<p><a class="lightbox" title="grails" href="http://labs.opsview.com/wp-content/uploads/2010/08/grails.png"><img class="alignleft size-full wp-image-501" style="margin-bottom: 8px; margin-right: 10px;" title="grails" src="http://labs.opsview.com/wp-content/uploads/2010/08/grails.png" alt="" width="176" height="53" /></a>The first part of a series of posts on <a href="http://en.wikipedia.org/wiki/Grails_(framework)">Grails</a> and <a href="http://en.wikipedia.org/wiki/Hudson_(software)">Hudson</a> leading up to a presentation at the London Groovy &amp; Grails User Group. Subsequent instalments will include testing (unit, integration, functional), test coverage, automatic war deployment and monitoring Hudson with <a href="https://www.opsview.com/products/opsview-enterprise">Opsview Enterprise</a>.</p>
<div><span id="more-483"></span></div>
<p>Grails has a rich plugin ecosystem with over 400 plugins, so it’s easy to miss something useful. If you’re serious about software craftsmanship, then static code analysis should be part of your quality regime, as it gives further insight into the code base (and if you insist, yes, it’ll help with your Technical Debt management).</p>
<p><a title="CodeNarc" href="http://codenarc.sourceforge.net/">CodeNarc</a> provides static code analysis for Groovy, and the <a title="CodeNarc plugin" href="http://www.grails.org/plugin/codenarc/">CodeNarc plugin for Grails</a> allows you to perform this analysis with the “grails codenarc” script. Behind the scenes this uses the CodeNarc Ant task with settings from grails-app/conf/Config.groovy, and produces an HTML report by default.</p>
<p>Until recently, if you used the codenarc target within a continuous integration server such as <a title="Hudson CI" href="http://hudson-ci.org/">Hudson</a>, the HTML report would be generated and then sit in the workspace waiting for a diligent developer to check it. You can imagine how often that happens in practice with all the other demands of a project!</p>
<p>However, I’ve now integrated the CodeNarc XML output with the <a title="Violations plugin" href="http://wiki.hudson-ci.org/display/HUDSON/Violations">Hudson Violations plugin</a> so that an overview trend line is shown against the Hudson job. Then the team quickly fixed the violations…</p>
<p><a class="lightbox" title="1" href="http://labs.opsview.com/wp-content/uploads/2010/08/1.png"><img class="aligncenter size-full wp-image-484" title="1" src="http://labs.opsview.com/wp-content/uploads/2010/08/1.png" alt="" width="300" height="169" /></a>You can also get a breakdown by priority:</p>
<p><a class="lightbox" title="2" href="http://labs.opsview.com/wp-content/uploads/2010/08/2.png"><img class="aligncenter size-full wp-image-485" title="2" src="http://labs.opsview.com/wp-content/uploads/2010/08/2.png" alt="" width="300" height="259" /></a>And in-context views of the violations so you know what to fix:</p>
<p><a class="lightbox" title="3" href="http://labs.opsview.com/wp-content/uploads/2010/08/3.png"><img class="aligncenter size-full wp-image-486" title="3" src="http://labs.opsview.com/wp-content/uploads/2010/08/3.png" alt="" width="300" height="143" /></a></p>
<p>This is how you do it…</p>
<p>Grails <strong>Config.groovy</strong><br />
<code>codenarc {<br />
reportName = 'target/test-reports/CodeNarcReport.xml'<br />
reportType = 'xml'<br />
// any further settings like maxPriority1Violations=0<br />
}</code></p>
<h3>Hudson</h3>
<p>Set up your Grails build step – normally you’d add the ‘codenarc’ target:</p>
<p><a class="lightbox" title="4" href="http://labs.opsview.com/wp-content/uploads/2010/08/4.png"><img class="aligncenter size-full wp-image-487" title="4" src="http://labs.opsview.com/wp-content/uploads/2010/08/4.png" alt="" width="519" height="459" /></a></p>
<p><em>The <a title="CodeNarc rule configuration" href="http://codenarc.sourceforge.net/codenarc-configuring-rules.html#Configuring_Rules_Using_a_Properties_File">codenarc.properties</a> file can be used to configure specific exclusions; the location of this file can be passed in as a system property (shown above).</em></p>
<p>You need to have installed the Violations plugin (Manage Hudson &gt; Manage Plugins &gt; Available and search for Violations). As CodeNarc support is a recent addition, I’m using a patched version of the Violations plugin, though the patch has been integrated into trunk (I’ll update when it is released). Configure violations:</p>
<p><a class="lightbox" title="5" href="http://labs.opsview.com/wp-content/uploads/2010/08/5.png"><img class="aligncenter size-full wp-image-488" title="5" src="http://labs.opsview.com/wp-content/uploads/2010/08/5.png" alt="" width="526" height="515" /></a></p>
<p><em>Note the ‘Faux project path’: you may need to set this to get the in-context view working properly, depending on your checkout path (e.g. if your code is checked out to workspace/trunk).</em></p>
<p>I also had a contribution to CodeNarc accepted at the weekend to add an <em>inlineXml</em> report type. With a minor tweak to the CodeNarc parser, this will allow the Hudson Violations plugin to show the rule description in the pop-up message.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/08/grails-hudson-part-1-codenarc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Measuring the rate of change for SNMP values using Nagios Plugins</title>
		<link>http://labs.opsview.com/2010/08/measuring-the-rate-of-change-for-snmp-values-using-nagios-plugins/</link>
		<comments>http://labs.opsview.com/2010/08/measuring-the-rate-of-change-for-snmp-values-using-nagios-plugins/#comments</comments>
		<pubDate>Tue, 24 Aug 2010 16:59:11 +0000</pubDate>
		<dc:creator>tcallway</dc:creator>
				<category><![CDATA[Nagios]]></category>
		<category><![CDATA[Opsview]]></category>
		<category><![CDATA[SNMP]]></category>
		<category><![CDATA[plugins]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=471</guid>
		<description><![CDATA[
			
				
			
		The Nagios Plugins project recently released a new version. Amongst the changes is a new feature which we added for a customer. The requirement was to measure the rate of change for SNMP counters. The standard check_snmp plugin is great at getting information, but only at a specific moment in time. For some things, you [...]]]></description>
			<content:encoded><![CDATA[<p><a class="lightbox" title="Jigsaw-piece_full" href="http://labs.opsview.com/wp-content/uploads/2010/08/Jigsaw-piece_full.png"><img class="alignleft size-full wp-image-477" style="margin-bottom: 8px; margin-right: 10px;" title="Jigsaw-piece_full" src="http://labs.opsview.com/wp-content/uploads/2010/08/Jigsaw-piece_full.png" alt="Plugins" width="180" height="182" /></a>The Nagios Plugins project recently released a <a href="http://nagiosplugins.org/nagiosplugins-1.4.15">new version</a>. Amongst the changes is a new feature which we added for a customer. The requirement was to measure the rate of change for SNMP counters. The standard check_snmp plugin is great at getting information, but only at a specific moment in time. For some things, you want to check and alert on the <em>rate of change</em>. There are a lot of interesting metrics that you can get from SNMP which are Counter32 or Counter64 values. An example is IP-MIB::ipInAddrErrors.0. This counts the number of packets that are discarded due to invalid IP addresses &#8211; probably due to network errors or an attempt at infiltrating your network.</p>
<p><span id="more-471"></span></p>
<p>We could have met the requirements by writing a <a href="http://nagiosplug.sourceforge.net/developer-guidelines.html">new plugin</a>. But we wanted to do this in the core plugins because it is useful functionality for everyone. To enable this feature, we had to introduce some library functions so that plugins could save state information between runs. After some discussions with the Nagios Plugins core team, we designed the <a href="http://nagiosplugins.org/c-apis-private">library functions</a> for inclusion in the Nagios Plugins C library. There have been <a href="http://sourceforge.net/mailarchive/message.php?msg_name=20090909204742.GI29402%40phcomp.co.uk">attempts before</a>, but the interface was unnecessarily complex. We believe in designing things that work while keeping them simple &#8211; we make these design decisions every day!</p>
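<p>To see why saved state matters: a rate can only be worked out from two samples of a counter taken at different times, so the plugin has to remember the previous one. Here is a minimal Java sketch of that calculation. It is not the check_snmp code (which is C); the class and method names are made up for illustration, and it assumes a Counter32 that wraps at most once between polls.</p>
<pre><code>final class CounterRate {

    // A Counter32 wraps back to zero after 2^32 - 1, so its modulus is 2^32.
    static final long COUNTER32_MODULUS = 4294967296L;

    /**
     * Returns the average increase per second between two samples.
     * previous is the value saved by the last run, current is the value just polled.
     */
    static double perSecond(long previous, long current, long elapsedSeconds) {
        if (Long.signum(elapsedSeconds) != 1) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // floorMod yields the true increase even if the counter wrapped once.
        long increase = Math.floorMod(current - previous, COUNTER32_MODULUS);
        return (double) increase / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 600 more errors over a 300 second polling interval.
        System.out.println(perSecond(1000L, 1600L, 300L));
        // The counter wrapped between the two polls.
        System.out.println(perSecond(4294967290L, 4L, 10L));
    }
}</code></pre>
<p>The first call gives 2.0 per second and the second gives 1.0 per second, which is the kind of value you would then compare against warning and critical thresholds.</p>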
<p>But there&#8217;s more! Having the plugin is only half the story because we wanted people to easily configure it using Opsview&#8217;s web interface. Opsview has the concept of a &#8220;<a href="http://docs.opsview.org/doku.php?id=opsview-community:servicecheck">service check</a>&#8221; which defines the things you care about and how to check them. On this page, you can select SNMP polling, do an SNMP walk to gather all the available SNMP metrics and then just click to select the one you want. Nice!</p>
<p>You can see the configuration in our <a href="http://www.opsview.com/learn/demos-tutorials/whats-new-opsview-enterprise-380">screencast</a>.</p>
<p>Even if you don&#8217;t use Opsview, you can take advantage of this new feature by downloading the latest Nagios Plugins. We continue to work to improve our open source base. Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/08/measuring-the-rate-of-change-for-snmp-values-using-nagios-plugins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 reasons why you need IT monitoring</title>
		<link>http://labs.opsview.com/2010/07/5-reasons-why-you-need-it-monitoring/</link>
		<comments>http://labs.opsview.com/2010/07/5-reasons-why-you-need-it-monitoring/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 07:17:29 +0000</pubDate>
		<dc:creator>James Peel</dc:creator>
				<category><![CDATA[Opsview]]></category>
		<category><![CDATA[System Management]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[5 reasons]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[why]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=449</guid>
		<description><![CDATA[
			
				
			
		All enterprises depend on reliable servers, network devices and business applications. Any downtime hits your bottom line. To ensure maximum IT performance across your business, you have to identify and resolve problems before they impact the user experience or security of your data.

To diagnose reliability and performance problems on complex networks you need visibility of [...]]]></description>
			<content:encoded><![CDATA[<p>All enterprises depend on reliable servers, network devices and business applications. Any downtime hits your bottom line. To ensure maximum IT performance across your business, you have to identify and resolve problems before they impact the user experience or security of your data.</p>
<p><span id="more-449"></span></p>
<p>To diagnose reliability and performance problems on complex networks you need visibility of system events combined with the ability to spot trends and exceptions. For example, without proactive IT monitoring how can you tell whether poor application performance is related to network congestion, database performance or a high number of users?</p>
<table width="100%">
<tbody>
<tr>
<td style="text-align: center;">
<div id="attachment_458" class="wp-caption aligncenter" style="width: 280px"><a class="lightbox" title="derwentEvents500px" href="http://labs.opsview.com/wp-content/uploads/2010/07/derwentEvents500px.png"><img class="size-full wp-image-458 " style="border: 1px solid #ccc;" title="derwentEvents500px" src="http://labs.opsview.com/wp-content/uploads/2010/07/derwentEvents500px.png" alt="" width="270" height="141" /></a><p class="wp-caption-text">Monitoring network events as they happen!</p></div></td>
<td style="text-align: center;">
<div id="attachment_457" class="wp-caption aligncenter" style="width: 280px"><a class="lightbox" title="drupalCheck500px" href="http://labs.opsview.com/wp-content/uploads/2010/07/drupalCheck500px.png"><img class="size-full wp-image-457 " style="border: 1px solid #ccc;" title="drupalCheck500px" src="http://labs.opsview.com/wp-content/uploads/2010/07/drupalCheck500px.png" alt="" width="270" height="141" /></a><p class="wp-caption-text">Powerful graphing tools</p></div></td>
</tr>
</tbody>
</table>
<p>Here are five reasons why you need better network and application visibility:</p>
<p><strong>1.    To maximise your return on network investments for business application delivery</strong><br />
All businesses invest heavily in setting up and maintaining their IT infrastructure whether it be physical, virtualised or in the Cloud. Effective capacity planning allows you to target your spending wisely and track your return on investment. IT monitoring gives you instant access to the data you need for trend analysis and provides data warehouse capability suitable for enterprise reporting and capacity planning.</p>
<p><strong>2.    To anticipate and resolve issues before they become problems</strong><br />
Knowing when there is a problem with your IT systems can be useful but anticipating problems before they occur is the real value of system monitoring. IT monitoring gives you the visibility needed to spot performance and capacity issues before they have an impact on your systems. It can also process alarms from network, storage, power and cooling infrastructure ensuring you can mitigate the effects of failing hardware.</p>
<p><strong>3.    Make your staff more effective</strong><br />
The biggest asset of any business is its staff, and staff are usually also its biggest cost. Adoption of effective IT monitoring tools shifts the focus of your staff from fire-fighting to forward-planning. A project-based approach is more time efficient and allows you to focus the talents of your staff more effectively.</p>
<p><strong>4.    To ensure quality and service levels</strong><br />
Maintaining quality of service requires visibility. Performance issues can remain hidden to the business while quietly eroding the confidence of customers and partners. IT monitoring exposes current problems and gives you the tools for long-term analysis and reporting. Its service level reporting provides clear evidence that you’re meeting the required service levels.</p>
<p><strong>5.    Move things forward!</strong><br />
System monitoring is the cornerstone of IT management best practice. By adopting IT monitoring you’re making immediate efficiency gains and you’re laying the foundation for even greater levels of automation, scalability and resiliency.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/07/5-reasons-why-you-need-it-monitoring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Opsview is 75% faster than a standard Nagios database implementation</title>
		<link>http://labs.opsview.com/2010/07/opsview-is-75-faster-than-a-standard-nagios-database-implementation/</link>
		<comments>http://labs.opsview.com/2010/07/opsview-is-75-faster-than-a-standard-nagios-database-implementation/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 10:00:05 +0000</pubDate>
		<dc:creator>tonvoon</dc:creator>
				<category><![CDATA[Nagios]]></category>
		<category><![CDATA[Opsview]]></category>
		<category><![CDATA[Ndoutils]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=435</guid>
		<description><![CDATA[
			
				
			
		
In a standard Nagios plus database implementation, you use NDOutils to store information in a database. While we think NDOutils is fantastic, there are some major limitations with it as you monitor more hosts. With Opsview, we want to scale. We&#8217;ve already done lots of work with NDOutils, including adding view-like helper tables, updating the [...]]]></description>
			<content:encoded><![CDATA[<div>
<p><a class="lightbox" title="stopwatch" href="http://labs.opsview.com/wp-content/uploads/2010/07/stopwatch.jpeg"><img class="alignleft size-full wp-image-428" title="stopwatch" src="http://labs.opsview.com/wp-content/uploads/2010/07/stopwatch.jpeg" alt="" width="168" height="226" /></a>In a standard Nagios plus database implementation, you use NDOutils to store information in a database. While we think NDOutils is fantastic, there are some major limitations with it as you monitor more hosts. With Opsview, we want to scale. We&#8217;ve already done lots of work with NDOutils, including adding view-like helper tables, updating the database asynchronously, improving indices and speeding up the loading of the configuration at a Nagios reload. Now we want to share an amazing improvement we&#8217;ve discovered.</p>
<p><span id="more-435"></span></p>
<p>We know that the nagios_servicechecks table is the most heavily used table. This records every result that flows into Nagios, whether it is actively or passively checked. The statement to add a row in that table is an INSERT &#8230; ON DUPLICATE KEY UPDATE &#8230;.</p>
<p>However, this has problems. In our experience with the <a href="http://docs.opsview.com/doku.php?id=opsview-community:odw">Opsview Data Warehouse</a> &#8211; where we took best-practice advice from data warehouse experts &#8211; <a href="http://en.wikipedia.org/wiki/Fact_table">fact tables</a> should not have unique keys unless the rows really are unique. There need to be suitable indices to help the queries, but a uniqueness constraint means that some records may be updated when you expect to have a new record instead.</p>
<p>This gave us pause to wonder why the statement was an UPDATE. Further investigation showed that Nagios was sending extra messages to the database for processing.</p>
<p>The flow was:</p>
<ol>
<li>A service check is initiated and a NEBTYPE_SERVICECHECK_INITIATE event is fired. NDOutils adds a new row to the table with start times but no result.</li>
<li>A NEBTYPE_SERVICECHECK_ASYNC_PRECHECK event is fired &#8211; this allows other broker modules to intercept a service check execution. It was being sent to NDOutils, but not processed.</li>
<li>Finally, a NEBTYPE_SERVICECHECK_PROCESSED event is fired &#8211; this updates the earlier row with the results of the check.</li>
</ol>
<p>In order to work out the &#8220;earlier row&#8221;, NDOutils used the unique index, which consists of the instance_id, object id, start time and start time usec (microseconds). However, with passive check results, the start time usec is always set to 0. This means it is possible to lose results if you have checks which have the same start time for the same object.</p>
<p>We took the view that (1) and (2) were not necessary. That meant (3) was the only event that needed to be processed by NDOutils. So our change was to tell (1) and (2) not to send information to NDOutils, and to update the command for (3) to do a straight INSERT, rather than an INSERT &#8230; ON DUPLICATE KEY UPDATE &#8230;. This saved an index lookup.</p>
<p>We also changed the database index to reflect this whilst making it much smaller. The index used to consist of (start_time, instance_id, service_object_id, start_time_usec) &#8211; this meant for each row, the index was adding another 36 bytes. However, we changed it to (start_time) &#8211; only 8 bytes. Opsview only has 1 instance_id, so it is not necessary to include it in the index.</p>
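<p>The lost-result problem described above can be modelled in a few lines. This is only a sketch in Java, not NDOutils code: the class, the method names and the sample ids and timestamps are all invented for illustration. It treats an upsert as &#8220;one row per distinct key&#8221; and a straight INSERT as &#8220;one row per result&#8221;.</p>
<pre><code>import java.util.Arrays;

final class ServiceCheckRows {

    // Builds the old four-column unique key for one check result.
    static String uniqueKey(int instanceId, int objectId, long startTime, long startTimeUsec) {
        return instanceId + ":" + objectId + ":" + startTime + ":" + startTimeUsec;
    }

    // INSERT ... ON DUPLICATE KEY UPDATE leaves one row per distinct key.
    static long rowsAfterUpsert(String... keys) {
        return Arrays.stream(keys).distinct().count();
    }

    // A straight INSERT leaves one row per result.
    static long rowsAfterInsert(String... keys) {
        return keys.length;
    }

    public static void main(String[] args) {
        // Two passive results for the same service in the same second: usec is 0 for both.
        String first = uniqueKey(1, 42, 1279101605L, 0L);
        String second = uniqueKey(1, 42, 1279101605L, 0L);
        // An active result carries a real microsecond value.
        String third = uniqueKey(1, 42, 1279101605L, 350112L);

        System.out.println(rowsAfterUpsert(first, second, third)); // 2
        System.out.println(rowsAfterInsert(first, second, third)); // 3
    }
}</code></pre>
<p>With the unique key, the second passive result overwrites the first, so only two rows survive; with a plain INSERT all three results are kept.</p>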
<p>If you are keeping score, here are the improvements:</p>
<ul>
<li>Reduced number of events sent to NDOutils by 66%</li>
<li>Reduced number of SQL statements by 50%</li>
<li>Changed 1 SQL statement, making it a smaller statement and saving an index lookup</li>
<li>Reduced the size of one index by 77%</li>
</ul>
<p>Testing this was easy. As Opsview uses an asynchronous method of updating the database, you can change a debug file and Opsview will automatically start copying the data that would be pushed to the database. This gave us an NDO data packet. We then updated this data packet to have 10000 events for the same object, and pushed it to our database instance.</p>
<p>Results? 10000 records were taking 23 seconds to update the database. With our changes, this dropped to 6 seconds! We&#8217;re thrilled that this has sped up one of the most common database operations.</p>
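<p>For anyone who wants to check the arithmetic behind these figures, here is a small Java sketch. The class and method names are ours for illustration only; the inputs are the numbers quoted in this post (3 events down to 1, 2 SQL statements down to 1, 36 index bytes down to 8, and 23 seconds down to 6).</p>
<pre><code>final class ImprovementMaths {

    // Percentage reduction when going from before to after.
    static double percentReduction(double before, double after) {
        return (before - after) * 100.0 / before;
    }

    public static void main(String[] args) {
        System.out.printf("events sent:    %.1f%%%n", percentReduction(3, 1));
        System.out.printf("SQL statements: %.1f%%%n", percentReduction(2, 1));
        System.out.printf("index bytes:    %.1f%%%n", percentReduction(36, 8));
        System.out.printf("elapsed time:   %.1f%%%n", percentReduction(23, 6));
    }
}</code></pre>
<p>This gives reductions of about 66.7, 50.0, 77.8 and 73.9 percent respectively, so the drop in elapsed time is close to the 75% in the title.</p>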
<p>NDOutils is distributed under the GPL, which stipulates that all changes have to be available to our users. We go one better because Opsview is <a href="http://opsview.com/community/developer-zone">open source</a> and we <a href="https://secure.opsera.com/wsvn/wsvn/opsview/trunk/?#path_trunk_">publish our source code</a>, so everyone can benefit from our findings. Our complete patch list (for our 3rd party software) is <a href="https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-base/patches/?#path_trunk_opsview-base_patches_">here</a>.</p>
<p>The specific patch for this change is <a href="https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-base/patches/ndoutils_no_unique_key_on_servicechecks.patch">here</a>.</p>
<p>This improvement is shipped with Opsview Enterprise 3.8.0. Keep your eyes out for more performance tuning enhancements and new features that we will be adding to Opsview in the next few months!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/07/opsview-is-75-faster-than-a-standard-nagios-database-implementation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GWT/GXT dashboard primer</title>
		<link>http://labs.opsview.com/2010/05/gwtgxt-dashboard-primer/</link>
		<comments>http://labs.opsview.com/2010/05/gwtgxt-dashboard-primer/#comments</comments>
		<pubDate>Wed, 19 May 2010 13:03:20 +0000</pubDate>
		<dc:creator>tcallway</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frameworks]]></category>
		<category><![CDATA[GXT]]></category>
		<category><![CDATA[Google Web Toolkit]]></category>
		<category><![CDATA[Opsview]]></category>
		<category><![CDATA[System Management]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[gwt]]></category>
		<category><![CDATA[next-gen]]></category>
		<category><![CDATA[opsview enterprise]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=369</guid>
		<description><![CDATA[
			
				
			
		This post is based on research we’ve undertaken to develop a pilot mash-up style, charting dashboard for our monitoring solution, Opsview Enterprise. However the concepts we discuss could be used when building a dashboard that displays information from many other enterprise solutions. The assumption we make is that the reader is familiar with Java and [...]]]></description>
			<content:encoded><![CDATA[<p>This post is based on research we’ve undertaken to develop a pilot mash-up style, charting dashboard for our monitoring solution, Opsview Enterprise. However, the concepts we discuss could be used when building a dashboard that displays information from many other enterprise solutions. The assumption we make is that the reader is familiar with Java and the Google Web Toolkit (GWT). For more information about GWT and the other libraries used in this blog please see the Resources section at the end.<br />
<span id="more-369"></span></p>
<h4>Introduction</h4>
<p>The Opsview Enterprise platform is very flexible and allows us to extract monitoring information via a REST based API. This API returns monitoring data as JSON objects or XML so it is easy to integrate with many UI frameworks. In this example we will develop a mini portal-like application that displays status for different host groups in one of its portlets.</p>
<p>The client will be using the <a href="http://www.extjs.com/products/gwt/">‘GXT’ framework</a> that is based on GWT for the user interface and some Java libraries for HTTP communication and JSON parsing.</p>
<h4>Using the REST based API to talk to Opsview Enterprise</h4>
<p>Let’s first have a look at how you can call Opsview Enterprise from a browser to collect monitoring data. In this case we can use the following URL to fetch host groups:</p>
<p><code>http://&lt;hostname&gt;/opsview/api/status/hostgroup</code></p>
<p>If we try using this URL directly from a browser it will not work: a 403 (access denied) is returned even if we log in with correct credentials. A couple of extra headers are needed for the authentication to work; we will look at them later on.</p>
<p>To get around this problem we must first log in to Opsview Enterprise the normal way via the Opsview Enterprise UI and then hit the URL:</p>
<p><code>https://&lt;hostname&gt;/opsview/status/hostgroup</code> </p>
<p>This is basically the first URL you get to. Now you will have a session in the browser so you can test the API URL again.</p>
<p>It will return JSON looking something like this:</p>
<pre><code>{"hostgroup":{
    "summary":{
        "handled":782,"unhandled":60,"total":842,
        "service":{"ok":558,"critical":206,"handled":747,"unknown":29,
                   "unhandled":58,"warning":12,"total":805},
        "host":{"handled":35,"unhandled":2,"down":7,"up":30,"total":37}
    },
    "list":[
        {
            "hostgroup_id":"7",
            "hosts":{"handled":2,"unhandled":0,"down":{"handled":1},
                     "up":{"handled":1},"total":2},
            "services":{"ok":{"handled":24},"critical":{"handled":21,"unhandled":3},
                        "handled":50,"highest":"critical","unknown":{"handled":5},
                        "unhandled":3,"total":53},
            "name":"ProjectX",
            "downtime":null
        },
        {
            "hostgroup_id":"13",
            "hosts":{"handled":2,"unhandled":1,"down":{"handled":2,"unhandled":1},"total":3},
            "services":{"critical":{"handled":73},"handled":86,"highest":"critical",
                        "unknown":{"handled":13},"unhandled":0,"total":86},
            "name":"Internal",
            "downtime":"2"
        },
        {
            "hostgroup_id":"14",
            "hosts":{"handled":31,"unhandled":1,"down":{"handled":2,"unhandled":1},
                     "up":{"handled":29},"total":32},
            "services":{"ok":{"handled":534},"critical":{"handled":66,"unhandled":43},
                        "handled":611,"highest":"critical","unknown":{"handled":9,"unhandled":2},
                        "unhandled":55,"warning":{"handled":2,"unhandled":10},"total":666},
            "name":"Hosting",
            "downtime":"2"
        }
    ]
}}</code></pre>
<p>Looking at the response, we can see that it is nested JSON and that, in this case, it returned the current status for 3 host groups: ProjectX, Internal, and Hosting.</p>
<p>Because the JSON is nested, we will benefit from doing some processing on it before giving it to the UI routines in GXT, which only handle flat JSON.</p>
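<p>One way to picture the result of that processing: each entry in <code>list</code> ends up as a single row made of scalar fields only. The Java sketch below shows such a row for illustration. <code>FlatHostGroupRow</code> and its field names are hypothetical (this is not one of the dashboard classes), and the values are copied by hand from the ProjectX entry above rather than parsed.</p>
<pre><code>final class FlatHostGroupRow {
    final String name;
    final int hostsTotal;
    final int hostsUnhandled;
    final int servicesTotal;
    final int servicesUnhandled;
    final String highestServiceState;

    FlatHostGroupRow(String name, int hostsTotal, int hostsUnhandled,
                     int servicesTotal, int servicesUnhandled, String highestServiceState) {
        this.name = name;
        this.hostsTotal = hostsTotal;
        this.hostsUnhandled = hostsUnhandled;
        this.servicesTotal = servicesTotal;
        this.servicesUnhandled = servicesUnhandled;
        this.highestServiceState = highestServiceState;
    }

    // Share of services that nobody has handled yet, as a percentage.
    double unhandledServicePercent() {
        return servicesTotal == 0 ? 0.0 : servicesUnhandled * 100.0 / servicesTotal;
    }

    public static void main(String[] args) {
        // Nested values such as hosts.total and services.unhandled become plain fields.
        FlatHostGroupRow row = new FlatHostGroupRow("ProjectX", 2, 0, 53, 3, "critical");
        System.out.println(row.name + " " + row.highestServiceState + " "
                + row.unhandledServicePercent());
    }
}</code></pre>
<p>A row like this has no nested objects left, so a grid or chart can bind to its fields directly.</p>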
<h4>The Architecture for the Mini-Dashboard Application</h4>
<p>The following picture shows an overview of the architecture and the different components that will be used in this solution:</p>
<p><a class="lightbox" title="DashboardAppArchitecture" href="http://labs.opsview.com/wp-content/uploads/2010/05/DashboardAppArchitecture.png"><img class="aligncenter size-full wp-image-371" title="DashboardAppArchitecture" src="http://labs.opsview.com/wp-content/uploads/2010/05/DashboardAppArchitecture.png" alt="" width="400" /></a></p>
<p>The GXT client will not call the Opsview Enterprise server directly but instead go via a proxy server. This has the benefit that the JSON response from the Opsview Enterprise server can be processed and converted into <code>Serializable</code> Java objects. In this case the JSON is turned into a Java object called <code>HostGroup</code>, which in turn contains a <code>HostStatus</code> object. The <code>HostGroup</code> object will contain enough information to be able to draw a pie chart showing the status for the different host groups.</p>
<p>The <code>HostStatus</code> object will be used when the user clicks on one of the host group pie slices and wants to drill down to see more detailed status for a particular host group. Because we send the <code>HostStatus</code> object at the same time as the <code>HostGroup</code> object we do not have to make another remote call when the user drills down to see individual host group status.</p>
<p>Another benefit of the proxy server is that it lets the Mini-Dashboard client web app and the Opsview Enterprise server web app run on different hosts in different domains. Without the proxy this would not be possible: for security reasons the browser only lets the client open URL connections to the host it was downloaded from (the same-origin policy), so the Dashboard web app would have to be installed on the same host as Opsview Enterprise.</p>
<h4>The Mini-Dashboard Proxy Server</h4>
<p>Let’s start by implementing the proxy server using the Apache Commons HTTP Client and the Jackson JSON parser. For the remote calls between the Mini-Dashboard client and the proxy server (i.e. AJAX-based client calls) we will implement a <code>com.google.gwt.user.client.rpc.RemoteService</code>, as provided by Google Web Toolkit. GWT handles all marshalling of calls for us.</p>
<p>Here is a UML Diagram giving you an overview of the involved proxy server classes:<br />
<a class="lightbox" title="ServerImplUMLDiagram" href="http://labs.opsview.com/wp-content/uploads/2010/05/ServerImplUMLDiagram.png"><img class="aligncenter size-full wp-image-384" title="ServerImplUMLDiagram" src="http://labs.opsview.com/wp-content/uploads/2010/05/ServerImplUMLDiagram.png" alt="" width="400" /></a></p>
<p>First set up the <code>OpsviewServerProxyService</code> interface as follows:</p>
<pre><code>/**
  *
  * The client side stub for the Opsview Proxy Service RPC service.
  * @author Martin Bergljung, Opsera Ltd.
  */
@RemoteServiceRelativePath("opsviewproxy")
public interface OpsviewServerProxyService extends RemoteService {
    public static final String SERVICE_NAME = "OpsviewServerProxyService";

    public List&lt;HostGroup&gt; getHostGroups();
}</code></pre>
<p>The <code>RemoteServiceRelativePath</code> annotation associates a <code>RemoteService</code> with a relative path. This annotation will cause the client-side proxy to automatically invoke the <code>ServiceDefTarget.setServiceEntryPoint</code> method with<br />
<code>GWT.getModuleBaseURL() + value()</code> as its argument. Subsequent calls to <code>ServiceDefTarget.setServiceEntryPoint</code> will override this default path.</p>
<p>We define one method called <code>getHostGroups</code> that will return a list of <code>HostGroup</code> objects. Create a <code>HostGroup</code> object as follows:</p>
<pre><code>/**
 * Host Group (method level javadoc omitted for simplicity).
 *
 *  {
 *   "hostgroup_id":"14",
 *   "hosts":{"handled":31,"unhandled":1,"down":{"handled":2,"unhandled":1},
 *            "up":{"handled":29},"total":32},
 *   "services":{"ok":{"handled":534},"critical":{"handled":66,"unhandled":43},
 *              "handled":611,"highest":"critical","unknown":{"handled":9,"unhandled":2},
 *              "unhandled":55,"warning":{"handled":2,"unhandled":10},"total":666},
 *   "name":"Internal",
 *   "downtime":"2"
 *   }
 *
 * @author Martin Bergljung, Opsera Ltd.
 */
public class HostGroup implements Serializable {
    private int m_downtime;
    private String m_hostgroupId;
    private HostStatus m_hostStatus;
    private String m_name;
    private String m_services; // not used

    public HostGroup() {}

    //Getters, Setters, equals, hashCode, and toString have intentionally been left out to save space 

}
</code></pre>
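<p>The <code>HostStatus</code> class is not listed in this post. A minimal sketch could look like the following; the field names are an assumption, inferred from the getters and setters used in the rest of the code, and in a real project the class would be <code>public</code> in its own file:</p>
<pre><code>import java.io.Serializable;

/** Sketch of HostStatus; field names are assumed from the accessors used in this post. */
class HostStatus implements Serializable {
    private String m_hostGroupId;
    private int m_handled;
    private int m_unhandled;
    private int m_total;
    private int m_downHandled;
    private int m_downUnhandled;
    private int m_upHandled;
    private int m_upUnhandled;

    // GWT RPC needs a no-argument constructor to create the object on the client
    public HostStatus() {}

    public String getHostGroupId() { return m_hostGroupId; }
    public void setHostGroupId(String hostGroupId) { m_hostGroupId = hostGroupId; }
    public int getHandled() { return m_handled; }
    public void setHandled(int handled) { m_handled = handled; }
    public int getUnhandled() { return m_unhandled; }
    public void setUnhandled(int unhandled) { m_unhandled = unhandled; }
    public int getTotal() { return m_total; }
    public void setTotal(int total) { m_total = total; }
    public int getDownHandled() { return m_downHandled; }
    public void setDownHandled(int downHandled) { m_downHandled = downHandled; }
    public int getDownUnhandled() { return m_downUnhandled; }
    public void setDownUnhandled(int downUnhandled) { m_downUnhandled = downUnhandled; }
    public int getUpHandled() { return m_upHandled; }
    public void setUpHandled(int upHandled) { m_upHandled = upHandled; }
    public int getUpUnhandled() { return m_upUnhandled; }
    public void setUpUnhandled(int upUnhandled) { m_upUnhandled = upUnhandled; }
}</code></pre>
<p>Like <code>HostGroup</code>, it implements <code>Serializable</code> and has a no-argument constructor so that GWT RPC can send it to the client.</p>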
<p>The next step is to create an asynchronous version of the interface, as AJAX calls are asynchronous by nature. This is the actual interface that the client will come into contact with:</p>
<pre><code>/**
 * The async counterpart of <code>OpsviewServerProxyService</code>.
 */
public interface OpsviewServerProxyServiceAsync {
    public void getHostGroups(AsyncCallback&lt;List&lt;HostGroup&gt;&gt; callback);
}       </code></pre>
<p>The extra element here is the callback parameter that the client will have to implement. Now let’s move on to the actual implementation of this interface. Create the following class that implements the <code>getHostGroups</code> method:</p>
<pre><code>/**
 * The server side implementation of the Opsview Proxy Server RPC service.
*/
public class OpsviewServerProxyServiceImpl extends RemoteServiceServlet
implements OpsviewServerProxyService {
   public static final String OPSVIEW_API_BASE_URI =
            "https://&lt;hostname&gt;/opsview/api"; // replace &lt;hostname&gt; with your Opsview server
   public static final String OPSVIEW_API_HOSTGROUP_STATUS_URI =
            OPSVIEW_API_BASE_URI + "/status/hostgroup";

   public List&lt;HostGroup&gt; getHostGroups() {
        String json = callOpsview(OPSVIEW_API_HOSTGROUP_STATUS_URI);

        if (json != null &amp;&amp; json.length() &gt; 0) {
            return readHostGroupsJSON(json);
        }

        return new ArrayList&lt;HostGroup&gt;();
   }
</code></pre>
<p>Here you need to update the hostname in the <code>OPSVIEW_API_BASE_URI</code> constant to reflect your installation. If you are not using SSL, also change <code>https</code> to <code>http</code>.</p>
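<p>Rather than editing the constant for each installation, the URI can also be assembled from configuration. A minimal sketch, in which the helper class and its parameters are hypothetical:</p>
<pre><code>/** Hypothetical helper: builds the Opsview API URIs from a host name and an SSL flag. */
final class OpsviewApiUris {
    private OpsviewApiUris() {}

    /** Returns the API base URI, using https when useSsl is true and http otherwise. */
    static String baseUri(String host, boolean useSsl) {
        String scheme = useSsl ? "https" : "http";
        return scheme + "://" + host + "/opsview/api";
    }

    /** Returns the URI of the host group status call used in this post. */
    static String hostGroupStatusUri(String host, boolean useSsl) {
        return baseUri(host, useSsl) + "/status/hostgroup";
    }
}</code></pre>
<p><code>OpsviewApiUris.hostGroupStatusUri("opsview.example.com", true)</code> returns <code>https://opsview.example.com/opsview/api/status/hostgroup</code>; pass <code>false</code> for a plain <code>http</code> installation.</p>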
<p>Then add the <code>readHostGroupsJSON</code> method that takes JSON returned from Opsview Enterprise and turns it into a list of <code>HostGroup</code> objects:</p>
<pre><code>    private List readHostGroupsJSON(String json) {
        List hostGroups = new ArrayList();

        try {
            ObjectMapper mapper = new ObjectMapper(); // can reuse, share globally
            JsonNode rootNode = mapper.readValue(json, JsonNode.class);
            JsonNode hostGroupNode = rootNode.path("hostgroup");
            JsonNode listNode = hostGroupNode.path("list");

            for (JsonNode hostGroup : listNode) {
                HostGroup hg = new HostGroup();
                hg.setName(hostGroup.path("name").getTextValue());
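                // "downtime" is a JSON string ("2"); getIntValue() returns 0 for non-numeric nodes
                String downtime = hostGroup.path("downtime").getTextValue();
                hg.setDowntime(downtime == null ? 0 : Integer.parseInt(downtime));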
                String hostGroupId = hostGroup.path("hostgroup_id").getTextValue();
                hg.setHostGroupId(hostGroupId);
                hg.setServices("0"); // not used at the moment

                JsonNode hostStatus = hostGroup.path("hosts");
                HostStatus hs = new HostStatus();
                hs.setHostGroupId(hostGroupId);
                hs.setHandled(hostStatus.path("handled").getIntValue());
                hs.setUnhandled(hostStatus.path("unhandled").getIntValue());
                hs.setTotal(hostStatus.path("total").getIntValue()); // only one needed now

                JsonNode down = hostStatus.path("down");
                JsonNode up = hostStatus.path("up");
                hs.setDownHandled(down.path("handled").getIntValue());
                hs.setDownUnhandled(down.path("unhandled").getIntValue());
                hs.setUpHandled(up.path("handled").getIntValue());
                hs.setUpUnhandled(up.path("unhandled").getIntValue());

                hg.setHosts(hs);
                hostGroups.add(hg);
            }
        } catch (IOException e) {
            System.err.println("Fatal JSON Parsing error: " + e.getMessage());
            e.printStackTrace();
        }

        return hostGroups;
    }</code></pre>
<p>Here we use the Jackson JSON parser (<code>org.codehaus.jackson</code>) to look up the <code>hostgroup</code> node in the JSON response, and from that node the <code>list</code> node. With the <code>list</code> node in hand we step through all the host groups and set up the corresponding <code>HostGroup</code> objects. The <code>HostStatus</code> object is set up from the <code>hosts</code> node of each host group.</p>
<p>Here is a snippet of the JSON structure that we are talking about:</p>
<pre><code>{"hostgroup":{
...
"list":[
        {
            "hostgroup_id":"7",
            "hosts":{
</code></pre>
<p>Finally, we need to implement the <code>callOpsview</code> method, which uses Apache Commons HTTPClient to call Opsview Enterprise. Here is how that is done:</p>
<pre><code>private String callOpsview(String url) {
        String username = "&lt;username&gt;"; // your Opsview user account
        String password = "&lt;password&gt;";

        HttpClient client = new HttpClient();
        // Apache adds basic authentication in Opsera’s installation, so
        // the following 2 lines are necessary
        Credentials defaultcreds = new UsernamePasswordCredentials(username, password);
        client.getState().setCredentials(AuthScope.ANY, defaultcreds);
        GetMethod getMethod = new GetMethod(url);
        getMethod.setRequestHeader("X-Username", username);
        getMethod.setRequestHeader("X-Password", password);
        getMethod.setRequestHeader("Accept", "application/json");

        try {
            int statusCode = client.executeMethod(getMethod);
            if (statusCode == HttpStatus.SC_OK) {
                String contents = getMethod.getResponseBodyAsString();
                System.out.println(contents);
                return contents;
            } else {
                System.err.println("Got Error " + statusCode +
                        " when calling Opsview API: " +url);
            }
        } catch (Exception e) {
            System.err.println("Fatal transport error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            // Release the connection.
            getMethod.releaseConnection();
        }

        return "";
    }</code></pre>
<p>When you implement this the username and password need to be specified according to the user account in your Opsview Enterprise installation. Here we can also see that we are setting the following headers:</p>
<ul>
<li><strong>X-Username</strong> – Username for Opsview user account</li>
<li><strong>X-Password</strong> – Password for Opsview user account</li>
<li><strong>Accept</strong> – indicates that we will accept and handle JSON responses (can also be set to accept XML)</li>
</ul>
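<p>If you would rather not depend on Commons HTTPClient, the same three headers can be set with the JDK’s own <code>HttpURLConnection</code>. This is only a sketch: the class and method names are made up for illustration, and it leaves out the basic authentication and error handling shown above:</p>
<pre><code>import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch: prepares the Opsview API request with the JDK's HttpURLConnection. */
final class OpsviewRequest {
    private OpsviewRequest() {}

    /**
     * Creates a GET request carrying the X-Username, X-Password and Accept
     * headers. The request is only prepared here, not sent.
     */
    static HttpURLConnection prepare(String url, String username, String password)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Username", username);
        conn.setRequestProperty("X-Password", password);
        conn.setRequestProperty("Accept", "application/json");
        return conn;
    }
}</code></pre>
<p>Calling <code>getResponseCode()</code> on the returned connection sends the request; nothing goes over the network before that.</p>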
<p>The Opsview Enterprise proxy server implementation also needs to be registered to a URL. We do this in the GWT module definition file as follows for testing purposes:</p>
<pre><code>  &lt;!— Setup the Opsview Server Proxy Service for hosted mode testing --&gt;
    &lt;servlet path="/opsviewproxy" class=
			"com.<your package path>.server.OpsviewServerProxyServiceImpl" /&gt;</code></pre>
<p>For the real web application, open <code>web.xml</code> and add:</p>
<pre><code>      &lt;servlet&gt;
      &lt;servlet-name&gt;opsviewServerProxyServiceServlet&lt;/servlet-name&gt;
      &lt;servlet-class&gt;com.&lt;your package&gt;.server.OpsviewServerProxyServiceImpl&lt;/servlet-class&gt;
    &lt;/servlet&gt;

    &lt;servlet-mapping&gt;
      &lt;servlet-name&gt;opsviewServerProxyServiceServlet&lt;/servlet-name&gt;
      &lt;url-pattern&gt;/dashboardApp/opsviewproxy&lt;/url-pattern&gt;
    &lt;/servlet-mapping&gt;</code></pre>
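<p>The <code>url-pattern</code> has to line up with the path the client calls, which is <code>GWT.getModuleBaseURL()</code> plus the value of the <code>RemoteServiceRelativePath</code> annotation. The sketch below spells out that string arithmetic in plain Java; the helper class is hypothetical, and the module base URL in the usage note is an assumption based on the URLs used in this example:</p>
<pre><code>/** Hypothetical helper: shows how the RPC URL is derived and compared with web.xml. */
final class ServiceEntryPoint {
    private ServiceEntryPoint() {}

    /** Mirrors GWT.getModuleBaseURL() + value() from the annotation. */
    static String resolve(String moduleBaseUrl, String relativePath) {
        return moduleBaseUrl + relativePath;
    }

    /** Simplified check: true when the resolved URL ends with the servlet url-pattern. */
    static boolean matchesUrlPattern(String resolvedUrl, String urlPattern) {
        return resolvedUrl.endsWith(urlPattern);
    }
}</code></pre>
<p>With a module base URL of <code>http://localhost:8080/opsviewui/dashboardApp/</code>, <code>resolve</code> gives <code>http://localhost:8080/opsviewui/dashboardApp/opsviewproxy</code>, which ends with the <code>/dashboardApp/opsviewproxy</code> pattern mapped above.</p>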
<h4>The Mini-Dashboard Client</h4>
<p>The client is implemented as a Google Web Toolkit application. First create the main entry point class for the application:</p>
<pre><code>public class DashboardApp implements EntryPoint {

    public void onModuleLoad() {
        OpsviewServerProxyServiceAsync service = (OpsviewServerProxyServiceAsync)
                GWT.create(OpsviewServerProxyService.class);
        Registry.register(OpsviewServerProxyService.SERVICE_NAME, service);

        LayoutContainer container = new LayoutContainer();
        final BorderLayout layout = new BorderLayout();
        container.setLayout(layout);
        container.setWidth(1024);
        container.setHeight(768);
        container.add(createMainMenuAndToolbarPanel(), createMainMenuAndToolbarLayoutData());
        container.add(new Dashboard(2), createDashboardLayoutData());
        container.add(createBottomStatusBar(), createBottomStatusBarLayoutData());

        RootPanel.get().add(container);
    }</code></pre>
<p>This creates the stub that handles communication with the Opsview Enterprise proxy server and registers the stub object locally. Then the layout for the GXT application is set up. Now create the individual panels and layout data:</p>
<pre><code>    private ContentPanel createMainMenuAndToolbarPanel() {
        ContentPanel mainMenuAndToolbarPanel = new ContentPanel();
        mainMenuAndToolbarPanel.setTopComponent(new MainMenu());
        mainMenuAndToolbarPanel.setHeading("Opsview Dashboard...");
        return mainMenuAndToolbarPanel;
    }

    private ContentPanel createBottomStatusBar() {
        ContentPanel panel = new ContentPanel();
        ToolBar toolbar = new ToolBar();
        panel.setHeaderVisible(false);
        Label opsviewVersion = new Label();
        opsviewVersion.setStyleName("Arial");
        opsviewVersion.setText("Opsview Dashboard Example");
        Label loggedInInfo = new Label();
        loggedInInfo.setText("Logged in as Opsview Admin");
        toolbar.add(opsviewVersion);
        toolbar.add(new FillToolItem());
        toolbar.add(loggedInInfo);
        toolbar.setHeight(25);
        panel.add(toolbar);
        return panel;
    }

    private BorderLayoutData createMainMenuAndToolbarLayoutData() {
        BorderLayoutData breadcrumbsLayoutData =
new BorderLayoutData(Style.LayoutRegion.NORTH, 50);
        breadcrumbsLayoutData.setHideCollapseTool(true);
        breadcrumbsLayoutData.setMargins(new Margins(5, 5, 0, 5));
        return breadcrumbsLayoutData;
    }

    private BorderLayoutData createDashboardLayoutData() {
        BorderLayoutData data = new BorderLayoutData(Style.LayoutRegion.CENTER);
        data.setMargins(new Margins(0, 5, 0, 5));
        return data;
    }

    private BorderLayoutData createBottomStatusBarLayoutData() {
        BorderLayoutData breadcrumbsLayoutData =
new BorderLayoutData(Style.LayoutRegion.SOUTH, 25);
        breadcrumbsLayoutData.setHideCollapseTool(true);
        breadcrumbsLayoutData.setMargins(new Margins(0, 5, 0, 5));
        return breadcrumbsLayoutData;
    }</code></pre>
<p>We also need a menu. It is not used in this example, so an empty placeholder is enough; implement it if you like. Here is a starting point:</p>
<pre><code>public class MainMenu extends ToolBar {
   public MainMenu() {
       // add menu items here
   }
}</code></pre>
<p>Last thing we need for the client is the actual Dashboard class that will call the Opsview Enterprise proxy and draw the pie chart with host group status:</p>
<pre><code>public class Dashboard extends Portal {
   public static final String OFC_FLASH_LOCATION = "dashboardApp/chart/open-flash-chart.swf";

   private Chart m_hostGroupsChart;
   private List&lt;HostGroup&gt; m_currentHostGroups;

   public Dashboard(int columns) {
        super(columns);
        setBorders(true);
        setStyleAttribute("backgroundColor", "white");
        setColumnWidth(0, .50);
        setColumnWidth(1, .50);

        add(createHostGroupsChartPortlet(), 1);
        loadHostGroupsChartPortlet();
   }</code></pre>
<p>Here we start off by implementing the Dashboard as a GXT Portal. We add one Portlet that will show the Host Group status pie chart. Then we call a method that will load the pie chart.</p>
<p>Next implement the method that creates the Host Group chart portlet as follows:</p>
<pre><code>   private Portlet createHostGroupsChartPortlet() {
        Portlet portlet = new Portlet();
        portlet.setHeading("Status - Host Groups Hierarchy");
        portlet.setLayout(new FitLayout());
        portlet.setHeight(400);

        String url = !isExplorer() ? "../../" : "";
        url += OFC_FLASH_LOCATION;
        m_hostGroupsChart = new Chart(url);
        m_hostGroupsChart.setBorders(true);

        //portlet.setIcon(ICONS.hostgroup());
        portlet.setTopComponent(createChartPortletToolBar());
        portlet.add(m_hostGroupsChart);

        configurePortlet(portlet);

        return portlet;
    }</code></pre>
<p>First we create the portlet and set its layout and heading etc. We then create a chart component and add it to the portlet. The chart component uses Flash to draw the chart so we need to supply the chart component with the URL for the Flash component. We also create a little toolbar for the portlet to show that you could implement configuration of what you want to see this way:</p>
<pre><code>    private ToolBar createChartPortletToolBar() {
        ToolBar toolbar = new ToolBar();

        Button searchButton = new Button("Config");
        //searchButton.setIcon(ICONS.properties());
        searchButton.setBorders(true);
        searchButton.setToolTip("Click here to configure the host group status chart");

        toolbar.setBorders(true);
        toolbar.setHeight(25);
        toolbar.add(new FillToolItem());
        toolbar.add(searchButton);

        return toolbar;
    }</code></pre>
<p>The loading of the Chart Portlet is done as follows by calling the getHostGroups method of the Opsview Enterprise proxy server stub and implementing the onSuccess and onFailure methods:</p>
<pre><code>    private void loadHostGroupsChartPortlet() {
        // Get the Opsview Server Proxy Service from the Registry
        final OpsviewServerProxyServiceAsync opsviewServerProxyService =
		(OpsviewServerProxyServiceAsync)
                Registry.get(OpsviewServerProxyService.SERVICE_NAME);

        // Make sure we got the service
        if (opsviewServerProxyService == null) {
            showInfoMsg("Opsview Server Proxy service could not be detected!");
        } else {
            opsviewServerProxyService.getHostGroups(new AsyncCallback&lt;List&lt;HostGroup&gt;&gt;() {
                public void onSuccess(List&lt;HostGroup&gt; hostGroups) {
                    m_currentHostGroups = hostGroups;

                    ChartModel cm = new ChartModel("Hosts by Host Group",
                            "font-size: 14px; font-family: Verdana; text-align: center;");
                    cm.setBackgroundColour("#fffff5");

                    PieChart pie = new PieChart();
                    pie.addChartListener(listener);
                    pie.setAlpha(0.5f);
                    pie.setTooltip("#label#&lt;br&gt;#percent#");
                    pie.setColours("#ff0000", "#00aa00", "#0000ff", "#ff9900", "#ff00ff");

                    for (HostGroup hostGroup : hostGroups) {
                        int totalHosts = hostGroup.getHosts().getTotal();
                        String name = hostGroup.getName();
                        pie.addSlices(new PieChart.Slice(totalHosts, name + " (" +
totalHosts + ")", name));
                    }

                    cm.addChartConfig(pie);

                    m_hostGroupsChart.setChartModel(cm);
                }

                public void onFailure(Throwable throwable) {
                    showInfoMsg("Opsview Server Proxy getHostGroups call failed: " +
                            throwable.getMessage());
                }
            });
        }
    }</code></pre>
<p>When the Opsview Enterprise server proxy responds we store the current host groups in a field so that we can access them later, when the user clicks on one of the slices. No further call to the server proxy is needed then, because we already hold the <code>HostStatus</code> for the clicked host group. We then create the <code>PieChart</code> and add a slice for each host group that was returned. A listener is also set on the pie chart; it loads the host status pie chart. The listener implementation looks like this:</p>
<pre><code>    private ChartListener listener = new ChartListener() {
        public void chartClick(ChartEvent ce) {
            PieChart.Slice hostGroup = (PieChart.Slice) ce.getDataType();
            loadHostStatusChartPortlet(hostGroup.getText());
            Info.display("Chart Clicked", "You selected {0}.", "" + ce.getValue() +
", " + hostGroup.getLabel());
        }
    };</code></pre>
<p>The Host Status Chart is loaded like this:</p>
<pre><code>    private void loadHostStatusChartPortlet(String hostGroupName) {
        ChartModel cm = new ChartModel("Host Status for Host Group " + hostGroupName,
                "font-size: 14px; font-family: Verdana; text-align: center;");
        cm.setBackgroundColour("#fffff5");

        PieChart pie = new PieChart();
        pie.addChartListener(listener);
        pie.setAlpha(0.5f);
        pie.setTooltip("#label#&lt;br&gt;#percent#");
        pie.setColours("#ff0000", "#00aa00", "#0000ff", "#ff9900", "#ff00ff");

        for (HostGroup hostGroup : m_currentHostGroups) {
            if (hostGroupName.equals(hostGroup.getName())) {
                HostStatus hostStatus = hostGroup.getHosts();
                pie.addSlices(new PieChart.Slice(hostStatus.getDownUnhandled(),
"Down - unhandled (" + hostStatus.getDownUnhandled() + ")", "Down - unhandled"));
                pie.addSlices(new PieChart.Slice(hostStatus.getUpHandled(),
"Up - handled (" + hostStatus.getUpHandled() + ")", "Up - handled"));
                pie.addSlices(new PieChart.Slice(hostStatus.getDownHandled(),
"Down - handled (" + hostStatus.getDownHandled() + ")", "Down - handled"));
                pie.addSlices(new PieChart.Slice(hostStatus.getUpUnhandled(),
"Up - unhandled (" + hostStatus.getUpUnhandled() + ")", "Up - unhandled"));
            }
        }

        cm.addChartConfig(pie);

        m_hostGroupsChart.setChartModel(cm);
    }</code></pre>
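<p>The four slices should add up to the host group’s total. As a quick sanity check, here are the numbers from the Internal host group in the JSON sample (2 down and handled, 1 down and unhandled, 29 up and handled, and no unhandled "up" entry, which we take as 0) in a small, hypothetical helper:</p>
<pre><code>/** Hypothetical helper: sanity-checks the drill-down slices against the group total. */
final class HostStatusSlices {
    private HostStatusSlices() {}

    /** Sum of the four slices drawn in the host status chart. */
    static int sum(int downHandled, int downUnhandled, int upHandled, int upUnhandled) {
        return downHandled + downUnhandled + upHandled + upUnhandled;
    }

    /** Share of one slice in percent, rounded to one decimal place. */
    static double percent(int slice, int total) {
        if (total == 0) {
            return 0.0; // avoid dividing by zero for an empty host group
        }
        return Math.round(slice * 1000.0 / total) / 10.0;
    }
}</code></pre>
<p><code>HostStatusSlices.sum(2, 1, 29, 0)</code> returns 32, which matches the <code>total</code> reported for that host group, and <code>percent(29, 32)</code> gives 90.6 for the "Up - handled" slice.</p>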
<p>Once the Host Status chart is loaded it is not possible to go back to the Host Groups chart: there is no menu item for it and no code to do it. If you like, go ahead and implement this yourself.</p>
<p>The tool buttons in the Chart Portlet header are configured like this:</p>
<pre><code>    private void configurePortlet(final ContentPanel panel) {
        panel.setCollapsible(true);
        panel.setAnimCollapse(false);
        panel.getHeader().addTool(new ToolButton("x-tool-gear"));
        panel.getHeader().addTool(
                new ToolButton("x-tool-close", new SelectionListener&lt;IconButtonEvent&gt;() {
                    @Override
                    public void componentSelected(IconButtonEvent ce) {
                        panel.removeFromParent();
                    }
                }));
    }</code></pre>
<p>The last couple of methods are helpers:</p>
<pre><code>    public static boolean isExplorer() {
        String test = Window.Location.getPath();
        if (test.indexOf("pages") != -1) {
            return false;
        }
        return true;
    }

    public static void showInfoMsg(String msg) {
        MessageBox box = new MessageBox();
        box.setButtons(MessageBox.OK);
        box.setIcon(MessageBox.INFO);
        box.setTitle("Information");
        box.setMessage(msg);
        box.show();
    }
}</code></pre>
<h4>Running the Mini-Dashboard Client</h4>
<p>After deploying the application, for example under Tomcat, open a URL like this:</p>
<p><code>http://localhost:8080/opsviewui/dashboardApp.html</code></p>
<p>When we run this we should see a Host Groups Chart looking something like this:</p>
<p><a class="lightbox" title="HostGroupsPieChart" href="http://labs.opsview.com/wp-content/uploads/2010/05/HostGroupsPieChart.png"><img class="aligncenter size-full wp-image-372" title="HostGroupsPieChart" src="http://labs.opsview.com/wp-content/uploads/2010/05/HostGroupsPieChart.png" alt="" width="400" /></a></p>
<p>Clicking on one of the Host Groups will show Host status as follows:</p>
<p><a class="lightbox" title="HostStatusPieChart" href="http://labs.opsview.com/wp-content/uploads/2010/05/HostStatusPieChart.png"><img class="aligncenter size-full wp-image-373" title="HostStatusPieChart" src="http://labs.opsview.com/wp-content/uploads/2010/05/HostStatusPieChart.png" alt="" width="400" /></a></p>
<h4>Resources</h4>
<p>The Java libraries that were used in this example can be found here:</p>
<ul>
<li>GWT &#8211; <a href="http://code.google.com/webtoolkit/">http://code.google.com/webtoolkit/</a></li>
<li>GXT &#8211; <a href="http://www.extjs.com/products/gwt/">http://www.extjs.com/products/gwt/</a></li>
<li>Apache Commons HTTP Client &#8211; <a href="http://hc.apache.org/httpclient-3.x/">http://hc.apache.org/httpclient-3.x/</a></li>
<li>Jackson JSON Parser &#8211; <a href="http://jackson.codehaus.org/">http://jackson.codehaus.org/</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/05/gwtgxt-dashboard-primer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Business Continuity in the cloud era</title>
		<link>http://labs.opsview.com/2010/04/business-continuity-in-the-cloud-era/</link>
		<comments>http://labs.opsview.com/2010/04/business-continuity-in-the-cloud-era/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 15:40:16 +0000</pubDate>
		<dc:creator>rbramley</dc:creator>
				<category><![CDATA[business systems]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[continuity]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Puppet]]></category>
		<category><![CDATA[SaaS]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/?p=337</guid>
		<description><![CDATA[
			
				
			
		In the light of the recent events at a BT network centre in Paddington (London, UK), where a series of compound failures caused a massive outage with huge knock-on effects, I’m sure many businesses are taking another look at their own (and their suppliers) availability with a view to beefing up business continuity.

Within the spirit [...]]]></description>
			<content:encoded><![CDATA[<p>In the light of the recent events at a <a href="http://www.theregister.co.uk/2010/03/31/burne_house_burns/">BT network centre</a> in Paddington (London, UK), where a series of compound failures caused a massive outage with huge knock-on effects, I’m sure many businesses are taking another look at their own (and their suppliers’) availability with a view to beefing up business continuity.<br />
<span id="more-337"></span><br />
In the spirit of continuous improvement, this should be taken as an opportunity to improve the overall ‘system’ rather than to point fingers.</p>
<h3>What is business continuity?</h3>
<p>Quite simply, business continuity is how you can stay in business (and meet your customers’ demands) in the wake of a disaster (whether a localised incident such as flooding or a further-reaching issue like a terrorist attack). Your plan will typically cover all the business-critical functions, systems and data.</p>
<h3>So what is high availability?</h3>
<p>High availability (or HA) is the way that system designers ensure ‘operational continuity’ of a system. This will typically involve ensuring that the system has no single point of failure (and for really fault tolerant systems there should also be no single point of recovery).</p>
<p>A common mistake is to confuse availability with scalability. Scaling is usually described as vertical (bigger boxes) or horizontal (lots of boxes) &#8211; horizontally scaled systems often have a degree of high availability, whereas vertical scaling is potentially a bigger risk, as you lose more capability when you lose a bigger box.</p>
<h3>How do redundancy and diversity fit into the picture?</h3>
<p>The phrase ‘belt &amp; braces’ describes this nicely &#8211; two different ways of achieving the same goal.</p>
<p>One example might be having multiple (diverse) suppliers provide network connections to a building; in data centres it is common to have a (redundant) backup generator &#8211; though this is usually coupled with a battery-based UPS (uninterruptible power supply) to provide seamless failover.</p>
<h3>But we’ve got this new-fangled Cloud thing…</h3>
<p>Let’s attempt to clarify what ‘cloud computing’ means &#8211; there are a number of different types of on-demand services that run ‘in the Cloud’:</p>
<ul>
<li>IaaS &#8211; Infrastructure as a Service</li>
<li>PaaS &#8211; Platform as a Service</li>
<li>SaaS &#8211; Software as a Service</li>
</ul>
<h4>IaaS &#8211; Infrastructure as a Service</h4>
<p>This category splits into:</p>
<ul>
<li>Virtual Private Server (VPS) providers such as Slicehost allow businesses to rent a fixed size virtual server on demand for a set monthly fee</li>
<li>Elastic computing providers such as Amazon EC2 allow businesses to have a virtual machine image that can grow to meet requested demand (vertical scaling).</li>
</ul>
<h4>PaaS &#8211; Platform as a Service</h4>
<p>Platform as a Service providers offer application-hosting platforms (again these can be of fixed-size or scalable) so that application developers can focus on adding business value rather than needing to worry about the underlying infrastructure. Google AppEngine is one example of this approach for Java &amp; Python applications.</p>
<h4>SaaS &#8211; Software-as-a-Service</h4>
<p>Software-as-a-Service provides access to business software over the Internet for a set monthly fee. This can either be multi-tenanted, where multiple companies share a system, or with separate installations per customer. One of the major SaaS success stories is SalesForce.com who started with a CRM on-demand offering.</p>
<h4>Public vs. private vs. hybrid clouds</h4>
<p>We’ve already discussed public cloud offerings above; a private cloud is typically an on-premise or a dedicated outsourced managed cloud platform (e.g. using the open source Eucalyptus or VMWare vCloud). A hybrid cloud is where a private cloud is used in conjunction with one or more external cloud providers.</p>
<p>There is also the notion of community clouds with varying definitions:</p>
<ol>
<li>whereby similar organisations pool resources into a shared multi-tenant cloud (though I prefer to describe this as a shared private cloud or a restricted cloud e.g. Google’s ‘GovCloud’)</li>
<li>a decentralised peer-to-peer cloud utilising spare computing power (and bandwidth) of internet-connected computers.</li>
</ol>
<h3>Deployment models</h3>
<p>Assuming that your organisation can operate its business critical systems and store business critical data in the Cloud, then there are a number of possible deployment models to consider, chiefly:</p>
<ol>
<li>Backup</li>
<li>Failover</li>
<li>Active-active</li>
</ol>
<h4>Cloud as a backup</h4>
<p>In this model, data plus the necessary software packages and configuration data (e.g. CMDB configuration data) for critical systems are backed up to 1 (or ideally more) hosts in the cloud. In the event of a disaster at the primary operating location, new virtual servers are commissioned and the configuration management tool (e.g. Puppet) is used to provision the servers with the appropriate software.</p>
<p>Recovery time is dependent upon the time taken to commission/provision the new virtual servers and perform any data transfer (or decryption). Database/file replication techniques can help to reduce the time to recover.</p>
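<p>As a rough illustration of that dependency, the sketch below adds provisioning time to data transfer time. It assumes the two steps run one after the other, and every figure in the usage note is hypothetical:</p>
<pre><code>/** Back-of-envelope recovery time estimate for the backup model (illustrative only). */
final class RecoveryTimeEstimate {
    private RecoveryTimeEstimate() {}

    /**
     * @param provisioningMinutes time to commission and provision the new servers
     * @param dataGigabytes       amount of data to transfer or restore
     * @param megabytesPerSecond  sustained transfer throughput
     * @return estimated recovery time in minutes
     */
    static double minutes(double provisioningMinutes, double dataGigabytes,
                          double megabytesPerSecond) {
        double transferSeconds = dataGigabytes * 1024.0 / megabytesPerSecond;
        return provisioningMinutes + transferSeconds / 60.0;
    }
}</code></pre>
<p>For example, 30 minutes of provisioning plus 100 GB restored at 50 MB/s comes to roughly 64 minutes. Replication shrinks the amount of data left to transfer, which is why it shortens the recovery time.</p>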
<h4>Failover to the cloud</h4>
<p>For this model, there are pre-configured server instances in the cloud running the business critical systems combined with data replication. In the event of a disaster at the primary operating location, the systems are failed over to the cloud instance (this can be manual or use an automated ‘global load balancer’).</p>
<h4>Active-active</h4>
<p>This form of hybrid cloud is fundamentally the same as the failover option, however there are more complexities involved in the data synchronisation, session failover etc.</p>
<h3>The Cloud</h3>
<p>So how can the Cloud help with planning Business Continuity activities?</p>
<h4>1 &amp; 2: People &amp; premises</h4>
<p>For knowledge worker businesses, the Internet and the widespread availability of broadband have increased the prevalence of distributed home workers. With a distributed workforce and cloud-based systems these two items are of less concern; in the event of a disaster it remains a practical option to have staff work from home (or indeed from a temporary serviced office) and access systems running in the cloud.</p>
<p>For a true &#8216;belt and braces&#8217; approach, 3G/HSDPA mobile broadband dongles can be used to provide a secondary Internet connection for home workers should their main internet connection be unavailable.</p>
<h4>3: Technology</h4>
<p>Infrastructure-as-a-Service is very compelling for providing a Business Continuity strategy for data centre(s) using the deployment models outlined above. Furthermore, VoIP services can provide sufficient telephony cover for small-medium businesses.</p>
<h4>4: Information</h4>
<p>Cloud-based services (whether computing based or dedicated storage solutions e.g. Amazon S3) can aid your business in having current data stored confidentially and readily available in the event of a disaster. Some regulated organisations may have to consider whether the service provider can store data within the appropriate territory/jurisdiction. Data integrity is a further consideration for more complex system environments (particularly with the backup approach) &#8211; this needs to be taken into account for solution design / recovery procedures.</p>
<h3>Conclusion</h3>
<p>So we’ve established that cloud computing can take some of the business continuity execution effort away from your business; however, you still need to plan properly (what happens if key personnel are unavailable? how will equipment &amp; supplies be sourced?) and perform due diligence on service providers to ensure that their SLAs and DR plans align with your needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/04/business-continuity-in-the-cloud-era/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mocking an SNMP agent and developing SNMP checks</title>
		<link>http://labs.opsview.com/2010/02/mocking-an-snmp-agent-and-developing-snmp-checks/</link>
		<comments>http://labs.opsview.com/2010/02/mocking-an-snmp-agent-and-developing-snmp-checks/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 17:02:16 +0000</pubDate>
		<dc:creator>Duncan Ferguson</dc:creator>
				<category><![CDATA[Opsview]]></category>
		<category><![CDATA[checks]]></category>
		<category><![CDATA[SNMP agent]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/2010/02/mocking-an-snmp-agent-and-developing-snmp-checks.html</guid>
		<description><![CDATA[
			
				
			
		We regularly get customer requests to write snmp checks for their devices but a problem we have is we don&#8217;t necessarily get access to those devices &#8211; in the past we have used output from an snmpwalk command and written the check scripts as best we can, and then shipped the new check to the [...]]]></description>
			<content:encoded><![CDATA[<p>We regularly get customer requests to write snmp checks for their devices but a problem we have is we don&#8217;t necessarily get access to those devices &#8211; in the past we have used output from an snmpwalk command and written the check scripts as best we can, and then shipped the new check to the customer for testing.<br />
<span id="more-76"></span><br />
This doesn&#8217;t sit very well with me as I don&#8217;t think this gives a good customer experience.  Why should the customer do the testing when they have paid us to write the code?  How much time and effort has the customer got to put in to then get all debug data back to us to improve the check?</p>
<p>Well, there is a better way, by using <a title="CPAN - SNMP::Persist" href="http://search.cpan.org/%7Eanias/SNMP-Persist-0.05/">SNMP::Persist</a></p>
<p>The snmpd agent on most unix type systems can be extended such that a script can be used to respond to all snmp requests for a particular oid tree.  This can be done by creating a script such as this:<br />
<span style="font-family: Courier;"> </span></p>
<pre><span style="font-size: 11px; font-family: Courier;">#!/usr/bin/perl</span>
<span style="font-size: 11px; font-family: Courier;">use strict;</span>
<span style="font-size: 11px; font-family: Courier;">use warnings;</span>
<span style="font-size: 11px; font-family: Courier;">use SNMP::Persist qw(&amp;define_oid &amp;start_persister &amp;define_subtree);</span>
<span style="font-size: 11px; font-family: Courier;">define_oid(".1.3.6.1.4.1.123.123"); # base of OID tree</span>
<span style="font-size: 11px; font-family: Courier;">start_persister();</span>
<span style="font-size: 11px; font-family: Courier;">while (1) {</span>
<span style="font-size: 11px; font-family: Courier;"> # define all sub OIDS within the base to respond to</span>
<span style="font-size: 11px; font-family: Courier;"> define_subtree(</span>
<span style="font-size: 11px; font-family: Courier;"> { '1.1' =&gt; [ 'STRING', 'string value' ],</span>
<span style="font-size: 11px; font-family: Courier;"> '1.2' =&gt; [ 'INTEGER', 9 ],</span>
<span style="font-size: 11px; font-family: Courier;"> '1.3' =&gt; [ 'Counter32', 342 ], # Counter32 values are unsigned integers</span>
<span style="font-size: 11px; font-family: Courier;"> }</span>
<span style="font-size: 11px; font-family: Courier;"> );</span>
<span style="font-size: 11px; font-family: Courier;"> sleep 300;</span>
<span style="font-size: 11px; font-family: Courier;">}</span></pre>
<p>and then add an entry to your snmpd.conf such as</p>
<p><span style="font-size: 13px; font-family: Arial;"><span style="font-size: 11px; font-family: Courier;">pass_persist .1.3.6.1.4.1.123.123 /path/to/perl/script</span></span></p>
<p>After restarting your snmpd agent you should now be able to query the oids</p>
<p style="font-size: 11px; font-family: Courier;"><span style="font-size: 13px;">$ snmpwalk -v2c -c public localhost </span><span style="font-size: 13px;"><span style="font-size: 11px;">.1.3.6.1.4.1.123.123</span></span></p>
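<p>For readers who want to see what the persistent script is doing on the wire, here is a minimal sketch of the read-only part of the pass_persist line protocol, written in Python rather than Perl purely for illustration. It is not how SNMP::Persist is implemented. The <code>MOCK</code> table, <code>oid_key</code>, <code>respond</code> and <code>serve</code> are hypothetical names, the values are loosely modelled on the example above, and set requests are deliberately not handled.</p>

```python
BASE = ".1.3.6.1.4.1.123.123"

# Hypothetical mock values; the type names are the lower-case ones
# that snmpd accepts from a pass/pass_persist script.
MOCK = {
    BASE + ".1.1": ("string", "string value"),
    BASE + ".1.2": ("integer", "9"),
    BASE + ".1.3": ("counter", "342"),
}


def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so that OIDs compare numerically."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def respond(command, oid, table):
    """Return the reply lines for one 'get' or 'getnext' request."""
    if command == "get":
        match = oid if oid in table else None
    else:
        # getnext: the first OID strictly after the requested one
        later = [o for o in table if oid_key(o) > oid_key(oid)]
        match = min(later, key=oid_key) if later else None
    if match is None:
        return ["NONE"]
    kind, value = table[match]
    return [match, kind, value]


def serve(stdin, stdout, table):
    """Answer requests line by line until snmpd closes the pipe."""
    while True:
        line = stdin.readline()
        if not line:
            break
        command = line.strip()
        if command == "PING":
            reply = ["PONG"]
        elif command in ("get", "getnext"):
            reply = respond(command, stdin.readline().strip(), table)
        else:
            reply = ["NONE"]  # anything else (including set) is not mocked
        stdout.write("\n".join(reply) + "\n")
        stdout.flush()
```

<p>To use the sketch as an agent extension, the script would finish by calling <code>serve(sys.stdin, sys.stdout, MOCK)</code> after an <code>import sys</code>. Because <code>serve</code> only needs file-like objects, it can also be driven from a unit test with in-memory streams, which is the point of mocking the device in the first place.</p>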
<p>While the above isn&#8217;t in a fit state to use for <a title="Description of TAP tests" href="http://en.wikipedia.org/wiki/Test_Anything_Protocol">TAP tests</a>, it shouldn&#8217;t take too much imagination to extend it by using &#8216;pass&#8217; instead of &#8216;pass_persist&#8217; in NET::SNMP in a small server started when running the project tests. </p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2010/02/mocking-an-snmp-agent-and-developing-snmp-checks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Faster Nagios CGIs</title>
		<link>http://labs.opsview.com/2009/12/faster-nagios-cgis/</link>
		<comments>http://labs.opsview.com/2009/12/faster-nagios-cgis/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 12:43:06 +0000</pubDate>
		<dc:creator>tonvoon</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CGI]]></category>
		<category><![CDATA[Nagios]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/2009/12/faster-nagios-cgis.html</guid>
		<description><![CDATA[A patch from Jonathan Kamens speeds up the status CGIs. Added into CVS!
]]></description>
			<content:encoded><![CDATA[<p>The great thing about open source software is the unexpected contributions you may suddenly receive. Jonathan Kamens of <a href="http://advent.com">Advent Software</a> sent a patch into the <a  href="http://article.gmane.org/gmane.network.nagios.devel/7170">nagios-devel mailing list</a> about a speed up to the status CGIs. He identified, using <a href="http://wiki.nagios.org/index.php/Nagios_Core_Developer_Guidelines#Profiling">gprof</a>, that status.cgi was taking the most time in the sorting routine of downtimes and comments.<br />
<span id="more-77"></span></p>
<p>
His patch was simple and effective &#8211; instead of trying to sort the order of the objects every time, just do it once at the end.</p>
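<p>The idea can be sketched in a few lines of Python. This is an illustration of the general technique, not the actual C patch: <code>insert_in_order</code> and <code>append_then_sort</code> are hypothetical names, and the linear scan stands in for walking a linked list to find each insertion point.</p>

```python
def insert_in_order(entries):
    """Keep the list ordered after every insertion (the slow pattern)."""
    ordered = []
    for entry in entries:
        position = 0
        # Walk from the front until the right slot is found.
        while position < len(ordered) and ordered[position] <= entry:
            position += 1
        ordered.insert(position, entry)
    return ordered


def append_then_sort(entries):
    """Collect everything first, then sort a single time at the end."""
    collected = list(entries)
    collected.sort()
    return collected
```

<p>Both functions return the same ordered list, but the first does work proportional to the current list length on every insertion, while the second pays for one sort at the end.</p>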
<p>
We&#8217;ve been looking at the patch and produced some statistics of the old behaviour versus the new behaviour. The times can be found from running the t/660status-downtimes-comments.t perl script within the Nagios code.</p>
<style type="text/css">table.padded td, table.padded th { padding: 0.4em 0.6em 0.4em 0.6em;} table.padded td { padding-bottom: 0 }</style>
<p><center><br />
<table class="padded" style="font-size: 7pt; border: 1px solid black;">
<tr style="background-color: #99ccff; border: 1px solid #999999; text-align: left">
<th>Comments and downtimes</th>
<th>Nagios 3.2.0 (seconds)</th>
<th>With patch (seconds)</th>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1000</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>10000</td>
<td>14</td>
<td>1</td>
</tr>
<tr>
<td>100000</td>
<td>1680 (= 28mins &#8211; and it didn&#8217;t finish!)</td>
<td>53</td>
</tr>
</table>
<p></center></p>
<p>For 100000 comments, the status.cgi was taking 28 minutes just reading the current status data &#8211; and in the test it didn&#8217;t actually finish! With Jonathan&#8217;s patch, this reduced down to 53 seconds.</p>
<p>There&#8217;s probably still some work to do, as a tenfold increase in comments from 10000 to 100000 increases the execution time by a factor of 53, but this is a good start!</p>
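<p>One rough way to read the last two rows of the table is as an empirical scaling exponent: if time grows like n<sup>k</sup>, then k is the log of the time ratio divided by the log of the size ratio. The sketch below is only indicative, since the timings are whole seconds and the 1680 second run was stopped before it finished; <code>scaling_exponent</code> is a hypothetical helper name.</p>

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in time ~ n**k from two (size, seconds) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Nagios 3.2.0: 14 s at 10000 entries, at least 1680 s at 100000
before = scaling_exponent(10000, 14, 100000, 1680)  # roughly 2.08

# With the patch: 1 s at 10000 entries, 53 s at 100000
after = scaling_exponent(10000, 1, 100000, 53)  # roughly 1.72
```

<p>On these numbers the unpatched code behaves at least quadratically, and the patched code is better but still grows faster than linearly.</p>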
<p><a href="http://labs.opsview.com/wp-content/uploads/2009/12/num_sec_execute.png"><img src="http://labs.opsview.com/wp-content/uploads/2009/12/num_sec_execute-300x221.png" alt="" title="num_sec_execute" width="300" height="221" class="aligncenter size-medium wp-image-247" /></a></p>
<p>There seems to be some work in looking at alternative ways of getting status data, including <a href="http://nagios.org/download/addons">NDOutils database</a> backend, or <a href="http://www.mathias-kettner.de/checkmk_livestatus.html">mklivestatus</a> to get the information directly from the Nagios daemon.</p>
<p>For <a href="http://www.opsview.com/">Opsview</a>, we&#8217;re betting on the database backend, because there are other advantages (arbitrarily complex search criteria, historical information, separation of work from Nagios instance), although we like to provide the old style Nagios CGIs as well, so we&#8217;ll be adding this patch to Opsview. Funnily enough, the Nagios users get the patch earlier than the Opsview users do! But that&#8217;s all part of working with open source software and improving the tools for everyone.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2009/12/faster-nagios-cgis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nagios scheduling bug</title>
		<link>http://labs.opsview.com/2009/10/nagios-scheduling-bug/</link>
		<comments>http://labs.opsview.com/2009/10/nagios-scheduling-bug/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 10:42:12 +0000</pubDate>
		<dc:creator>tonvoon</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[Nagios]]></category>
		<category><![CDATA[scheduling]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/2009/10/nagios-scheduling-bug.html</guid>
		<description><![CDATA[A Nagios bug is causing problems with incorrect scheduling of checks following timezone changes. We provide a workaround script
]]></description>
			<content:encoded><![CDATA[<p>
This Monday morning, we got lots of calls from our users where Opsview slave systems running Nagios were raising freshness alerts because checks weren&#8217;t being run within their specified period.<br />
<span id="more-78"></span></p>
<p>
Suspiciously, this was also the weekend where clocks went back one hour due to daylight savings changes.</p>
<p>
Hosts and services that were meant to be scheduled at 5 minute intervals on our internal system were marked to be run on 26th October at 23:00 instead of on 25th October.</p>
<p>
So there seems to be a problem within Nagios where the rescheduling has been adversely affected by the clock changes. We expect this will affect any countries still using Daylight Saving Time when they revert over the next few weeks.</p>
<p>
This affects all active checks on all Nagios systems, not just a distributed environment.</p>
<p>
We&#8217;re going to investigate this more deeply within Nagios because this definitely worked fine over the last few years. From the Nagios mailing lists, it seems to be in Nagios 3.2.0, but not Nagios 3.0.6. However, it has affected Opsview systems that run Nagios 3.0.6 and I think it is due to this <a href="https://secure.opsera.com/svn/opsview/branches/BRAN-3.1/opsview-base/patches/nagios_initialise_isdst.patch">patch which we brought forward from Nagios 3.2.0</a>.</p>
<p>
The workaround at the moment is to recheck all your hosts and services again. In Opsview 3.3.2, you can use the Mass recheck functionality.</p>
<p>
Alternatively, you can use this <a href="https://secure.opsera.com/svn/opsview/trunk/opsview-core/bin/recheck_all_hosts_services">script</a> which we&#8217;ve written. Download it, put it in /usr/local/nagios/bin and run it. It will submit a SCHEDULE_HOST_CHECK and SCHEDULE_HOST_SVC_CHECKS for all hosts in your system, using the objects.cache file to get all the host names. There is a random 5 minute difference applied, so that not all the checks will run at the same time.</p>
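<p>For anyone adapting the approach to a different layout, the logic described above can be sketched as follows. This is a Python illustration under stated assumptions, not the linked script: <code>host_names</code> and <code>recheck_commands</code> are hypothetical names, objects.cache is assumed to use the usual <code>define host {</code> blocks, and the same random delay is assumed to be shared by a host and its services.</p>

```python
import random
import re


def host_names(objects_cache_text):
    """Collect host_name values from 'define host {' blocks only."""
    names = []
    in_host = False
    for line in objects_cache_text.splitlines():
        stripped = line.strip()
        if re.match(r"define\s+host\s*\{", stripped):
            in_host = True
        elif stripped == "}":
            in_host = False
        elif in_host:
            fields = stripped.split(None, 1)
            if len(fields) == 2 and fields[0] == "host_name":
                names.append(fields[1])
    return names


def recheck_commands(names, now, max_delay=300, rng=random):
    """Build external command lines that spread checks over max_delay seconds."""
    lines = []
    for name in names:
        check_time = now + rng.randint(0, max_delay)
        for command in ("SCHEDULE_HOST_CHECK", "SCHEDULE_HOST_SVC_CHECKS"):
            lines.append("[%d] %s;%s;%d" % (now, command, name, check_time))
    return lines
```

<p>The returned lines would then be written, one per line, to the Nagios external command file, whose location depends on your installation. Calling <code>recheck_commands(names, int(time.time()))</code> uses the current time as the base.</p>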
<p>
You may need to adapt the script if you use different paths, but it will work on all Opsview systems.</p>
<p>
In a distributed environment, you only need to run this on the Opsview master server and the requests will be sent to slaves automatically.</p>
<p>
<b>Update</b>: We&#8217;ve been testing this and have found the following results:</p>
<ul>
<li>If you use a timezone such as Europe/London, then the bug was triggered between 2009-10-25 23:00:00 and 2009-10-26 00:00:00
<li>If you use a timezone such as America/New_York, then the bug will be triggered between 2009-11-01 23:00:00 and 2009-11-02 00:00:00
<li>If you use UTC, then the bug will <b>not</b> be triggered
</ul>
<p>Basically, it occurs on the day when the time for the nagios daemon goes back an hour.</p>
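<p>A small piece of date arithmetic shows why that particular hour is unusual. This is only an illustration of the calendar, not a description of the Nagios code: on the day the clocks go back, the local day is 25 hours long, so adding 24 hours to local midnight lands at 23:00 rather than at the next midnight. The sketch uses fixed UTC offsets for Europe/London (BST is UTC+1, GMT is UTC) so that it does not depend on a timezone database.</p>

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # in force until 02:00 on 2009-10-25
GMT = timezone.utc                         # in force afterwards

day_start = datetime(2009, 10, 25, 0, 0, tzinfo=BST)
next_day_start = datetime(2009, 10, 26, 0, 0, tzinfo=GMT)

# The local day containing the change is an hour longer than usual.
day_length = next_day_start - day_start

# Start of day plus 24 hours, expressed in the offset that applies by then.
naive_next = (day_start + timedelta(hours=24)).astimezone(GMT)
```

<p>Here <code>day_length</code> is 25 hours and <code>naive_next</code> is 23:00 on 25th October, the start of the window in which the bug was seen.</p>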
<p>We&#8217;ve managed to recreate the bug in a libtap test, so we&#8217;re working on a fix to Nagios.</p>
<p>
<b>Update</b>: We&#8217;ve applied a patch to core Nagios to fix this. Using our tests, we&#8217;ve found that the bug happens only between 23:00 and 00:00 on the day when clocks go back.</p>
<p>Since it is an automated test, we ran the test checking every half an hour, a million times and &#8230; it started to fail &#8230; in 2038. This is a <a href="http://en.wikipedia.org/wiki/Year_2038_problem">known Year 2038 problem</a>. I think there will be changes to Nagios before then <img src='http://labs.opsview.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
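<p>The arithmetic behind that observation is easy to check. The sketch below assumes a signed 32-bit count of seconds since 1970 and assumes the test started around the date of this post; neither detail is taken from the test itself.</p>

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit seconds counter can hold.
last_32bit_second = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)

# A million half-hour steps from an assumed start date.
steps = 1000000
span = timedelta(minutes=30) * steps
start = datetime(2009, 10, 26, tzinfo=timezone.utc)
end = start + span
```

<p>The counter runs out on 19 January 2038 at 03:14:07 UTC, while a million half-hour steps cover about 57 years and reach into 2066, so the test was always going to cross that boundary.</p>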
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2009/10/nagios-scheduling-bug/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Improving Nagios</title>
		<link>http://labs.opsview.com/2009/10/improving-nagios/</link>
		<comments>http://labs.opsview.com/2009/10/improving-nagios/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 22:18:22 +0000</pubDate>
		<dc:creator>tonvoon</dc:creator>
				<category><![CDATA[Nagios]]></category>
		<category><![CDATA[Opsview]]></category>

		<guid isPermaLink="false">http://labs.opsview.com/2009/10/improving-nagios.html</guid>
		<description><![CDATA[Opsview nagios fork
]]></description>
			<content:encoded><![CDATA[<p>At the heart of Opsview is the <a href="http://www.nagios.org/">Nagios</a> monitoring engine. One of the policies we have with Opsview is to keep the number of changes of our dependent software as low as possible. We do this by keeping track of all the patches we apply and pushing these back upstream (though recently we haven&#8217;t had as much time as we&#8217;d like&#8230;).<br />
<span id="more-79"></span></p>
<p>
Looking back over the last year, we&#8217;ve compiled a report of the number of patches to Nagios.</p>
<p>
<style type="text/css">
table.padded td, table.padded th { padding: 0.4em 0.6em 0.4em 0.6em;}
table.padded td { padding-bottom: 0 }
</style>
<p></p>
<table class="padded" style="font-size: 7pt; border: 1px solid black;"></p>
<tr style="background-color: #99ccff; border: 1px solid #999999; text-align: left">
<th>Date</th>
<th>Opsview version</th>
<th>Nagios version</th>
<th>Patches</th>
</tr>
<p></p>
<tr>
<td>Oct 2008</td>
<td>Opsview 2.14</td>
<td>Nagios 2.10</td>
<td>40</td>
</tr>
<p></p>
<tr>
<td>Feb 2009</td>
<td>Opsview 3.0</td>
<td>Nagios 3.0.6</td>
<td>30</td>
</tr>
<p></p>
<tr>
<td>Sep 2009</td>
<td>Opsview 3.3.1</td>
<td>Nagios 3.0.6</td>
<td>34</td>
</tr>
<p></p>
<tr>
<td>Oct 2009</td>
<td>Opsview 3.3.2</td>
<td>Nagios 3.2.0</td>
<td>22</td>
</tr>
<p>
</table>
</p>
<p>
Or if you prefer a graph:</p>
<p>
<img src="http://opsview-blog.opsera.com/.a/6a00d83451f81d69e20120a62a95d0970c-pi" alt="nagiospatches.png" border="0" width="283" height="168" /></p>
<p>
So you can see that we&#8217;ve been working hard on reducing the number of changes to Nagios. Are we stripping functionality out of Opsview? Not at all &#8211; we&#8217;re trying to push as much back to the main project as we can. There was a big drop down from 40 to 30 when we upgraded Opsview to Nagios 3, mainly due to all the patches <a href="http://opsview-blog.opsera.com/dotorg/2007/09/nagios-patch-da.html">we already informed</a> the project about.</p>
<p>
There was a small increase between February and September 2009 &#8211; this was due to customer changes that we needed to apply. But there was another major drop again when we upgraded to Nagios 3.2.0 &#8211; this time due to my involvement in the <a href="http://opsview-blog.opsera.com/dotorg/2009/08/opsviews-relationship-with-the-nagios-project.html">core Nagios team</a>. We&#8217;ll do a similar review on NDOutils since my colleague Duncan has been updating code there.</p>
<p>
So by definition, Opsview uses a forked version of Nagios. But we call this a <em>shallow fork</em>, because we&#8217;re keeping our changes to a minimum and trying to make sure that the core project benefits from the enhancements and fixes and <a href="http://tinderbox.nagios.org/nagios/status.html">testing procedures</a> that we use.</p>
<p>
We&#8217;re proud of our involvement with the Nagios project and we&#8217;re committed to making Nagios better for everyone, driving standards higher and improving core technologies.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.opsview.com/2009/10/improving-nagios/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
