Jun 30

This is a summary of a question I posed on the nagios-users mailing list.

In a distributed environment, we want the slave Nagios servers to do the alerting. The Nagios documentation says that the master should send notifications, as it is the central point of control, but we think this has two major limitations:

  1. the slave is not autonomous: if the connection to the master goes down, no notifications are sent
  2. with slaves in different countries and local operators, paging notifications shouldn’t come from a central server

However, we had a problem. Some forms of notification should be run on the master, such as RSS (we will be releasing this as a separate addon next week) or helpdesk integration. Another limitation is that we only allow NSCA communication from the slaves to the master.

The mailing list came to the rescue. Thanks to all the people we got responses from.

Marc Powell suggested putting logic in the slave’s notification scripts to check whether the master is up: if it is, forward the notification to the master; otherwise, notify directly from the slave. I discounted that because I’m opposed to putting “should I actually notify or not” logic in the notification scripts: it is extra code to support, and that decision is what Nagios itself should be making.
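For comparison, the suggestion amounts to something like the following Python sketch. Everything here is hypothetical: using a TCP connect to the master’s NSCA port (5667 by default) as the “is the master up” test, and the forwarding and local-notification callables, are assumptions for illustration. It mainly shows the kind of decision logic that would then have to live in, and be maintained alongside, every notification script.

```python
import socket
from typing import Callable

# Assumption: the master's NSCA daemon listens on its default port.
NSCA_PORT = 5667


def master_is_up(host: str, port: int = NSCA_PORT, timeout: float = 5.0) -> bool:
    """Hypothetical liveness test: can we open a TCP connection to the master?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def route_notification(
    message: str,
    is_master_up: Callable[[], bool],
    forward_to_master: Callable[[str], None],
    notify_locally: Callable[[str], None],
) -> str:
    """Forward to the master if it is reachable, otherwise notify from the slave.

    Returns "master" or "local" so the caller can log which path was taken.
    """
    if is_master_up():
        forward_to_master(message)
        return "master"
    notify_locally(message)
    return "local"
```

In a real script, `is_master_up` would be something like `lambda: master_is_up("master.example.com")` (a placeholder host name), and the two callables would wrap whatever forwarding and paging mechanisms the site uses.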

Patrick Morris runs his slaves with notifications switched off, and each slave checks whether the master is working. If that check fails, the slave switches its own notifications on. I like this idea, although it requires pager notifications on the master to be “distributed”, so that the master forwards requests to a slave for local dispatch. However, Robert King raised one possible problem in a thread called “Forcing renotification of existing states”: switching on a slave’s global notifications will miss services that are already in a non-OK state at the time of the switchover.
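The switchover itself can be driven through the Nagios external command file, which accepts the global `ENABLE_NOTIFICATIONS` and `DISABLE_NOTIFICATIONS` commands. Below is a minimal Python sketch, assuming it is called from something like an event handler on the slave’s check of the master; the command file path and the function names are assumptions, so check `command_file` in your own `nagios.cfg`.

```python
import time
from typing import Optional

# Assumption: location of the slave's external command file (see command_file in nagios.cfg).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def notification_command(master_ok: bool, now: Optional[int] = None) -> str:
    """Build the external command that toggles the slave's global notifications.

    While the master is healthy the slave stays quiet; when it is not,
    the slave takes over notifying.
    """
    if now is None:
        now = int(time.time())
    command = "DISABLE_NOTIFICATIONS" if master_ok else "ENABLE_NOTIFICATIONS"
    return f"[{now}] {command}\n"


def submit(command_line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command to the Nagios command pipe.

    open() blocks until Nagios has the pipe open for reading,
    so this should only be called while the daemon is running.
    """
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

This only flips the global switch; it does nothing about the renotification gap for problems that were already non-OK at switchover.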

After thinking about it some more, we decided to create multiple contacts per person. On the slaves, we have the usual contact called user, with host and service notification commands of email and pager (if desired). However, it got complicated on the master, because the usual user contact had to have no notification options. We then created two extra contacts:

  • user/distprofile – for master-generated notifications only
  • user/masterprofile – for the usual notifications on the master

It’s all a bit ugly. The big downside is that there are many more object definitions: for each contact group, we also needed to create corresponding /distprofile and /masterprofile groups. However, since we generate the configuration, the pain is a one-off.
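Because the configuration is generated, the extra groups come almost for free. A minimal Python sketch of how the master’s contact groups might be expanded, assuming each group member has matching `/distprofile` and `/masterprofile` contacts; the function name and the group and member names are hypothetical.

```python
from typing import Dict, List

# The plain group plus the two extra per-profile variants needed on the master.
PROFILE_SUFFIXES = ("", "/distprofile", "/masterprofile")


def contactgroup_definitions(groups: Dict[str, List[str]]) -> str:
    """Emit one contactgroup definition per group and per profile suffix.

    Assumption: members are renamed with the same suffix, so the group
    "ops/distprofile" contains "alice/distprofile", and so on.
    """
    blocks = []
    for group, members in sorted(groups.items()):
        for suffix in PROFILE_SUFFIXES:
            member_list = ",".join(member + suffix for member in members)
            blocks.append(
                "define contactgroup {\n"
                f"    contactgroup_name  {group}{suffix}\n"
                f"    alias              {group}{suffix}\n"
                f"    members            {member_list}\n"
                "}\n"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(contactgroup_definitions({"ops": ["alice", "bob"]}))
```

Tripling every group is exactly the object-count growth described above; the generator just hides it.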

Looking long term, we think the solution is for Nagios to implement some sort of contact profile, defined like this:

define contactprofile {
    contact_name                  contact_name
    contactgroups                 contactgroup_names
    host_notification_options     [d,u,r,f,n]
    service_notification_options  [w,u,c,r,f,n]
    host_notification_commands    command
    service_notification_commands command
}

The contact keeps only static definitions, such as email and pager, while the contactprofile can be sliced and diced for specific hosts and services. A lovely feature of this is that you could send warnings by email while criticals are paged.
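Since contactprofile is only a proposal, here is a minimal Python model of how it might resolve, covering just the service side. The class, the lookup function and the command names (`notify-service-by-email`, `notify-service-by-pager`) are hypothetical; the point is that two profiles for the same contact give “warnings by email, criticals by pager”.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ContactProfile:
    """Hypothetical model of the proposed contactprofile object (service side only)."""
    contact_name: str
    service_notification_options: Set[str]  # subset of {"w", "u", "c", "r", "f", "n"}
    service_notification_commands: List[str] = field(default_factory=list)


def commands_for(profiles: List[ContactProfile], contact: str, state: str) -> List[str]:
    """Collect the notification commands to run for a contact and a service state."""
    commands: List[str] = []
    for profile in profiles:
        if profile.contact_name != contact:
            continue
        if "n" in profile.service_notification_options:
            continue  # "n" means this profile never notifies
        if state in profile.service_notification_options:
            commands.extend(profile.service_notification_commands)
    return commands


# Warnings go by email, criticals by pager, for the same contact.
profiles = [
    ContactProfile("user", {"w"}, ["notify-service-by-email"]),
    ContactProfile("user", {"c"}, ["notify-service-by-pager"]),
]
```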

But that’s some way into the future…
