Nag nag nag nag nagios

Nagios is an extremely useful tool, until it isn’t. Which is to say, it’s nothing but a hindrance when Nagios keeps bombarding you with IMs and emails about a problem you’re already working on.

Surely you can just acknowledge the fault and shut it up…?

Well, sometimes, but it is hardly convenient to break out a Firefox session when you’re attached to a serial console with your lovely secure-shell-enabled phone.  And even if you are on a DICE machine it’s a bit of a pain to have to navigate the slightly clunky Nagios UI to find the host and service you wish to silence.

I started with a dumb bash script that hacks together the Nagios acknowledgement URL:

nagack:
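#!/bin/bash
# Open the Nagios acknowledgement page for a host, or for one service on it.
[[ -z $1 ]] && { echo "Usage: $(basename "$0") <host> [service]"; exit 1; }
host=$1
# hostname -d prints the DNS domain without a leading dot, so add one here
base="https://nagios.$(hostname -d)/nagios/cgi-bin/cmd.cgi"
cmdstr=""
if [[ -z $2 ]]; then  # whole host
  cmd="cmd_typ=33"
else # single service-on-host
  shift; service=$*
  # spaces in the service name become "+" in the query string
  cmdstr="&service=$(echo "${service}" | sed -e 's_ _+_g')"
  cmd="cmd_typ=34"
fi
url="${base}?${cmd}&host=${host}${cmdstr}"
if [[ -z $DISPLAY ]]; then
  w3m "${url}"
else
  firefox "${url}" &
fi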
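
The sed substitution above only deals with spaces. If a service name contains other characters that are special in a query string (say "/" or "&"), something like the following could stand in for it. This is a rough sketch rather than part of the original script: urlencode and the NAGACK_RAW variable are hypothetical names, and it assumes plain ASCII service names.

```shell
# Hypothetical helper: encode a string for use in a URL query string.
# Letters, digits and . _ ~ - pass through, a space becomes "+", and every
# other ASCII character becomes %XX. Non-ASCII input is not handled.
urlencode() {
  # The value is passed through the environment so that awk does not
  # interpret backslash escapes in it (as it would with -v).
  NAGACK_RAW="$1" awk 'BEGIN {
    s = ENVIRON["NAGACK_RAW"]
    # Build a character -> code lookup table for ASCII.
    for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) out = out c
      else if (c == " ")         out = out "+"
      else                       out = out sprintf("%%%02X", ord[c])
    }
    print out
  }'
}

# Example call with a made-up service name.
urlencode "Disk /var"
```

In nagack the cmdstr line would then become cmdstr="&service=$(urlencode "${service}")".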

This saves a lot of messing around with Nagios and takes me straight to the acknowledgement confirmation page. I’ve successfully moved more work away from the mouse, which is always a goal for me, but in some sense I’d taken a step backwards: to acknowledge Nagios quickly I now needed to pull up a new terminal and type, ooh, a good three or four words, and then still come up with a relevant yet non-abusive comment to send to the Nagios interface.

At this point it’s time to pull out the lovely awk…

nagparse:
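#!/usr/bin/gawk -f
# Shebang note: "env gawk -f" fails on Linux because everything after the
# interpreter is passed as one argument; adjust the path to gawk if needed.
/^Notification Type: PROBLEM/ { PROBLEM=1; }
# Keep the whole service name, not just its first word
/^Service:/ { SERVICE=$0; sub(/^Service:[ \t]*/, "", SERVICE); }
/^Host:/ { HOST=$2; }
SERVICE && HOST && PROBLEM { exit; }  # everything found: skip to END

END {
  # nagack is the bash script above
  if (SERVICE && HOST && PROBLEM) {
    system("nagack "HOST" "SERVICE);
  } else if (HOST && PROBLEM) {
    system("nagack "HOST);
  } else {
    print "Could not retrieve enough information." > "/dev/stderr"
  }
}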

Yes, there are lots of improvements to be made here, such as tightening it up and moving the whole thing to awk, but the point was to save time, not waste it (not too much, anyway).
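
One piece of tightening up is probably worth the time: nagparse hands text lifted from an email to system(), which runs it through a shell, so a crafted Host: line could smuggle in extra commands. Any check has to happen before that call, for instance in a small shell wrapper that extracts the host itself and only then runs the acknowledgement script. A minimal sketch of such a check follows; is_safe_host is a hypothetical name and the set of allowed characters is my assumption, not anything Nagios prescribes.

```shell
# Hypothetical guard: succeed only if the argument is non-empty and consists
# solely of letters, digits, dots, underscores and hyphens.
is_safe_host() {
  case $1 in
    '' | *[!A-Za-z0-9._-]*) return 1 ;;  # empty, or contains anything else
    *) return 0 ;;
  esac
}

# Example: one ordinary name and one that tries to append a command.
for h in badserver 'badserver; rm -rf ~'; do
  if is_safe_host "$h"; then
    echo "ok: $h"
  else
    echo "rejected: $h" >&2
  fi
done
```

Service names routinely contain spaces, so they would need a looser list than this, plus quoting at the point of use.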

So, how does this save time?  By taking this:

***** Nagios 2.9 *****

Notification Type: PROBLEM
Host: badserver
State: DOWN
Address: 129.215.123.45
Info: CRITICAL - Host Unreachable (129.215.123.45)

Date/Time: Thu Mar 4 10:46:36 GMT 2010

and turning it into this:

https://nagios.inf.ed.ac.uk/nagios/cgi-bin/cmd.cgi?cmd_typ=33&host=badserver

finally making Nagios useful without being irritating.
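
To check the message-to-URL step without launching a browser, the two scripts can be collapsed into a dry run that only prints the URL. This is a sketch under assumptions, not the scripts themselves: mail_to_ack_url is a hypothetical name, and the DNS domain is passed as an argument instead of coming from hostname -d so that the result is the same on any machine.

```shell
# Hypothetical dry run: read a Nagios notification on stdin and print the
# acknowledgement URL. $1 is the DNS domain (what `hostname -d` would give).
# Exits non-zero if the message is not a PROBLEM notification with a host.
mail_to_ack_url() {
  awk -v domain="$1" '
    /^Notification Type: PROBLEM/ { problem = 1 }
    /^Host:/    { host = $0;    sub(/^Host:[ \t]*/, "", host) }
    /^Service:/ { service = $0; sub(/^Service:[ \t]*/, "", service) }
    END {
      if (!problem || host == "") exit 1
      base = "https://nagios." domain "/nagios/cgi-bin/cmd.cgi"
      if (service == "") {
        # whole host
        print base "?cmd_typ=33&host=" host
      } else {
        # single service: spaces become "+" as in the bash script
        gsub(/ /, "+", service)
        print base "?cmd_typ=34&host=" host "&service=" service
      }
    }'
}

# Example: a shortened version of the notification shown above.
mail_to_ack_url inf.ed.ac.uk <<'EOF'
Notification Type: PROBLEM
Host: badserver
State: DOWN
EOF
```

Fed the host notification above, this prints the same cmd_typ=33 URL for badserver.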

But, um, why?

What? Not useful? Oh, don’t tell me that you’re not using Alpine. If you were, you’d know that, on receiving a Nagios message, you can simply hit “|” (that’s the pipe character) and pipe the message to nagparse. At this point, w3m (once authenticated) or Firefox will appear, ready and waiting for your acknowledgement. This will probably work for any other mailer that lets you pipe entire messages to a shell script :)

Oh… why not just POST the acknowledgement comment directly to Nagios? Sadly, I’m a little way from being able to do this thanks to our otherwise exceptionally useful Cosign infrastructure: until curl gains SPNEGO or automagical Cosign-negotiation skills, the web browser is still needed to authenticate yourself. But if it saves as much time as it took to write the script, it was probably worthwhile :)

2 thoughts on “Nag nag nag nag nagios”

  1. sxw

    One thing I’ve always meant to add to the Nagios/Jabber interface is the ability to silence a Jabber notification by replying to the message. I guess it would be pretty simple to do the same thing for email, too.
