Pimping Nagios/Icinga e-mail notifications
When Nagios sends out notification e-mails about failing or misbehaving hosts and services, neither the format nor the content of the message is hard-wired in the Nagios source code. Instead, Nagios uses two command objects: one for host (connectivity) related notifications and one for service related notifications.
The service related command object is by default defined as follows:
# 'notify-service-by-email' command definition
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
Using external shell tools, namely /usr/bin/printf and a command line e-mail client like mail, Nagios sends out service-related information that is filled in at notification time from pre-defined macros. The e-mail bodies generated this way look like this:
***** Nagios *****
Notification Type: <notification type>
Service: <service description>
Host: <host alias>
Address: <ip address>
State: <service state>
Date/Time: <date and time>
Additional Info: <plugin output>
Although these e-mails might provide sufficient information to start with, your internal policy for setting host aliases in the Nagios configuration can easily lead to confusion. When we monitor customer systems at censhare, these hosts quite frequently have two different hostnames: one complies with the customer's naming convention, the other with our internal one. Our policy is to use the latter as host_name (as this is the one shown prominently in the web interface), and to set the customer's internal host name as alias.
Now what happened? At times some of us became quite puzzled as Nagios sent e-mails concerning services on hosts with names we could not spot in the Service Detail list on the Nagios web interface. Or we remembered the customer these e-mails related to, but had to look up our internal names for the server in question.
Bad default, in my opinion, but luckily this can be fixed easily: simply replace both occurrences of $HOSTALIAS$ in the command definition shown above (one in the message body, one in the subject) with the $HOSTNAME$ macro.
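If you prefer not to edit the definition by hand, the swap can be done mechanically. The following is only a sketch: swap_alias_for_name is a made-up helper name, and the sample line is a shortened stand-in for the subject part of the command line, not the full definition.

```shell
#!/bin/sh
# Sketch: swap_alias_for_name is a hypothetical helper, not part of Nagios.
# It reads text on stdin and replaces every $HOSTALIAS$ macro with $HOSTNAME$.
swap_alias_for_name() {
    # Single quotes keep the shell from expanding the dollar signs;
    # \$ matches a literal dollar sign in the sed pattern.
    sed 's/\$HOSTALIAS\$/$HOSTNAME$/g'
}

# Shortened stand-in for the subject part of the command line.
printf '%s\n' 'Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$' \
    | swap_alias_for_name
```

It is probably wise to run something like this against a copy of the file holding the definition first and to compare the result before reloading Nagios.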
But problems remain: when Nagios notifies you about acknowledgments, the e-mail generated by the notify-service-by-email command above will neither tell you who acknowledged the problem nor include the comment this person left in the web interface.
Both details can be retrieved through Nagios macros: $SERVICEACKAUTHOR$ and $SERVICEACKCOMMENT$ respectively. But these macros are empty outside an acknowledgment context, and the command defined above is used for all service related notifications. So if you simply add them, your non-acknowledgment e-mails will contain at least some additional whitespace, or dangling labels with nothing behind them if you choose to add verbose explanations. Now why not use some ifs and elses to make it nicer? Because conditional clauses don't work here, even though the definition above looks like an ordinary shell command line.
But never say die! In almost the same manner as you run /usr/bin/printf from a command_line statement, you can run whatever command line program you fancy, e.g. a shell script. How to do this, and how to avoid the pitfalls lurking around, I'm going to cover in my next blog post.
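Just to give a rough idea of the direction: a minimal sketch, assuming the macro values are handed to a script as positional arguments. The function name build_service_mail_body and the argument order are made up for illustration; the only Nagios-specific detail relied on is that $NOTIFICATIONTYPE$ expands to ACKNOWLEDGEMENT for acknowledgment notifications.

```shell
#!/bin/sh
# Sketch only: build_service_mail_body is a hypothetical helper, not part of Nagios.
# Arguments (assumed order): notification type, service description, host name,
# service state, acknowledgment author, acknowledgment comment.
# The last two may be empty.
build_service_mail_body() {
    ntype=$1
    service=$2
    host=$3
    state=$4
    ack_author=$5
    ack_comment=$6

    printf '***** Nagios *****\n\n'
    printf 'Notification Type: %s\n\n' "$ntype"
    printf 'Service: %s\nHost: %s\nState: %s\n' "$service" "$host" "$state"

    # Author and comment are only filled in for acknowledgments,
    # so print those lines in that case only.
    if [ "$ntype" = "ACKNOWLEDGEMENT" ]; then
        printf '\nAcknowledged by: %s\nComment: %s\n' "$ack_author" "$ack_comment"
    fi
}

# Example call with invented values.
build_service_mail_body ACKNOWLEDGEMENT HTTP web01 CRITICAL trish "looking into it"
```

In a command definition the macros would be passed as quoted arguments and the output piped to mail as before; where such a script lives and how it deals with awkward characters in the macro values is left open here.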
In her function as Senior System and Support Engineer at censhare Patricia Jung is responsible for system monitoring. She also earns her kudos as Linux and Unix guru and script wizard. Consequently, her blog mainly revolves around system administrator topics and tutorials. You can read Patricia’s micro blog with useful information and tips from her daily work at http://identi.ca/trish.