Configure Nagios using Pallet

Basic Nagios support was recently added to pallet. It is already simple to use, and this blog post should make it simpler still. The overall philosophy is to configure the nagios service monitoring definitions along with the service itself, rather than have a monolithic nagios configuration divorced from the configuration of the various nodes.

As an example, we can configure a machine to have its ssh service, CPU load, number of processes and number of users monitored. Obviously, you would normally be monitoring several different types of nodes, but there is no difference as far as pallet is concerned.

We start by requiring various pallet components. These would normally be part of a ns declaration, but are provided here for ease of use at the REPL.

 (require
   '[pallet.crate.automated-admin-user :as admin-user]
   '[pallet.crate.iptables :as iptables]
   '[pallet.crate.ssh :as ssh]
   '[pallet.crate.nagios-config :as nagios-config]
   '[pallet.crate.nagios :as nagios]
   '[pallet.crate.postfix :as postfix]
   '[pallet.resource.service :as service])
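
In a source file, the same requires would go in the ns declaration instead. The following is a minimal sketch: the namespace name example.nagios-nodes is hypothetical, and pallet.core is listed only because the examples in this post call pallet.core/defnode and pallet.core/converge by their full names.

```clojure
;; Sketch of the equivalent ns declaration for a source file.
;; The namespace name is an assumption; the required namespaces
;; and aliases are the ones used in this post.
(ns example.nagios-nodes
  (:require
   [pallet.core]
   [pallet.crate.automated-admin-user :as admin-user]
   [pallet.crate.iptables :as iptables]
   [pallet.crate.ssh :as ssh]
   [pallet.crate.nagios-config :as nagios-config]
   [pallet.crate.nagios :as nagios]
   [pallet.crate.postfix :as postfix]
   [pallet.resource.service :as service]))
```

Note that the libspecs are not quoted inside ns, unlike in the REPL call to require.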

Node to be Monitored by Nagios

Now we define the node to be monitored. We set up a machine that has SSH running, and configure iptables to allow access to SSH, with a throttled connection rate (six connections/minute by default).

 (pallet.core/defnode monitored
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; set iptables for restricted access
               ;; allow connections to ssh
               ;; but throttle connection requests

Monitoring of the SSH service is configured by simply adding (ssh/nagios-monitor).

Remote monitoring is implemented using nagios' nrpe plugin, which we add with (nagios-config/nrpe-client). To make nrpe accessible to the nagios server, we open the port that the nrpe agent runs on using (nagios-config/nrpe-client-port), which restricts access to the nagios server node. We also add a phase, :restart-nagios, that can be used to restart the nrpe agent.

Pallet comes with some configured nrpe checks, and we add nrpe-check-load, nrpe-check-total-procs and nrpe-check-users. The final configuration looks like this:

 (pallet.core/defnode monitored
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; set iptables for restricted access
               ;; allow connections to ssh
               ;; but throttle connection requests
               ;; monitor ssh
               (ssh/nagios-monitor)
               ;; add nrpe agent, and only allow
               ;; connections from nagios server
               (nagios-config/nrpe-client)
               (nagios-config/nrpe-client-port)
               ;; add some remote checks
               (nagios-config/nrpe-check-load)
               (nagios-config/nrpe-check-total-procs)
               (nagios-config/nrpe-check-users)]
   :restart-nagios [(service/service
                     :action :restart)])

Nagios Server

We now configure the nagios server node. We install the nagios server with (nagios/nagios "nagiospwd"), which specifies the password for the nagios web interface, and add a phase, :restart-nagios, that can be used to restart nagios.

Nagios also requires an MTA for notifications, and here we install postfix. We add a contact, which we make a member of the "admins" contact group, which is notified as part of the default host and service templates.

 (pallet.core/defnode nagios
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; restrict access
               ;; configure MTA
               (postfix/postfix "" :internet-site)
               ;; install nagios
               (nagios/nagios "nagiospwd")
               ;; allow access to nagios web site
               (iptables/iptables-accept-port 80)
               ;; configure notifications
               (nagios/contact
                {:contactname "hugo"
                 :servicenotificationperiod "24x7"
                 :hostnotificationperiod "24x7"
                 :email ""
                 :contactgroups [:admins]})]
   :restart-nagios [(service/service "nagios3"
                     :action :restart)])

Trying it out

That's it. To fire up both machines, we use pallet's converge command.

 (pallet.core/converge
   {monitored 1 nagios 1}
   service
   :configure :restart-nagios)

The nagios web interface is then accessible on the nagios node with the nagiosadmin user and specified password. Real world usage would probably have several different monitored configurations, and restricted access to the nagios node.
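
As a rough sketch of the "several different monitored configurations" case, a second node type can carry its own checks and be converged alongside the first. The node name appserver and the node counts are hypothetical, and the service name "nagios-nrpe-server" is an assumption based on the Ubuntu nrpe package. The crate functions are only those already named in this post, and service is the same compute service value passed to converge above.

```clojure
;; Hypothetical second node type: nrpe agent plus a load check only.
;; `appserver` is an illustrative name, not part of pallet.
(pallet.core/defnode appserver
  []
  :bootstrap [(admin-user/automated-admin-user)]
  :configure [;; add nrpe agent, reachable only from the nagios server
              (nagios-config/nrpe-client)
              (nagios-config/nrpe-client-port)
              ;; monitor CPU load
              (nagios-config/nrpe-check-load)]
  ;; assumed Ubuntu service name for the nrpe agent
  :restart-nagios [(service/service "nagios-nrpe-server"
                                    :action :restart)])

;; Converge both monitored node types together with the nagios server.
(pallet.core/converge
  {monitored 1 appserver 2 nagios 1}
  service
  :configure :restart-nagios)
```

Because each node type declares its own checks, adding a new kind of machine does not require touching the nagios server definition.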

Still to do...

Support for nagios is not complete (e.g. remote command configuration still needs to be added, and it has only been tested on Ubuntu), but I would appreciate any feedback on the general approach.

Published: 2010-08-18