Configure Nagios using Pallet

Basic Nagios support was recently added to pallet. It is already very simple to use, and this blog post should make it even simpler. The overall philosophy is to configure the nagios service monitoring definitions along with the service itself, rather than have a monolithic nagios configuration divorced from the configuration of the various nodes.

As an example, we can configure a machine to have its ssh service, CPU load, number of processes and number of users monitored. Obviously, you would normally be monitoring several different types of nodes, but there is no difference as far as pallet is concerned.

We start by requiring various pallet components. These would normally be part of a ns declaration, but are provided here for ease of use at the REPL.

 (require
   ;; pallet.core is called by its full name below, so load it explicitly
   '[pallet.core]
   '[pallet.crate.automated-admin-user :as admin-user]
   '[pallet.crate.iptables :as iptables]
   '[pallet.crate.ssh :as ssh]
   '[pallet.crate.nagios-config :as nagios-config]
   '[pallet.crate.nagios :as nagios]
   '[pallet.crate.postfix :as postfix]
   '[pallet.resource.service :as service])
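For reference, here is a sketch of how the same requires might look inside a ns declaration. The namespace name example.nagios is made up for illustration, and pallet.core is listed only because the examples in this post call it by its full name.

```clojure
;; Hypothetical namespace name; the required namespaces are the ones used in this post.
(ns example.nagios
  (:require
    [pallet.core]
    [pallet.crate.automated-admin-user :as admin-user]
    [pallet.crate.iptables :as iptables]
    [pallet.crate.ssh :as ssh]
    [pallet.crate.nagios-config :as nagios-config]
    [pallet.crate.nagios :as nagios]
    [pallet.crate.postfix :as postfix]
    [pallet.resource.service :as service]))
```

Inside ns the libspec vectors are not quoted, unlike in the REPL form.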

Node to be Monitored by Nagios

Now we define the node to be monitored. We set up a machine that has SSH running, and configure iptables to allow access to SSH, with a throttled connection rate (six connections/minute by default).

 (pallet.core/defnode monitored
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; set iptables for restricted access
               (iptables/iptables-accept-icmp)
               (iptables/iptables-accept-established)
               ;; allow connections to ssh
               ;; but throttle connection requests
               (ssh/iptables-throttle)
               (ssh/iptables-accept)])

Monitoring of the SSH service is configured by simply adding (ssh/nagios-monitor).

Remote monitoring is implemented using nagios' nrpe plugin, which we add with (nagios-config/nrpe-client). To make nrpe accessible to the nagios server, we open the port that the nrpe agent runs on using (nagios-config/nrpe-client-port), which restricts access to the nagios server node. We also add a phase, :restart-nagios, that can be used to restart the nrpe agent.

Pallet comes with some configured nrpe checks, and we add nrpe-check-load, nrpe-check-total-procs and nrpe-check-users. The final configuration looks like this:

 (pallet.core/defnode monitored
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; set iptables for restricted access
               (iptables/iptables-accept-icmp)
               (iptables/iptables-accept-established)
               ;; allow connections to ssh
               ;; but throttle connection requests
               (ssh/iptables-throttle)
               (ssh/iptables-accept)
               ;; monitor ssh
               (ssh/nagios-monitor)
               ;; add nrpe agent, and only allow
               ;; connections from nagios server
               (nagios-config/nrpe-client)
               (nagios-config/nrpe-client-port)
               ;; add some remote checks
               (nagios-config/nrpe-check-load)
               (nagios-config/nrpe-check-total-procs)
               (nagios-config/nrpe-check-users)]
   :restart-nagios [(service/service
                     "nagios-nrpe-server"
                     :action :restart)])

Nagios Server

We now configure the nagios server node. The nagios server is installed with (nagios/nagios "nagiospwd"), which specifies the password for the nagios web interface, and we add a phase, :restart-nagios, that can be used to restart nagios.

Nagios also requires an MTA for notifications, and here we install postfix. We add a contact, which we make a member of the "admins" contact group, which is notified as part of the default host and service templates.

 (pallet.core/defnode nagios
   []
   :bootstrap [(admin-user/automated-admin-user)]
   :configure [;; restrict access
               (iptables/iptables-accept-icmp)
               (iptables/iptables-accept-established)
               (ssh/iptables-throttle)
               (ssh/iptables-accept)
               ;; configure MTA
               (postfix/postfix
                "pallet.org" :internet-site)
               ;; install nagios
               (nagios/nagios "nagiospwd")
               ;; allow access to nagios web site
               (iptables/iptables-accept-port 80)
               ;; configure notifications
               (nagios/contact
                {:contact_name "hugo"
                 :service_notification_period "24x7"
                 :host_notification_period "24x7"
                 :service_notification_options
                   "w,u,c,r"
                 :host_notification_options
                   "d,r"
                 :service_notification_commands
                   "notify-service-by-email"
                 :host_notification_commands
                   "notify-host-by-email"
                 :email "my.email@my.domain"
                 :contactgroups [:admins]})]
   :restart-nagios [(service/service "nagios3"
                      :action :restart)])

Trying it out

That's it. To fire up both machines, we use pallet's converge command.

 (pallet.core/converge
   {monitored 1 nagios 1}
   service
   :configure :restart-nagios)

The nagios web interface is then accessible on the nagios node with the nagiosadmin user and specified password. Real world usage would probably have several different monitored configurations, and restricted access to the nagios node.
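As a rough sketch of what a second monitored configuration might look like, the node below reuses only the crate functions already shown in this post. The node name web and the node counts are made up for illustration, and service is assumed to be the same compute service passed to converge above. Nothing here installs an actual web server; it only opens the port and attaches the monitoring.

```clojure
;; Assumes the requires from the start of this post are already loaded.
;; Hypothetical second node type, built only from calls used earlier in this post.
(pallet.core/defnode web
  []
  :bootstrap [(admin-user/automated-admin-user)]
  :configure [(iptables/iptables-accept-icmp)
              (iptables/iptables-accept-established)
              (ssh/iptables-throttle)
              (ssh/iptables-accept)
              ;; open the http port, as on the nagios node
              (iptables/iptables-accept-port 80)
              ;; same style of monitoring as the `monitored` node
              (ssh/nagios-monitor)
              (nagios-config/nrpe-client)
              (nagios-config/nrpe-client-port)
              (nagios-config/nrpe-check-load)]
  :restart-nagios [(service/service
                    "nagios-nrpe-server"
                    :action :restart)])

;; Converge all three node types; the counts are arbitrary.
;; `service` is assumed to be bound to the compute service, as above.
(pallet.core/converge
  {monitored 1 web 2 nagios 1}
  service
  :configure :restart-nagios)
```

Each node type carries its own monitoring definitions, so the nagios node definition itself does not change when a new type is added.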

Still to do...

Support for nagios is not complete (e.g. remote command configuration still needs to be added, and it has only been tested on Ubuntu), but I would appreciate any feedback on the general approach.


Published: 2010-08-18

A Clojure library for FluidDB

FluidDB, a "cloud" based triple-store, where the objects are immutable and can be tagged by anyone, launched about a month ago. As another step to getting up to speed with Clojure, I decided to write a client library, and clj-fluiddb was born. The code was very simple, especially as I could base the library on cl-fluiddb, a Common Lisp library.

I have some ideas I want to try out using FluidDB. Its permission system is one of its best features and, together with the ability to use it for RDF-like triples, means that it could provide a usable basis for growing the semantic web. My ideas are less grandiose, but might take as long to develop, we'll see...


Published: 2009-09-13

Product Development Flow

I have spent the last few months with my latest start-up, Artfox, where I have been trying to push home some of the lean start-up advice expounded by Eric Ries and Steve Blank. I was hoping that "The Principles of Product Development Flow", by Donald Reinertsen, might help me in making a persuasive argument for some of the more troublesome concepts around minimum viable product and ensuring that feedback loops are in place with your customers as soon as possible. Unfortunately, I don't think that this is the book if you are looking for immediate, practical prescriptions, but it is a thought-provoking, rigorous view of the product development process that pulls together ideas from manufacturing, telecommunications and the Marines.

Perhaps Reinertsen's most accessible advice is that decisions in product development should be based on a strong economic foundation, pulled together by a concept of the "Cost of Delay". Rather than relying on prescriptions for each of several interconnected metrics, such as efficiency and utilisation, Reinertsen suggests that economics will provide different targets for each of these metrics depending on the costs of the project at hand.

His proposition that product development organisations should measure "Design in Process", similar to the idea of "Intellectual Working In Process" proposed by Thomas Stewart in his book "Intellectual Capital", is what allows him to make the parallels to manufacturing and queueing theory and enables the application of the wide body of work in these fields to product development.
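To illustrate why a utilisation target cannot be set in isolation, here is a small sketch using the standard M/M/1 queueing result, where the expected number of items in the system at utilisation rho is rho / (1 - rho). The choice of model and the function name are my own assumptions for illustration, not a summary of the book's derivations.

```clojure
;; Expected number of items in an M/M/1 system (queued plus in service)
;; at utilisation rho, where 0 <= rho < 1. Standard textbook result.
(defn mm1-items-in-system
  [rho]
  {:pre [(<= 0 rho) (< rho 1)]}
  (/ rho (- 1 rho)))

;; Print the expected work in process at a few utilisation levels.
(doseq [rho [0.5 0.8 0.9 0.95]]
  (println rho (mm1-items-in-system rho)))
```

The values grow from roughly 1 item at 50% utilisation to roughly 19 at 95%, so whether high utilisation is worthwhile depends on what the resulting queue costs.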

His practical advice, such as working in small batches and using a cadence for activities that require coordination, will come as no surprise to practitioners of agile development, and Reinertsen provides clear reasoning for why these practices work.

During my time at Alcan, and later Novelis, I gave a lot of thought to scheduling, queues and cycle times in a transformation-based manufacturing environment. I found that this had many parallels to his view of the product development process, and little in common with what Reinertsen describes as manufacturing, which seems to be limited to high-volume, assembly-type operations. I found many ideas that could be usefully taken back to a manufacturing context.

If you look at this book as an introduction to scheduling, queueing theory and the reasons behind some agile development practices, then you will not be disappointed.


Published: 2009-08-30
