Hardware monitoring and RAID on SL7

Informatics uses a Nagios monitoring system to keep track of the health and current status of many of its services and servers. One component of that Nagios environment is lcfg-hwmon, which periodically runs some routine health checks on servers and services and sends the results to Nagios; Nagios then alerts administrators if necessary. lcfg-hwmon checks several things:

  • It warns if any disks are mounted read-only. The SL6 version excluded device names starting with /media/ and /dev/loop. The SL7 version also ignores anything mounted on /sys/fs/cgroup. This check can be disabled by giving the hwmon.readonlydisk resource a false value.
  • If it finds RAID controller software it uses this to get the current status of the machine’s RAID arrays, then it reports any problems found. It knows about MegaRAID SAS, HP P410, Dell H200 and SAS 5i/R RAID types. Note that the software does not attempt to find out what sort of RAID controller the machine actually has, so the administrator has to be sure to use the correct RAID header when configuring the machine.
  • It warns if any of the machine’s power supply units has failed or is indicating a problem.
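The read-only disk check in the first item can be pictured with a short shell sketch. This is not the lcfg-hwmon code, only an illustration of the idea under some assumptions: the function name find_readonly_mounts is hypothetical, the input is taken to be in /proc/mounts format, and the /media/ exclusion is applied to the mount point rather than the device name.

```shell
# Hypothetical helper, not part of lcfg-hwmon: read /proc/mounts-style lines
# on stdin and print "device mountpoint" for each read-only mount, skipping
# the exclusions described in the text.
# Usage: find_readonly_mounts < /proc/mounts
find_readonly_mounts() {
    while read -r device mountpoint fstype options rest; do
        # Skip loop devices.
        case "$device" in
            /dev/loop*) continue ;;
        esac
        # Skip removable media and cgroup mounts (assumed to match on mount point).
        case "$mountpoint" in
            /media/*|/sys/fs/cgroup|/sys/fs/cgroup/*) continue ;;
        esac
        # Match "ro" only as a whole mount option, so that an option such as
        # errors=remount-ro does not count.
        case ",$options," in
            *,ro,*) printf '%s %s\n' "$device" "$mountpoint" ;;
        esac
    done
}
```

Reading from standard input keeps the sketch easy to try against a saved copy of /proc/mounts from another machine.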

As well as the periodic checks from cron, a manual status check can be done with:

/usr/sbin/check_hwmon --stdout

If the --stdout option is omitted, the result is sent to Nagios rather than printed to the terminal.

Version 0.21.2-1 of lcfg-hwmon functions properly on SL7 servers. In Informatics, any server using dice/options/server*.h gets lcfg-hwmon. Other LCFG servers can get it like this:

#include <lcfg/options/hwmon.h>

In related news, the RAID controller software for the RAID types listed above is now installed on SL7 servers by the same headers as on SL6. The HP P410 RAID software has changed its name from hpacucli to hpssacli but seems otherwise identical. The Dell H200 software sas2ircu has gained a few extra commands (SETOFFLINE, SETONLINE, ALTBOOTIR, ALTBOOTENC) but the existing commands seem unchanged. The other varieties of RAID software are much as they were on SL6.
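Since hwmon relies on the RAID controller software being present, it can be handy to see which of the tools named above are actually installed on a machine. The following is a minimal sketch, not taken from lcfg-hwmon; the function name find_raid_tools is hypothetical, and only the tool names mentioned in this post are checked.

```shell
# Hypothetical helper: print each named tool that can be found on PATH.
find_raid_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tool names taken from the text above; the other RAID types use other tools.
find_raid_tools hpacucli hpssacli sas2ircu
```

Finding a tool only shows that the software is installed. As noted earlier, it says nothing about which RAID controller the machine actually has.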

Published by

Chris Cooke

Chris Cooke is a Computing Officer in the School of Informatics at the University of Edinburgh. He works in the Managed Platforms Unit and rides a very large bicycle.
