Informatics uses a Nagios monitoring system to keep track of the health and current status of many of its services and servers. One of the components of the Nagios environment is lcfg-hwmon. This periodically performs some routine health checks on servers and services, then sends the results to Nagios, which alerts administrators if necessary. lcfg-hwmon checks several things:
- It warns if any disks are mounted read-only. The SL6 version excluded device names starting /media/ and /dev/loop. The SL7 version also ignores anything mounted on /sys/fs/cgroup. This check can be disabled by giving the hwmon.readonlydisk resource a false value.
- If it finds RAID controller software, it uses this to get the current status of the machine’s RAID arrays, then it reports any problems found. It knows about MegaRAID SAS, HP P410, Dell H200 and SAS 5i/R RAID types. Note that the software does not attempt to find out what sort of RAID controller the machine actually has, so the administrator has to be sure to use the correct RAID header when configuring the machine.
- It warns if any of the machine’s power supply units has failed or is indicating a problem.
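The read-only disk check in the list above can be illustrated with a short Python sketch. This is not the lcfg-hwmon code, which is not shown here; it only assumes that mount information is available in the /proc/mounts format and that the exclusions apply as described (device names starting /media/ or /dev/loop, and mount points under /sys/fs/cgroup). The function and constant names are hypothetical.

```python
# Hypothetical sketch of a read-only mount check; not the lcfg-hwmon source.
# Assumes input in /proc/mounts format: device mount_point fstype options ...

EXCLUDED_DEVICE_PREFIXES = ("/media/", "/dev/loop")   # excluded on SL6 and SL7
EXCLUDED_MOUNT_PREFIXES = ("/sys/fs/cgroup",)         # additionally ignored on SL7


def readonly_mounts(mounts_text):
    """Return (device, mount_point) pairs that are mounted read-only,
    leaving out the excluded devices and mount points."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "ro" not in options.split(","):
            continue  # mounted read-write, nothing to report
        if device.startswith(EXCLUDED_DEVICE_PREFIXES):
            continue
        if mount_point.startswith(EXCLUDED_MOUNT_PREFIXES):
            continue
        found.append((device, mount_point))
    return found
```

On a Linux machine the function could be fed the contents of /proc/mounts, with a warning printed for each pair it returns.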
As well as the periodic checks from cron, a manual status check can be done with:
/usr/sbin/check_hwmon --stdout
If the --stdout option is omitted, the result is sent to Nagios rather than displayed on the shell output.
Version 0.21.2-1 of lcfg-hwmon functions properly on SL7 servers. In Informatics, any server using dice/options/server*.h gets lcfg-hwmon. Other LCFG servers can get it like this:
#include <lcfg/options/hwmon.h>
In related news, the RAID controller software for the RAID types listed above is now installed on SL7 servers by the same headers as on SL6. The HP P410 RAID software has changed its name from hpacucli to hpssacli but seems otherwise identical. The Dell H200 software sas2ircu has gained a few extra commands (SETOFFLINE, SETONLINE, ALTBOOTIR, ALTBOOTENC) but the existing commands seem unchanged. The other varieties of RAID software are much as they were on SL6.