Moving services to SL7

Although we don’t always blog about it, the MPU has been busy lately. One project which has been taking a great deal of our time is the SL7 server upgrade project, the effort to move all of the various services we run from the now-outdated version 6 of Scientific Linux to version 7. The MPU, one of the School’s five computing units, has its fair share of services to port to SL7, and here’s a summary of the work we did for this project during the last third of 2016 (that is, September to December):

We upgraded these MPU services to SL7:

  • The virtual machine hosting service. This covers eight servers on three sites, hosting some 180 guest VMs, most of which were kept running seamlessly through the upgrades.
  • The PXE service and the package cache service. These share two servers in the Forum and the Tower.
  • The PackageForge package-building service, covering two build servers and a master server. The build servers’ performance was improved by moving them to VMs. Before the master server could be upgraded, the PackageForge software needed enhancement, including an upgrade to PostgreSQL 9.6, changing the package data from YAML format in the filesystem to JSON format in the database – opening the way for a future version to provide far better presentation of the build results in the user interface – and various code updates, making the web interface noticeably more responsive.
  • The export packages server. This was moved to a new VM.
  • The LCFG slave servers – the two main slaves, one test slave, one DIY DICE slave and two inf-level release-testing slaves, an increase of one (we now monitor the inf level on SL6 and SL7). The two main slave servers were substantially speeded up by increasing their memory to 8GB, so that all LCFG profile information could be held in memory at once.
  • The site mirrors packages server, where we keep our own copies of various software repositories covering Scientific Linux, EPEL, PostgreSQL and others.
  • The LCFG website and the LCFG wiki. We installed and configured a substantially updated version of the TWiki software.
  • BuzzSaw and LogCabin (which organise and serve the login logs) were moved to the new SL7 loghost. This work included the update of Django packages and the building of some dependencies.
  • The LCFG disaster relief server, which will take over our configuration infrastructure should some calamity befall the Forum. This server hosts a complex mix of services, so sorting out its Apache config for SL7 helped to prepare the way for the LCFG master upgrade to come.

In addition, substantial work was done towards the upgrade of these services:

  • The computing help service.
  • The LCFG bug tracking service.
  • The LCFG master:
    • Replacing Apache mod_krb5 with mod_gssapi;
    • Porting mod_user_rewrite to the new LCFG build tools;
    • Reworking the rfe packaging to produce a separate rfe-server sub-package and to introduce systemd support;
    • Completely rewriting the rfe component in Perl with Template Toolkit;
    • Moving the web view of the LCFG repositories from the outdated websvn to the more capable viewvc, with a new LCFG component to manage its configuration;
    • Updating all components’ defaults packages to up-to-date SL7 versions.

Work on this project has continued into 2017, but more of that in a future post.

LVM, SL7 and physical volume device names

The use of traditional /dev/sd[xx] disk device names has become increasingly unsafe in recent Linux installs. This is particularly so in an environment with SAN devices. For a while now, we have been mounting disk partitions by UUID, but until recently the LVM component has continued to rely on /dev/sd[xx] names.
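
For example, an /etc/fstab entry mounting by filesystem UUID rather than by device name looks something like this (the UUID and mount point are just placeholders):

UUID=0a1b2c3d-1111-2222-3333-444455556666  /data  ext4  defaults  0 2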

The LVM component, on configure(), checks to see whether any additional physical volumes have been configured for a volume group. It does this by using the ‘pvs’ command to enumerate the physical volumes associated with each volume group. The vgextend (or vgcreate) command is then used to add a physical volume to a volume group. Unfortunately, these commands store away the resulting /dev/sd[xx] name of the physical volume – and not a persistent UUID-based name. This means that on subsequent reboots, there’s a high chance that the /dev/sd[xx] name will be wrong for a physical volume.

The solution is to generate a UUID on each physical volume (based on a hash of the physical path name), label the physical volume with that UUID (with pvcreate) and look for that UUID using ‘pvs -o pv_uuid,vg_name’ instead of looking for the /dev/sd[xx] device.
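
Roughly speaking, the steps look like this (a sketch of the idea rather than the component’s actual code – the device, volume group name and UUID generation shown here are placeholders):

DEV=/dev/sdb1                                    # placeholder device name
UUID=$(uuidgen)                                  # the component instead derives a stable UUID from a hash of the path
pvcreate --uuid "$UUID" --norestorefile "$DEV"   # label the physical volume with that UUID
vgextend vg0 "$DEV"                              # add it to the (placeholder) volume group
pvs --noheadings -o pv_name,pv_uuid,vg_name      # later, look the device up by its PV UUID rather than by /dev/sd[xx] name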

Consistent network names – virtual hardware

As described in an earlier post, we have recently enabled consistent network interface names under SL7 – for physical machines.

We have concluded that this is not practical to do for virtual hardware. The device names presented by the consistent network interface naming scheme depend on the underlying configuration of the virtual guests. As this configuration is not under our control, the device names will be unpredictable.

SL7 – multipath and LVM

We have been working on checking support for DM multipath and LVM under SL7.2. Our first attempts at doing this under SL7.1 failed miserably as a result of unpredictable FibreChannel problems – we’ve found, in the past, that support for FibreChannel in early dot releases of new RHEL/SL major releases is flaky.

First off, we discovered that our standard LCFG SL7 platform had dmraid (software RAID) enabled as standard. This was creating a lot of “noise” from the kernel at boot time as the dmraid module attempted to scan every attached block device – particularly noticeable on a SAN-attached host with multiple routes to SAN volumes. We have disabled dmraid by default, and created a header file to pull it back in where required.

We next looked at DM multipath. Confusingly, although the version of DM multipath has not changed between SL6 and SL7, various parameters have changed. A template for SL7 (actually, EL7) was created, and a couple of multipath component resources added, to support the new parameters that we need to tweak. With SL6 and earlier we added manual configuration to support our IBM disk array; there is built-in support for this array in SL7, so our manual configuration can be removed. Note, however, that we have never added configuration for the DotHill arrays – it may be that we are using inappropriate values for these (eg no_path_retry=fail rather than queue) under SL6.
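
If we do end up adding explicit settings for the DotHill arrays, it would be a device stanza in /etc/multipath.conf along these lines (the vendor and product strings below are placeholders – the real values can be read from the output of multipath -ll):

devices {
    device {
        vendor        "DotHill"       # placeholder – use the string the array actually reports
        product       "EXAMPLE"       # placeholder
        no_path_retry queue           # rather than fail
    }
}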

Then onto LVM. Some confusing behaviour was discovered to be caused by the lvmetad daemon, which is now enabled by default on SL7. For some reason, on some system boots, pvscan was returning an unknown physical volume device for a volume group – this made the LVM component re-add the configured physical volume to the volume group (because it couldn’t see it in the list). This in turn created a new PV UUID, and you ended up with a volume group with an increasing number of missing physical volumes. The volume group would still work, however. It’s possible that running LVM in the initramfs would fix this, but disabling lvmetad also fixed it. It seems that the purpose of lvmetad is to reduce the time taken at system boot to scan block devices for LVM physical volumes. We don’t have so many physical devices that an LVM scan at boot time takes ages, so running lvmetad seems unnecessary. We may need to revisit this in the future.
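
For the record, disabling lvmetad amounts to something like the following (shown here as manual steps – on DICE this would be handled through the LCFG configuration rather than by hand):

# /etc/lvm/lvm.conf – tell the LVM tools not to use the metadata daemon
global {
    use_lvmetad = 0
}

# and stop and disable the corresponding systemd units
systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
systemctl disable lvm2-lvmetad.service lvm2-lvmetad.socket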

In SL6 and earlier, multipath and LVM configuration had to be loaded into the initrd. Both the multipath and lvm components would trigger a rebuild of the initrd (via the kernel component) whenever their configuration changed. It looks like this is not necessary for SL7, even where multipath-provisioned filesystems are mounted via /etc/fstab. We have done lots of testing, but it’s still possible that we’ve just been lucky with the timing. If we do need to load multipath and/or LVM configuration into the initrd, we will need to consider how best to do this: under SL7, dracut will no longer automatically include LVM and multipath configuration in the initrd, so modifications to the kernel, lvm and multipath components may all be required.
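
Should that configuration turn out to be needed in the early boot image after all, rebuilding the initramfs with the relevant dracut module pulled in would look something like this (a manual sketch – in practice any rebuild would be triggered via the kernel component):

dracut --force --add multipath /boot/initramfs-$(uname -r).img $(uname -r)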


LCFG apacheconf component

I’ve recently been working on updating the apacheconf component to support Apache 2.4 on SL7. I think I now have it in reasonable shape and I’m not expecting any further major changes. The new version will be in the stable release on Wednesday 17th February.

The summary of the changes is on the LCFG wiki. For reference, the original list of ideas is here. I’ve not spent any time (yet) on improving the Nagios support. The multi-line verbatim support should be fine once I add the ¶ feature into the server as recently discussed.

I am planning to spend some time on improving the general documentation for the component including some basic recipes.

Any comments welcome. If you feel there is anything important I have missed out then now is your last chance to make suggestions for a while!

Consistent network interface names and LCFG

As explained in an earlier post, we are moving to the more “modern” consistent network interface naming scheme because the old-style method of hard-wiring interfaces to interface names of the form eth0 no longer works with RHEL7. This is a problem for machines with multiple interfaces – eg servers.

(Note that you can stick with the legacy naming scheme by defining LCFG_NETWORK_LEGACY_NAMING at the head of a machine’s profile).
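
In other words, something like this at the very top of the profile, before any headers are included:

#define LCFG_NETWORK_LEGACY_NAMING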

As a recap, under the consistent naming scheme, interfaces are known as :-

Device                           Name
On-board (embedded) interface    em[1234…]
PCI card interface               p<slot>p<port>
Virtual                          p<slot>p<port>_<virtualif>

For example, a minimally configured Dell R730 has four on-board interfaces: these would be called em1, em2, em3 and em4.

We could modify all our LCFG configuration to use the new names directly. However, there are many LCFG macros which assume that network interfaces are of the form eth[n] and changing these would be somewhat disruptive. We have decided to stick with using the old form eth[n] in LCFG configuration, providing a means of associating these names with a real physical device. The macro call LCFG_NETWORK_SET_DEVICE(eth0,em1) will associate the LCFG name eth0 with the physical device em1. The hardware headers for each machine model will include calls to LCFG_NETWORK_SET_DEVICE for all the onboard network interfaces. This means that the process should be largely transparent on most machines.
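
For instance, for the four on-board interfaces of the R730 mentioned above, the hardware header would contain calls along these lines:

LCFG_NETWORK_SET_DEVICE(eth0,em1)
LCFG_NETWORK_SET_DEVICE(eth1,em2)
LCFG_NETWORK_SET_DEVICE(eth2,em3)
LCFG_NETWORK_SET_DEVICE(eth3,em4)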

Some machines have both embedded and PCI-E interfaces. For example, in Informatics, HP DL180s will commonly have two onboard interfaces and one PCI-E interface. We usually configure these machines such that a network bond is formed using one of the onboard interfaces (usually the second as the first is used for IPMI) and the PCI-E interface. On DICE, the following will be configured by default for these machines :-

LCFG_NETWORK_SET_DEVICE(eth0,p1p1)
LCFG_NETWORK_SET_DEVICE(eth1,em2)

Note that the old form eth[n] is only used in LCFG configuration – all operating system tools (eg netstat) and configuration will expect names of the form em[n] or p<slot>p<port>.

We shall probably convert LCFG configuration to use the native network interface names throughout – possibly at the next major platform upgrade.

Hardware monitoring and RAID on SL7

Informatics uses a Nagios monitoring system to keep track of the health and current status of many of its services and servers. One of the components of the Nagios environment is lcfg-hwmon. This periodically performs some routine health checks on servers and services then sends the results to Nagios, which alerts administrators if necessary. lcfg-hwmon checks several things:

  • It warns if any disks are mounted read-only. The SL6 version excluded device names starting /media/ and /dev/loop. The SL7 version also ignores anything mounted on /sys/fs/cgroup. This check can be disabled by giving the hwmon.readonlydisk resource a false value. (A rough sketch of this check appears after this list.)
  • If it finds RAID controller software it uses this to get the current status of the machine’s RAID arrays, then it reports any problems found. It knows about MegaRAID SAS, HP P410, Dell H200 and SAS 5i/R RAID types. Note that the software does not attempt to find out what sort of RAID controller the machine actually has, so the administrator has to be sure to use the correct RAID header when configuring the machine.
  • It warns if any of the machine’s power supply units has failed or is indicating a problem.
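
As a rough illustration of the read-only check mentioned above – this is not lcfg-hwmon’s actual code, it just shows the sort of filtering involved:

# list read-only mounts, ignoring /media, loop devices and the cgroup filesystems
awk '$4 ~ /(^|,)ro(,|$)/ && $2 !~ /^\/media\// && $1 !~ /^\/dev\/loop/ && $2 !~ /^\/sys\/fs\/cgroup/' /proc/mounts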

As well as the periodic checks from cron, a manual status check can be done with

/usr/sbin/check_hwmon --stdout

If the --stdout option is omitted the result is sent to Nagios rather than displayed on the shell output.

Version 0.21.2-1 of lcfg-hwmon functions properly on SL7 servers. In Informatics, any server using dice/options/server*.h gets lcfg-hwmon. Other LCFG servers can get it like this:

#include <lcfg/options/hwmon.h>

In related news, the RAID controller software for the RAID types listed above is now installed on SL7 servers by the same headers as on SL6. The HP P410 RAID software has changed its name from hpacucli to hpssacli but seems otherwise identical. The Dell H200 software sas2ircu has gained a few extra commands (SETOFFLINE, SETONLINE, ALTBOOTIR, ALTBOOTENC) but the existing commands seem unchanged. The other varieties of RAID software are much as they were on SL6.
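
For reference, a manual status query with these tools looks something like the following (exact install paths and controller numbers vary from machine to machine):

hpssacli ctrl all show status                         # HP P410 (previously hpacucli ctrl all show status)
sas2ircu 0 STATUS                                     # Dell H200, controller 0
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL   # MegaRAID SAS logical drive status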

Network device naming

During our original project to port LCFG to SL7 we were only really considering desktops, which typically have a single network interface. To get things working quickly we decided to stick with the “legacy” network device naming scheme, which gives us interfaces named eth0, eth1, eth2, and so on. This works just fine with a single interface, since we only ever need access to eth0, but as we’ve moved on to adding network interface bonding for servers we have discovered some problems.

Many of our servers have two controllers, each of which has two devices; for maximum reliability we wish to bond over one device from each controller. Traditionally we have done this by naming the first device on the first controller eth0 and the first device on the second controller eth1. With the legacy support on SL7 this is not possible: those devices always come out as eth0 and eth2 (eth1 being the second device on the first controller), because the ability to rename interfaces based on MAC address does not appear to be working correctly. Due to the way we have configured bonding in LCFG, for simplicity we really would like the two interfaces to continue to be named eth0 and eth1.

To resolve this problem we have decided that it is now time to convert to the “modern” naming scheme as described in the Red Hat networking guide. The interfaces can then be aliased as eth0 and eth1 after they have been configured with their “consistent” names. This appears to work as desired but requires some changes to the LCFG headers, and we will be working through this transition over the next few weeks. It is likely that the complete change to the default approach will have to wait until the SL7.2 upgrade, to ensure we don’t break anything.

The first step will be to move the “legacy” support out of the lcfg-level header (lcfg/defaults/network.h) into the ed-level header. This will not have any impact for most users but makes it possible to easily enable and disable the naming schemes for testing purposes. New headers have been provided – lcfg/options/network-legacy-names.h and lcfg/options/network-modern-names.h – to make it easy to swap between the two naming schemes. Once we are confident that the modern approach is reliable we will update the various hardware support headers at the lcfg level so that it works for the various server models we have in Informatics.
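
So, to experiment with one scheme or the other before the defaults change, a profile can include one of the new headers explicitly:

#include <lcfg/options/network-modern-names.h>

(or lcfg/options/network-legacy-names.h for the old behaviour).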

LCFG apacheconf component

As discussed at the LCFG Annual Review meeting held in December, we are planning to start work on updating the apacheconf component for Apache 2.4 fairly soon. We will also be generally refactoring the whole thing. There is a wiki page which holds a collection of ideas for new features that would be nice to have and bugs that should be fixed. I’m currently doing some exploratory work this week to decide how to approach it, so this is the last chance to make suggestions. Please either add them to the wiki page (tag them with your name please) or email them to me directly.