Progress so far on SL7 server base

Every year or two we migrate all of DICE to a newer operating system version, so that we can keep up with technology advances and security fixes. Most recently we’ve been moving it from Scientific Linux 6 to Scientific Linux 7.

When migrating DICE to a new platform, we make the move in several stages. First we need our configuration environment LCFG fully working on the new OS (see for instance Work involved in porting DICE to SL7); then we work on the desktop computing environment and the research and teaching software it needs; after that come the tools and environment for servers. We’re tackling the last of those stages now: the SL7 server platform project. We have several hundred servers, hosting both a variety of services and a range of behind-the-scenes support functions.

So far we’ve tested and passed these things:

  • Server networking features. Setting NM_CONTROLLED=no in the network interface config files allows us to use the old networking scripts to set up bonding, bridging and VLANs. We’ll take a look at doing this with Network Manager later on, since the old networking scripts will probably be removed at some point, but in the meantime we have access to the networking functionality which our servers need.
  • IPMI. We use it for our hardware monitoring needs, for Serial Over LAN remote consoles, and for remote power control.
  • Our standard SL7 disk partition layout.
  • The basic active checks for our Nagios monitoring setup.
  • We’ve installed the software needed by the Nagios passive check which monitors network bonding, and it’s now working correctly.
  • The hwmon passive check does a variety of hardware health tests. These have been tested and work on SL7: read-only disk mounts; MegaSAS RAID; dual power supply redundancy; LSI SAS 5i/R RAID.
  • RAID controller software and LCFG configuration headers for MegaSAS RAID and for LSI SAS 5i/R RAID.
  • The toohot overheating emergency shutdown tool.
  • Fibre Channel Multipath. The ability to use multiple paths through the FC fabric increases the dependability of our storage area network facilities.
  • LVM. This storage abstraction layer is used for storage space for the VMs on our virtualisation servers.
  • We have rethought the DNS configuration for SL7. Instead of using only localhost for DNS lookups, SL7 servers will be configured to query the full set of DNS servers.
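To make the first item more concrete, here’s a minimal sketch of the kind of ifcfg files involved in setting up bonding with NM_CONTROLLED=no. The interface names and bonding options here are illustrative assumptions, not our actual configuration:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0  (hypothetical example)
DEVICE=bond0
NM_CONTROLLED=no      # hand this interface over to the old network scripts
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-em1  (one of the slave interfaces)
DEVICE=em1
NM_CONTROLLED=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
```

Bridges and VLANs follow the same pattern: an ifcfg file per device, each marked NM_CONTROLLED=no so Network Manager leaves it alone.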
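The IPMI item covers remote power control and Serial Over LAN consoles; with ipmitool the typical invocations look roughly like this (the BMC hostname and username are placeholders):

```shell
# Remote power control via the server's BMC (placeholder host and user)
ipmitool -I lanplus -H server-bmc.example.org -U admin power status
ipmitool -I lanplus -H server-bmc.example.org -U admin power cycle

# Serial Over LAN: attach to the server's remote console
ipmitool -I lanplus -H server-bmc.example.org -U admin sol activate
```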
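Fibre Channel multipath on SL7 is driven by device-mapper-multipath and its /etc/multipath.conf file. A minimal illustrative fragment (these settings are assumptions, not our production configuration):

```shell
# /etc/multipath.conf -- minimal illustrative fragment
defaults {
    user_friendly_names yes   # name devices mpatha, mpathb, ... instead of WWIDs
    find_multipaths     yes   # only assemble multipath devices for disks with multiple paths
}
```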
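For the LVM item, carving out storage for a VM typically looks like the following; the device and volume names are hypothetical:

```shell
# Hypothetical device and volume names, for illustration only
pvcreate /dev/sdb                           # mark the disk as an LVM physical volume
vgcreate vm_storage /dev/sdb                # gather it into a volume group
lvcreate -L 20G -n guest1-disk vm_storage   # carve out a 20GB logical volume
# The VM can then use /dev/vm_storage/guest1-disk as its virtual disk
```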
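The DNS change boils down to what ends up in /etc/resolv.conf. The addresses below are documentation placeholders, not our real DNS servers:

```shell
# Old style: resolve only via the local caching daemon
# nameserver 127.0.0.1

# SL7 style: query the full set of DNS servers directly
# (placeholder addresses, for illustration only)
nameserver 192.0.2.1
nameserver 192.0.2.2
nameserver 192.0.2.3
```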

We’re currently working on support for other RAID types, on LCFG apacheconf and on other aspects of Fibre Channel functionality.

A new MPU blog

This is the new blog of the Managed Platform Unit. The MPU is one of the organisational units of the computing staff in the School of Informatics; it’s responsible for the Linux platform which forms the basis of DICE. It also maintains the tools needed by that platform, principally LCFG. See here for links to all of the units.

We’ll use this blog to keep you up to date on work which is shared between the MPU members. Initially that’ll include our work to develop an SL7 server platform.

We’ll still be blogging individually too (Alastair’s ramblings, Stephen’s work ramblings, cc:) and of course we’ll make announcements on the Computing Systems blog.