LCFG Profile Security Project

March 21, 2018

After improving support for Apache authentication in the LCFG server I have moved onto the client this week. The bulk of the work has been focused on the creation of a new LCFG::Client::Fetcher module which encapsulates all the details associated with fetching XML profiles from various sources. As well as improving the authentication support I am taking the chance to overhaul a chunk of code which has not seen much love in either the v3 or v4 projects. One particular issue is that the handling of the list of profile sources is currently spread around the client libraries, which means that even a small change can involve locating and altering many separate small pieces of code. This general work also includes adding support for IPv6 and enhancing SSL security, as well as making the code much more maintainable.

One big change in approach I’ve made is that the lists of local file and remote web server sources are now handled in a unified way; previously they were dealt with completely separately. The new Fetcher module has a single list of source objects (either LCFG::Client::Fetch::Source::File or LCFG::Client::Fetch::Source::Remote) which comes from the value of the client.url resource. One advantage here is that it is now trivial to add an entirely new type of source (e.g. rsync or ldap); anything with an LWP::Protocol module is a possibility. When configured to use both local files and remote sources the client has always preferred local files where possible. This behaviour is retained by using a priority system, with file sources guaranteed to have a higher default priority than any remote source.
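
As a rough illustration of the unified approach, a priority-ordered source list might be built along these lines; the hash records and priority numbers here are purely illustrative assumptions rather than the real LCFG::Client::Fetch::Source API:

use strict;
use warnings;
use v5.10;

# Illustrative only: each client.url entry becomes a "source" record and file
# sources are given a higher default priority than any remote source.
my @urls = (
    'https://lcfg.example.org/profiles/',   # hypothetical remote source
    'file:///var/lcfg/conf/profiles/',      # hypothetical local source
);

my @sources;
for my $url (@urls) {
    my $is_file = $url =~ m{^file:}i;
    push @sources, {
        url      => $url,
        type     => $is_file ? 'file' : 'remote',
        priority => $is_file ? 100 : 50,    # file always beats remote by default
    };
}

# Highest priority first, so local files are preferred where possible
for my $source ( sort { $b->{priority} <=> $a->{priority} } @sources ) {
    say "$source->{type}: $source->{url}";
}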

The other part of recent development work is the addition of support for different authentication mechanisms. This is supported via modules in the LCFG::Client::Fetch::Auth namespace; currently we have modules for basic (username/password) and gssapi authentication. As with the new source modules, this approach means it is easy to support alternative mechanisms, including site-specific needs which might not be appropriate for merging into the upstream code base. Before making a request the Fetcher will call the relevant authentication module to initialise the environment. I am also working on supporting multiple mechanisms so that if one fails the next will be tried until one succeeds.
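
The multiple-mechanism fallback could work something like this sketch; the init subroutines are hypothetical stand-ins rather than the real LCFG::Client::Fetch::Auth modules:

use strict;
use warnings;
use v5.10;

# Hypothetical initialisers standing in for the real Auth modules
my %auth_init = (
    gssapi => sub { return 0 },   # pretend the gssapi setup failed
    basic  => sub { return 1 },   # pretend basic auth setup succeeded
);

my $chosen;
for my $mech (qw(gssapi basic)) {
    if ( $auth_init{$mech}->() ) {
        say "initialised environment using $mech";
        $chosen = $mech;
        last;
    }
    warn "$mech initialisation failed, trying next mechanism\n";
}

die "no authentication mechanism succeeded\n" if !defined $chosen;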

Most of the code for the client is now in place and I am working on documentation for the various new modules. Once that is done I need to consider how the necessary authentication information can make it from LCFG resources into the rdxprof application via the LCFG client component. Although I would rather not make such a big change it might be that I finally need to bite the bullet and rewrite the client component from bash into Perl.

LCFG Profile Security Project

March 13, 2018

I have recently begun work on the Review Security of LCFG Profile Access project. So far I have mostly been considering the various aspects of the project with the aim being to produce a list of ideas which can be discussed at some future Development Meeting.

The first aspect of the project I have looked at in more depth is the LCFG server, which has support for generating Apache .htaccess files. These can be used to limit access to each individual LCFG profile when fetched over http/https. We have traditionally supported both http and https protocols and relied on IP addresses to limit access, but would like to move over to https-only along with using GSSAPI authentication; the LCFG client would then use a keytab to get the necessary credentials. To help with this change I have introduced a new schema (4) for the profile component and made some modifications to the LCFG server code which make it easier to use the Apache mod_auth_gssapi module. In particular there is a new auth_tmpl_$ resource which allows the selection of a different template (e.g. the template which is provided in the package) which more closely meets local requirements. There are also auth_vars_$ and auth_val_$_$ resources which can be used to specify any additional information that is required. For example:

!profile.version_profile mSET(4) /* not yet the default */
!profile.auth          mADD(ssl)
!profile.auth_tmpl_ssl mSET(
!profile.acl_ssl       mADD(@admin)
!profile.auth_vars_ssl mADD(groupfile)
!profile.auth_val_ssl_groupfile mSET(/etc/httpd/conf.d/

which results in the LCFG server generating the following .htaccess file:

AuthName ""
GssapiBasicAuth Off
GssapiBasicAuthMech krb5
GssapiSSLonly On
GssapiCredStore "keytab:/etc/httpd.keytab"
AuthGroupFile "/etc/httpd/conf.d/"
  Require user "host/"
  Require group "admin"

The profile.acl_ssl resource holds a list of users and groups (which have an ‘@’ prefix). In a real deployment it might make more sense to use an lcfg/ principal rather than host/. The groupfile support is provided by the mod_authz_groupfile module which needs to be loaded.

I have tested this with curl and it works as required. The LCFG client doesn’t currently have support for doing a kinit (or launching something like k5start in the background) prior to fetching the profile so it isn’t yet possible to actively use this authentication method.

Remote Desktop Project

February 28, 2018

This week I’ve been preparing the new staff XRDP service for user testing. It now has a QuoVadis SSL certificate and I’ve been attempting to resolve an issue with some clients presenting a warning dialogue about not trusting the certificate. According to this bug report it is necessary to include the whole trust chain in the certificate file. I’ve tried appending the contents of the .chain file without success; it’s not clear if I am missing a part of the chain. I’ll continue investigating, but if we can’t easily resolve the issue we could just document what users should expect to see.

As Chris had access to a Windows machine he has managed to generate a .bmp image file for the login screen logo which actually displays correctly. I have no idea why the various Linux applications generated bad images but I’m not going to worry too much. This gives us a much more official-looking Informatics login screen which should reassure users. The image has been packaged up in an xrdp-logo-inf RPM.

I’ve also been investigating rate-limiting new connections using iptables. The standard dice iptables configuration is rather complicated so I need to speak to George about the best way to go about this.

To ensure the xrdp service only gets started once the machine is ready to handle connections I’ve modified the systemd config so that it waits for the LCFG stable target to be reached.

I’ve noticed that all the xrdp logs are being sent to the serial console. Even with just a single user that’s flooding our console logs so I’d like to get that stopped. It’s already going to a local file and syslog so no more logging is really required. SEE don’t see the same problem so I wonder if it’s related to our Informatics syslog configuration.

The user documentation is now close to being complete, we even have some information on how to access the XRDP service from Android devices.

Remote Desktop Project

February 21, 2018

This week I’ve been working on the configuration for an XRDP server for Informatics staff. This will be publicised as a prototype service, the plan being to hold off replacing the NX service until Semester 2 is completed at the end of May; that avoids the potential for any disruption to teaching. The prototype service will be installed on some spare hardware which has 2 x 2.6GHz CPU, 36GB RAM and 146GB disk space. That’s not huge but should be sufficient for multiple users to be logged in simultaneously. As the staff service is likely to only ever be based on a single server I’ve decided to simplify the config by dropping the haproxy frontend; that will now only be used on the multi-host general service. To protect from DoS attacks iptables will be used to do rate-limiting. If I can work out how to get the xrdp software to log the IP address for failed logins I will also investigate using fail2ban to add firewall rules. Most of the user documentation is now ready; I just need to add some instructions and screenshots for the Remmina client on Linux.

User management improvements

November 23, 2017

Management of local users and groups (i.e. those in /etc/passwd and /etc/group) is done using the LCFG auth component. One feature that has always been lacking is the ability to create a home directory where necessary and populate it from a skeleton directory (typically this is /etc/skel). The result of this feature being missing is that it is necessary to add a whole bunch of additional file component resources to create the home directory and that still doesn’t provide support for a skeleton directory.

Recently I needed something along those lines so I’ve taken the chance to add a couple of new resources – create_home_$ and skel_dir_$. When the create_home resource is set to true for a user the home directory will be created by the component and the permissions set appropriately. By default the directory will be populated from /etc/skel but any skeleton directory can be specified via the skel_dir resource. This means it is now possible to set up a machine with a set of identically initialised local users.

For example:

auth.pw_name_cephadmin           cephadmin
auth.pw_uid_cephadmin            755
auth.pw_gid_cephadmin            755
auth.pw_gecos_cephadmin          Ceph Admin User
auth.pw_dir_cephadmin            /var/lib/cephadmin
auth.pw_shell_cephadmin          /bin/bash
auth.create_home_cephadmin       yes /* Ensure home directory exists */

auth.gr_name_cephadmin           cephadmin
auth.gr_gid_cephadmin            755

LCFG Core: resource types

November 21, 2017

The recent round of LCFG client testing using real LCFG profiles from both Informatics and the wider community has shown that the code is now in very good shape and we’re close to being able to deploy to a larger group of machines. One issue that this testing has uncovered is related to how the type of a resource is specified in a schema. A type in the LCFG world really just controls what regular expression is used to validate the resource value. Various type annotations can be used (e.g. %integer, %boolean or %string) to limit the permitted values; if there is no annotation the resource is assumed to be a tag list and this has clearly caught out a few component authors. For example:

@foo %integer
@bar %boolean
@baz
@quux sub1_$ sub2_$

Both of the last two examples (baz and quux) are tag lists; the first simply does not have any associated sub-resources.

The compiler should not allow anything but valid tag names (which match /^[a-zA-Z0-9_]+$/) in a tag list resource but due to some inadequacies it currently permits pretty much anything. The new core code is a lot stricter and thus the v4 client will refuse to accept a profile if it contains invalid tag lists. Bugs have been filed against a few components (bug#1016 and bug#1017). It’s very satisfying to see the new code helping us improve the quality of our configurations.
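
For reference, the tag name check described above amounts to something like this stand-alone snippet (the example value is made up):

use strict;
use warnings;
use v5.10;

my $value = 'alpha beta_1 bad-tag';   # made-up tag list value

for my $tag ( split ' ', $value ) {
    if ( $tag =~ m/^[a-zA-Z0-9_]+$/ ) {
        say "valid tag: $tag";
    } else {
        say "invalid tag: $tag";      # the v4 client would reject such a profile
    }
}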

yum cache and disk space

November 15, 2017

At a recent LCFG Deployers meeting we discussed a problem with yum not fully cleaning the cache directory even when the yum clean all command is used. This turns out to be related to how the cache directory path is defined in /etc/yum.conf as /var/cache/yum/$basearch/$releasever. As the release version changes with each minor platform release (e.g. 7.3, 7.4) the old directories can become abandoned. At first this might seem like a trivial problem but these cache directories can be huge; we have seen instances where gigabytes of disk space have been used and cannot simply be reclaimed. To help fix this problem I’ve added a new purgecache method to the LCFG yum component. This takes a sledgehammer approach of just deleting everything in the /var/cache/yum/ directory. It can be run manually whenever required or called regularly using something like cron. In Informatics it is now configured to run weekly on a Sunday like this:

!cron.objects             mADD(yum_purge)
cron.object_yum_purge     yum
cron.method_yum_purge     purgecache
cron.run_yum_purge        AUTOMINS AUTOHOUR * * sun
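
For illustration, the sledgehammer approach taken by the purgecache method boils down to something like this in Perl (a sketch, not the actual component code):

use strict;
use warnings;
use File::Path qw(remove_tree);

my $cachedir = '/var/cache/yum';

# Delete everything beneath the cache directory but keep the directory itself
remove_tree( $cachedir, { keep_root => 1, error => \my $errors } );

if ( @{$errors} ) {
    warn "problems purging $cachedir: " . scalar @{$errors} . " errors\n";
}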

LCFG autoreboot

November 10, 2017

One of the tools which saves us an enormous amount of effort is our LCFG autoreboot component. This watches for reboot requests from other LCFG components and then schedules the reboot for the required date/time.

One nice feature is that it can automatically choose a reboot time from within a specified range. This means that when many similarly configured machines schedule a reboot they don’t all go at the same time, which could result in the overloading of services that are accessed at boot time. Recently it was reported that the component has problems parsing single-digit times which results in the reboot not being scheduled. Amazingly this bug has lain undetected for approximately 4 years during which time a significant chunk of machines have presumably been failing to reboot on time. As well as resolving that bug I also took the chance to fix a minor issue related to a misunderstanding of the shutdown command options which resulted in the default delay time being set to 3600 minutes instead of 3600 seconds; thankfully we change that delay locally so it never had any direct impact on our machines.

Whilst fixing those two bugs I discovered another issue related to sending reboot notifications via email: if that failed for any reason the reboot would not be scheduled. The component will now report the error but continue. This is a common problem we see in LCFG components where problems are handled with the Fail method (which logs and then exits) instead of just logging with Error. It is particularly a problem since an exit with non-zero code is not the same as dying, which can be caught with the use of the eval function. Since a call to Fail ends the current process immediately this can lead to a particularly annoying situation where a failure in a Configure method results in a failure in the Start method. This means that a component might never reach the started state, a situation from which it is difficult to recover. We are slowly working our way through eradicating this issue from core components but it’s going to take a while.

Recently we have had feedback from some of our users that the reboot notification message was not especially informative. The issue is related to us incorporating the message into the message of the day, which sometimes leads to it being left lying around out-of-date for some time. The message would typically say something like “A reboot has been scheduled for 2am on Thursday”, which is fine as long as the message goes away once the reboot has been completed. To resolve this I took advantage of a feature I added some years ago which passes the reboot time as a Perl DateTime object (named shutdown_dt) into the message template. With a little bit of thought I came up with the following which uses the Template Toolkit Date plugin:

[%- USE date -%]
[%- USE wrap -%]
[%- FILTER head = wrap(70, '*** ', '*** ') -%]
This machine ([% host.VALUE %]) requires a reboot as important updates are available.
[%- END %]

[% IF enforcing.VALUE -%]
[%- FILTER body = wrap(70, ' ', ' ') -%]
It will be unavailable for approximately 15 minutes beginning at
[% date.format( time = shutdown_dt.VALUE.epoch,
                format = '%H:%M %A %e %B %Y',
                locale = 'en_GB' ) %].
Connected users will be warned [% shutdown_delay.VALUE %] minutes beforehand.
[%- END %]

[% END -%]

This also uses the wrap plugin to ensure that the lines are neatly arranged and the header section has a “*** ” prefix for each line to help grab the attention of the users.

LCFG Core: Resource import and export

November 7, 2017

As part of porting the LCFG client to the new core libraries the qxprof and sxprof utilities have been updated. This has led to the development of a new high-level LCFG::Client::Resources Perl library which can be used to import, merge and export resources in all the various required forms. The intention is that eventually all code which uses the LCFG::Resources Perl library (in particular the LCFG::Component framework) will be updated to use this new library. The new library provides a very similar set of functionality and will appear familiar but I’ve taken the opportunity to improve some of the more awkward parts. Here’s a simple example taken from the perldoc:

# Load client resources from DB
my $res1 = LCFG::Client::Resources::LoadProfile("mynode","client");

# Import client resources from environment variables
my $res2 = LCFG::Client::Resources::Import("client");

# Merge two sets of resources
my $res3 = LCFG::Client::Resources::Merge( $res1, $res2 );

# Save the result as a status file
LCFG::Client::Resources::SaveState( "client", $res3 );

The library can import resources from: Berkeley DB, status files, override files, shell environment and explicit resource specification strings. It can export resources as status files, in a form that can be evaluated in the shell environment and also in various terse and verbose forms (e.g. the output styles for qxprof).

The LCFG::Resources library provides access to resources via a reference to a hash which is structured something like:

   'sysinfo' => {
                 'os_id_full' => {
                                  'DERIVE' => '/var/lcfg/conf/server/releases/develop/core/include/lcfg/defaults/sysinfo.h:42',
                                  'VALUE' => 'sl74',
                                  'TYPE' => undef,
                                  'CONTEXT' => undef
                                 },
                 'path_lcfgconf' => {
                                  'DERIVE' => '/var/lcfg/conf/server/releases/develop/core/include/lcfg/defaults/sysinfo.h:100',
                                  'VALUE' => '/var/lcfg/conf',
                                  'TYPE' => undef,
                                  'CONTEXT' => undef
                                 },
                },

The top level key is the component name, the second level is the resource name and the third level is the name of the resource attribute (e.g. VALUE or TYPE).
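
So, given a hash reference $res structured as above, pulling a value out means indexing through those three levels:

# Old LCFG::Resources style: component name, then resource name, then attribute
my $value  = $res->{sysinfo}{os_id_full}{VALUE};    # 'sl74'
my $derive = $res->{sysinfo}{os_id_full}{DERIVE};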

The new LCFG::Client::Resources library takes a similar approach with the top level key being the component name but the value for that key is a reference to a LCFG::Profile::Component object. Resource objects can then be accessed by using the find_resource method which returns a reference to a LCFG::Resource object. For example:

my $res = LCFG::Client::Resources::LoadProfile("mynode","sysinfo");

my $sysinfo = $res->{sysinfo};

my $os_id_full = $sysinfo->find_resource('os_id_full');

say $os_id_full->value;

Users of the qxprof and sxprof utilities should not notice any differences but hopefully the changes will be appreciated by those developing new code.

Testing the new LCFG core : Part 2

May 18, 2017

Following on from the basic tests for the new XML parser the next step is to check if the new core libs can be used to correctly store the profile state into a Berkeley DB file. This process is particularly interesting because it involves evaluating any context information and selecting the correct resource values based on the contexts. Effectively the XML profile represents all possible configuration states whereas only a single state is stored in the DB.

The aim was to compare the contents of the old and new DBs for each Informatics LCFG profile. Firstly I used rdxprof to generate DB files using the current libs:

cd /disk/scratch/profiles/
for i in $(find -maxdepth 1 -type d -printf '%f\n' | grep -v '^\.');\
do \
 echo $i; \
 /usr/sbin/rdxprof  -v -u file:///disk/scratch/profiles/ $i; \
done

This creates a DB file for each profile in the /var/lcfg/conf/profile/dbm directory. For 1500-ish profiles this takes a long time…

The next step is to do the same with the new libs:

find /disk/scratch/profiles/ -name '*.xml' | xargs \
perl -MLCFG::Profile -wE \
'for (@ARGV) { eval { $p = LCFG::Profile->new_from_xml($_); \
$n = $p->nodename; \
$p->to_bdb( "/disk/scratch/results/dbm/$n.DB2.db" ) }; \
print $@ if $@ }'

This creates a DB file for each profile in the /disk/scratch/results/dbm directory. This is much faster than using rdxprof.

The final step was to compare each DB. This was done simply using the perl DB_File module to tie each DB to a hash and then comparing the keys and values. Pleasingly this has shown that the new code is generating identical DBs for all the Informatics profiles.
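
In outline the comparison boils down to something like this (just a sketch; the two DB files are passed as arguments and everything is treated as flat key/value pairs):

use strict;
use warnings;
use v5.10;
use DB_File;
use Fcntl qw(O_RDONLY);

# Tie the old and new Berkeley DB files to hashes and compare keys and values
my ( $old_db, $new_db ) = @ARGV;

tie my %old, 'DB_File', $old_db, O_RDONLY, 0644, $DB_HASH
    or die "Cannot open $old_db: $!";
tie my %new, 'DB_File', $new_db, O_RDONLY, 0644, $DB_HASH
    or die "Cannot open $new_db: $!";

my $diffs = 0;
for my $key ( keys %old ) {
    if ( !exists $new{$key} ) {
        say "only in old DB: $key";
        $diffs++;
    } elsif ( $old{$key} ne $new{$key} ) {
        say "value differs for $key";
        $diffs++;
    }
}
for my $key ( keys %new ) {
    if ( !exists $old{$key} ) {
        say "only in new DB: $key";
        $diffs++;
    }
}
say $diffs ? "$diffs differences found" : 'DBs are identical';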

Now I need to hack this together into a test script which other sites can use to similarly verify the code on their sets of profiles.

Testing the new LCFG core : Part 1

May 17, 2017

The project to rework the core LCFG code is rattling along and has reached the point where some full scale testing is needed. The first step is to check whether the new XML parser can actually just parse all of our LCFG profiles. At this stage I’m not interested in whether it can do anything useful with the data once loaded, I just want to see how it handles a large number of different profiles.

Firstly a source of XML profiles is needed; I grabbed a complete local copy from our lcfg server:

rsync -av -e ssh lcfg:/var/lcfg/conf/server/web/profiles/ /disk/scratch/profiles/

I then ran the XML parser on every profile I could find:

find /disk/scratch/profiles/ -name '*.xml' | xargs \
perl -MLCFG::Profile -wE \
'for (@ARGV) { eval { LCFG::Profile->new_from_xml($_) }; print $@ if $@ }'

Initially I hit upon bug#971 which is a genuine bug in the schema for the gridengine component. As noted previously, this was found because the new libraries are much stricter about what is considered to be valid data. With that bug resolved I can now parse all 1525 LCFG XML profiles for Informatics.

LCFG Core Project

May 2, 2017

Over the last few years I have been working on (and off) creating a new set of “core” libraries for LCFG. This is now finally edging towards the point of completion with most of the remaining work being related to polishing, testing and documentation.

This project originated from the need to remove dependencies on obsolete Perl XML libraries. The other main aims were to create a new OO API for resources/components and packages which would provide new opportunities for code reuse between client, ngeneric and server.

Over time several other aims have been added:

  • Simplify platform upgrades.
  • Platform independence / portability.
  • Make it possible to support new languages.
  • Ensure resource usage remains low.

Originally this was to be a rewrite just in Perl but the heavy resource usage of early prototypes showed it was necessary to move at least some of the functionality into C libraries. Since that point the chance to enhance portability was also identified and included in the aims for the project. As well as making it possible to target other platforms (other Linux or Unix, e.g. MacOSX), the enhanced portability should make it much simpler and quicker to port to new Redhat based platforms.

The intention is that the new core libraries will be totally platform-independent and portable, for example, no hardwired paths or assumptions that the platform is Redhat/RPM (or even Linux) based. The new core is split into two parts: C and Perl libraries, with the aim that as much functionality as possible is in the C libraries to aid reuse from other languages (e.g. Python).

The aim is that these libraries should be able to co-exist alongside current libraries to ease the transition.

I have spent a lot of time on documenting the entire C API. The documentation is formatted into html and pdf using doxygen; I had not used this tool before but I am very pleased with the results and will definitely be using it more in the future. Although a slow task, documenting the functions has proved to be a very useful review process. It has helped me find many inconsistencies between functions with similar purposes and has led to numerous small improvements.

LCFG Client

The client has been reworked to use new Core libraries. This is where the platform-specific knowledge of paths, package manager, etc, is held.

Resource Support

Format        Read   Write
Status        YES    YES
Environment   YES    YES

There is currently no support for reading header files or source profiles but this could be added later.

There is new support for finding the “diffs” between resources, components and profiles.

Package Support

Format    Read   Write
rpmcfg    YES    YES
rpmlist   YES    YES

There is currently no support for reading package list files but this could be added later.

Remaining Work

There is still work to be done on the top-level profile handling code and the code for finding the differences between resources, components and profiles needs reworking. Also the libraries for reading/writing XML files and Berkeley DB need documentation.

That is all the remaining work required on the “core” libraries. After that there will be some work to do on finishing the port of the client to the new libraries. I’ve had that working before but the function APIs have since changed; I don’t expect it to require a huge amount of work.

PostgreSQL 9.6

September 29, 2016

I’m currently working on upgrading both the PkgForge build farm and the BuzzSaw log file processor services to SL7.2. Both of these services use PostgreSQL databases and have been stuck on 9.2 for a while pending the server upgrades. The latest version of PostgreSQL (9.6) is due to be released today so I thought I would give the release candidate a whirl to see how I get on. There are numerous benefits over 9.2; in particular I am planning to use the new jsonb column type to store PkgForge build information which was previously serialised to a YAML file. Being able to query that data directly from the DB should be very useful. The feature I am most interested in trying from 9.6 is parallel execution of sequential scans, joins and aggregates. This has the potential to make some of the large queries for the BuzzSaw DB much faster. My very simplistic first tests suggest that setting the max_parallel_workers_per_gather option to 4 will reduce the query time by at least 50%. It will need a bit more investigation and analysis to check it really is helpful but that’s an encouraging result.
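
For anyone wanting to repeat the experiment, the session-level setting can be flipped before running a query along these lines (a sketch using DBI; the DSN, credentials and table name are placeholders, not the real BuzzSaw schema):

use strict;
use warnings;
use DBI;

# Placeholders: adjust the DSN, user and query for the real database
my $dbh = DBI->connect( 'dbi:Pg:dbname=buzzsaw', 'logfiles', '',
                        { RaiseError => 1 } );

# Allow up to 4 parallel workers for this session, then run a large query
$dbh->do('SET max_parallel_workers_per_gather = 4');

my ($count) = $dbh->selectrow_array('SELECT count(*) FROM events');
print "events: $count\n";

$dbh->disconnect;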

A 2ndQuadrant blog post has some useful information on the new parallel sequential scan feature.

LCFG Client: Hasn’t died yet…

August 2, 2016

Coming back from holiday I was pleased to see that I have a v4 client instance which has now been running continuously for nearly 3 weeks without crashing. It hasn’t done a massive amount in that time but it has correctly applied some updates to both resources and packages.

In the time I’ve not been on holiday I’ve been working hard on documenting the code. For the C code I’ve chosen to use doxygen, it does a nice job of summarizing all the functions in each library and it makes it very simple to write the documentation using simple markup right next to the code for each function. I’ve also been working through some of the Perl modules and adding POD where necessary. It might soon be at the stage where others can pick it up and use it without needing to consult me for the details…

LCFG Client: It lives!

July 15, 2016

Cue forked lightning and crashes of thunder…

After much effort I finally have the first functional installation of the v4 LCFG client. This sees all the XML parsing and profile handling moved over to the new LCFG::Profile Perl modules which are wrappers around the new lcfg-core suite of libraries. There is still a bit of work required to properly handle LCFG contexts but otherwise it can handle everything we need. There are probably lots of small bugs to be resolved, there is also an almost total lack of documentation and the tests need a lot of attention but hey, at least it runs!

LCFG Profile – Secure mode

May 19, 2016

The LCFG client has a slightly weird feature called “secure mode”. This makes the client hold off applying any resource changes until they have been manually reviewed. The manual checking is done by examining the contents of a “hold file” which shows the differences in values for each modified resource in a simple text form. The file also contains a “signature” which is the MD5 digest (in hex) of the changes. A change set is applied manually by passing that signature to the client which then regenerates the hold file and compares that signature with the one supplied. This is not a heavily used feature of the client but it is something we want to support in the new LCFG profile framework. The new framework has built-in support for diffing the data structures which represent LCFG profiles, components and resources. This makes it relatively straightforward to add a feature which generates the secure-mode hold file when required; the only awkward part was finding some code to do the MD5 digest in a nice way.

Here’s an example using the C API; error checking and suchlike has been dropped to keep it simple.

#include <lcfg/profile.h>
#include <lcfg/bdb.h>
#include <lcfg/differences.h>

int main(void) {

    char * msg = NULL;

    LCFGProfile * p1 = NULL;
    lcfgprofile_from_status_dir( "/run/lcfg/status",
                                 &p1, NULL, &msg );

    LCFGProfile * p2 = NULL;
    lcfgprofile_from_bdb( "/var/lcfg/conf/profile/dbm/",
                          &p2, NULL, 0, &msg );

    LCFGDiffProfile * diff = NULL;
    lcfgprofile_diff( p1, p2, &diff, &msg );

    char * signature = NULL;
    lcfgdiffprofile_to_holdfile( diff, "/tmp/holdfile", &signature, &msg );

    return 0;
}

LCFG profile querying

May 13, 2016

The new LCFG profile framework makes it simple to retrieve component and resource information from profiles stored in the various standard formats (XML, Berkeley DB and status files).

Loading a profile from XML, DB or status directory:

my $p = LCFG::Profile->new_from_xml("example.xml");

my $p = LCFG::Profile->new_from_bdb("example.db");

my $p = LCFG::Profile->new_from_status_dir("/run/lcfg/status");

Loading a component from a DB or status file:

my $c = LCFG::Profile::Component->new_from_bdb( "example.bdb", "client" );

my $c = LCFG::Profile::Component->new_from_statusfile( "/run/lcfg/status/client" );

Retrieving a component (e.g. client) from the profile:

my $c = $p->find_component("client");

Retrieving a resource (e.g. client.components) from a component:

my $r = $c->find_resource("components");

Getting the resource value:

say $r->value;

For convenience, if the resource is a tag list then you can get the value as a perl list:

my @comps = $r->value;
for my $comp (@comps) {
    say $comp;
}

LCFG profile handling

May 13, 2016

Over the last few months the new libraries for handling LCFG profiles have been shaping up nicely. They are finally reaching a point where they match up with my original aims so I thought I’d give a taste of how it all works. Here’s an example of processing an LCFG XML profile into the Berkeley DB and rpmcfg files required by the client:

use v5.10;
use LCFG::Profile;

my $xml    = '/var/lcfg/conf/profile/xml/';
my $dbm    = '/tmp/';
my $dbm_ns = 'example';
my $rpmcfg = '/tmp/';

my $new_profile = LCFG::Profile->new_from_xml($xml);

my $update_dbm = 0;
if ( -f $dbm ) {
    my $cur_profile = LCFG::Profile->new_from_bdb($dbm);

    my $diff = $cur_profile->diff($new_profile);

    if ( $diff->size > 0 ) {
        $update_dbm = 1;
    }
} else {
    $update_dbm = 1;
}

if ( $update_dbm ) {
    $new_profile->to_bdb( $dbm, $dbm_ns );
    say 'Updated DBM';
}

my $pkgs_changed = $new_profile->to_rpmcfg($rpmcfg);
if ( $pkgs_changed ) {
    say 'Updated packages';
}

This is basically what the LCFG client does whenever it processes a new profile but is a lot nicer than the current rdxprof code!

Platform-specific config

January 13, 2016

I recently came across this blog article titled Stop writing code that will break on Python 4!. Although the title mentions python 4 it is really discussing “any future major version“.

This is something we have learnt to deal with in LCFG over the years. We often have to tweak configuration slightly when developing support for new platforms and this results in lots of if/elseif/else statements based on the target platform. Once you’ve been through the platform upgrade cycle a few times you learn that the most efficient approach is to special-case the old platform and make the new platform the default. By assuming that the configuration required for the new platform will be the default going forwards (i.e. it sits in the “else” branch) you make the configuration for N+1 and also handle N+M at the same time.

Writing modern C

January 10, 2016

For the v4 LCFG client project I’ve been writing lots of C. To get my C knowledge up to scratch I’ve been consulting many books of varying vintages which leads to quite a mixture of coding styles. I’m quite keen to create code which is considered “good” according to modern coding standards but I also want to ensure it will compile on a wide range of Unix-like systems; that seems to mean right now that the standard to aim for is C99. I recently came across an interesting article titled "How to C (as of 2016)" which gives a good summary of many important topics to consider. I’ve already been following many of the suggestions but there are also quite a few which are totally new to me. I’m not sure I agree with all of them (e.g. not using char) but I shall definitely be applying some of them.

Security awareness

November 20, 2015

I recently came across a series of short blog posts from the SANS Securing the Human site on the topic of Security Awareness. I found them to be quite interesting and thought provoking. If you’re interested in what can be done to improve the security of an organization I’d recommend these as a good starting point:

  1. The 4 Ws to Awareness Success
  2. The Why in Effective Awareness Programs
  3. The Who in Effective Awareness Training
  4. The What in Effective Awareness Training
  5. The How in Effective Awareness Training

LCFG XML Profile changes

August 20, 2015

As part of the LCFG v4 client project I am working on converting the XML profile parsing over to using the libxml2 library. Recent testing has revealed a number of shortcomings in the way the LCFG XML profiles are generated which break parsers that are stricter than the old W3C code upon which the current client is based. In particular the encoding of entities has always been done in a style which is more suitable for HTML than XML. There is really only a small set of characters that must be encoded for XML: single-quote, double-quote, left-angle-bracket, right-angle-bracket and ampersand (in some contexts the set can be even smaller). The new XML parser was barfing on unknown named entities which would be supported by a typical web browser. It is possible to educate an XML parser about these entities but it’s not really necessary; a better solution is to emit XML which is utf-8 compliant, which avoids the need for additional encoding. Alongside this problem of encoding more than was necessary, the server was not encoding significant whitespace, e.g. newlines, carriage returns and tabs. By default a standards compliant XML parser will ignore such whitespace but an LCFG resource might well contain it, so it was necessary to add encoding support to the server. In the process of making these changes to the LCFG::Server::Profile::XML module I merged all the calls to the encoder into a call to a single new EncodeData subroutine so that it is now trivial to tweak the encoding as required. These changes will be going out in version 3.3.0 of the LCFG-Compiler package in the next stable release. As always, please let us know if these changes break anything.
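
To make the encoding rules concrete, an encoder along those lines boils down to something like the following; this is purely an illustration, not the actual EncodeData subroutine.

use strict;
use warnings;

my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

sub encode_for_xml {
    my ($value) = @_;

    # Only the minimal set of special characters needs to be encoded
    $value =~ s/([&<>"'])/$entity{$1}/g;

    # Significant whitespace must survive a standards compliant parser, so
    # encode newlines, carriage returns and tabs as numeric character references
    $value =~ s/([\t\n\r])/sprintf('&#%d;', ord $1)/ge;

    return $value;
}

print encode_for_xml(qq{a<b & "c"\n}), "\n";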

MooX::HandlesVia and roles

August 20, 2015

I’ve been using Moo for Perl object-oriented programming for a while now. It’s really quite nice, it certainly does everything I need and it’s much lighter than Moose.

Whilst working on the LCFG v4 client project I recently came across a problem with the MooX::HandlesVia module when used in conjunction with roles. I thought it worth blogging about if only to save some other poor soul from a lot of head scratching (probably me in 6 months time).

If a class is composed of more than one role and each role uses the MooX::HandlesVia module, for example:

    package SJQ::Role::Foo;
    use Moo::Role;
    use MooX::HandlesVia;

    package SJQ::Role::Bar;
    use Moo::Role;
    use MooX::HandlesVia;

    package SJQ::Baz;
    use Moo;

    with 'SJQ::Role::Foo','SJQ::Role::Bar';

    use namespace::clean;

my $test = SJQ::Baz->new();

It fails and the following error message is generated:

Due to a method name conflict between roles 'SJQ::Role::Bar and
SJQ::Role::Foo', the method 'has' must be implemented by 'SJQ::Baz'
at /usr/share/perl5/vendor_perl/Role/ line 215.

It appears that MooX::HandlesVia provides its own replacement has method and this causes a problem when namespace::clean is also used.

The solution is to apply the roles separately, it’s perfectly allowable to call the with method several times. For example:

    package SJQ::Baz;
    use Moo;

    with 'SJQ::Role::Foo';
    with 'SJQ::Role::Bar';

    use namespace::clean;

PostgreSQL new features

June 10, 2015

It looks like PostgreSQL 9.4 has some really interesting new features. Today I came across a blog post by 2ndquadrant demonstrating the WITHIN GROUP and FILTER clauses. I don’t think I’ve entirely got my head round the purpose of WITHIN GROUP yet, I suspect I need a couple of good real-world examples. The FILTER clause looks very handy though, I’m sure I’ll be using that when I get the chance.

LCFG::Component environment plugins

January 5, 2015

Version 1.13.0 of the Perl version of the ngeneric framework (LCFG::Component) provides an all-new environment initialisation system for component methods. This has support for plugins which means it is fully extensible.

There is a new InitializeEnvironment method which is called for most standard methods which are accessible via om (including configure, start, restart, stop, run, and logrotate). The method can also be called from any additional methods you have added to your own components; it needs access to the resources so it must be called after a call to LoadProfile or LoadStatus.

There are currently two plugins – a very simple one which can be used to set values for environment variables before the method is called and a more complex one that can do the equivalent of kinit and aklog to acquire Kerberos credentials and AFS tokens.

For full details see the LCFG wiki.

Moo and Type::Tiny

December 14, 2014

At the start of 2014 I was working on a project to further improve the LCFG client. When I hit problems with Moose and its memory usage I discovered the excellent Moo framework which provides all the nice bits but is much less heavyweight. As part of the Perl Advent Calendar for 2014 someone has written a great introductory article on using Moo along with Type::Tiny. I’ve learnt a few things; I particularly like the idea of a “type library” as a good way to organize all the local types.

LCFG::Build::Skeleton changes

December 8, 2014

At the LCFG Annual Review meeting held last week one topic which was discussed was the idea of all Perl based LCFG components being implemented as modules with the component script just being a very thin wrapper which loads the module and calls the dispatch method. This has been our recommended coding style for quite a while now and we use this approach for many of the core components.

During the discussion I realised that the lcfg-skeleton tool which is used to create the outline directory structure for new projects does not support this way of working. To make it really easy to create new Perl-based components which follow recommended best-practice I have consequently updated LCFG-Build-Skeleton. The new version 0.4.1 creates a module file (e.g. lib/LCFG/Component/, the necessary CMake file and also tweaks the specfile appropriately. This will be in the stable release on Thursday 18th December or you can grab it from CPAN now.

LCFG authorization

December 3, 2014

The authorization of LCFG component methods (which are called using the om command) is typically done using the LCFG::Authorize module. This is limited to checking usernames and membership of groups managed in LCFG.

In Informatics we have for a long time used a different module – DICE::Authorize – which extends this to also check membership of a netgroup. Recently we discovered some problems with our implementation of this functionality which make it very inflexible. We have been connecting directly to the LDAP server and doing the lookup based on hardcoded information in the module. As this really just boils down to checking membership of a netgroup it can clearly be done more simply by calling the innetgr function. This will work via the standard NS framework so will handle LDAP, NIS or whatever is required. The necessary details are then only stored in the standard location and not embedded into the code.

Rather than just rewrite the DICE::Authorize module I took the chance to move the functionality to the LCFG layer, so we now have LCFG::Authorize::NetGroups. This nicely sub-classes the standard module so that if the user is not a member of a netgroup the other checks are then carried out. This is much better code reuse, previously we had two distinct implementations of the basic checks.

Having a new implementation of the authorization module is also handy for dealing with the transition stage. We can keep the old one around so that if a problem is discovered with the new code we can quickly switch back to the old code.

I also took the chance to improve the documentation of the authorization framework so if you’re still running om as root now is a good time to improve things!

Sub-classing LCFG components

December 3, 2014

One topic that often comes up in discussions about how to make things easier for LCFG component authors is the idea of sub-classing.

Although I had never tried it I had always assumed this was possible. Recently whilst looking through the LCFG::Component code I noticed that the list of methods is looked up in the symbol table for the module:

    my $mtable = {};
    for my $method ( ( keys %LCFG::Component:: ),
        ( eval 'keys %' . ref($self) . q{::} ) )
    {
        if ( $method =~ m/^method_(.*)/i ) {
            $mtable->{ lc $1 } = $method;
        }
    }
    $self->{_METHOD} = lc $_METHOD;
    my $_FUNCTION = $mtable->{ $self->{_METHOD} };

So, this will work if your method comes from LCFG::Component or LCFG::Component::Foo but it wouldn’t work if you have a sub-class of Foo. You would potentially miss out on methods which are only in Foo (or have to copy/paste them into your sub-class).

Not only does this make sub-classing tricky it also involves a horrid string eval. There had to be a better way. Thankfully I was already aware of the Class::Inspector module which can do the necessary. This module is widely used by projects such as DBIx::Class and Catalyst so is likely to be reliable. It has a handy methods method which does what we need:

    my $_FUNCTION;
    my $public_methods = Class::Inspector->methods( ref($self), 'public' );
    for my $method (@{$public_methods}) {
        if ( $method =~ m/^Method_(\Q$_METHOD\E)$/i ) {
            $_FUNCTION = $method;
            $_METHOD   = $1;
        }
    }

Much nicer code and a tad more efficient. Now the LCFG component Perl modules are properly sub-classable.

Usenix LISA 2014

November 18, 2014

Last week I attended the Usenix LISA conference in Seattle. There was a very strong “DevOps” theme to this year’s conference with a particular focus on configuration management, monitoring (the cool term seems to be “metrics”) and managing large scale infrastructure. As always, this conference offers a strong hallway track; there is the opportunity to pick the brains of some of the best sysadmins in the business. I had a lot of interesting discussions with people who work in other universities as well as those who work at the very largest scale such as Google.

There were lots of good talks this year; annoyingly quite a few of those which seemed likely to be most interesting had been scheduled against each other. Thankfully most of them were recorded so they can be viewed later. There is no doubt that this conference delivers real value for money in terms of the knowledge and inspiration gained. I had conversations with several people where we commented that the cost of the entire conference, including travel and accommodation, equals just a few days of “professional training” in the UK. A few of the highlights for me were:

Radical Ideas from the Practice of Cloud Computing

This talk by Tom Limoncelli was based on some of the topics in his new book – The Practice of Cloud System Administration: Volume 2: Designing and Operating Large Distributed Systems. He proposed the idea that it is better to use lots of cheaper, less reliable hardware rather than a few very expensive machines. He explained how this can be achieved by focussing on the resilience of a service rather than the reliability of individual hardware; this becomes cheaper as a portion of the total capital expenditure as you scale up.

He moved on to showing that when you have a risky business process you should not avoid it but rather should choose to do it more frequently, a “practice makes perfect” approach. With practice your procedures will become better understood and they will be more reliable and more efficient. Admins are unlikely to have good knowledge of a process which is only done rarely. Doing risky processes often also helps reveal single points of failure in your infrastructure.

An advantage of doing updates regularly is that the changes can be applied in small batches. The changes are thus easier to debug because they are recent and fresh in the minds of developers. Also, the environment changes less so it’s easier to spot the origin of a problem if one occurs. The frequent application of changes also keeps developers happy, they get faster feedback and have the warm, fuzzy feeling of success on a regular basis. This idea of keeping the feedback loop short and tight was something that kept cropping up throughout the conference and it’s clear to me that this is one of the main factors in the success of the DevOps strategy.

Clearly doing risky changes frequently does mean that bad things will happen. Tom recommended avoiding punishing people for outages; any problem should be seen as a failure of the procedures. One quote was “there is no root cause, only contributing factors”. The best way to handle outages is to be well prepared: this means anticipating likely problems, having practice drills and ensuring there is a thorough post-mortem. A post-mortem should consider what went right/wrong and propose actions which can be done in the short and long-term. This is something we have been doing in Informatics for several years; it’s always nice to be told you’re doing the right thing!

His closing remarks were “We run services not servers” and “We are hired to be awesome in the face of failure“. Clearly he is working at a different scale to what we do in Informatics but these sentiments are still both very applicable to how we manage our systems in Informatics.

I’m definitely interested in getting a copy of his book to learn more. Impressively, many people at the conference queued up to get Tom to sign their copy.

Building a One-Time Password Token Authentication Infrastructure

This was an excellent talk which covered a subject we have been investigating in Informatics. The talk was given by two admins from the LIGO project. They had identified user credential theft as a critical risk to their project. The data generated by the project is eventually published publicly so they are not worried about data theft; rather they are concerned about loss of access to scientific data which is not replayable. If their systems are down when an important astronomical event occurs they will lose valuable data. They were particularly focussing on avoiding problems which can occur because users reuse passwords on multiple services.

Their plan was to use a separate credential that is not replayable; this is important, they didn’t just want a second authentication factor. This credential would be used to gain access to the most critical parts of their infrastructure. As well as increasing security this has an important psychological benefit in that it makes users aware whenever they are accessing the most important systems. For services such as email they would not be required to use a second factor; the inconvenience would annoy users too much for the small benefit gained. They noted that it is still necessary to beware that either end of an active session could be hijacked after authentication has been successfully completed.

They examined various options; they required a token-based – “something you have” – approach which should preferably be highly tamper resistant. They wanted a separate physical device to avoid the opportunity for remote compromise, as could occur with software based systems in mobile devices. They gave an example of a virus which infected MacOSX computers and then deliberately targeted iPhones when they were plugged into the machine. I hadn’t really considered this downside of using mobile devices before; it definitely makes me strongly in favour of a solely hardware token approach.

They did note some limitations of token-based systems. In particular they only have a limited lifetime which seems to be in the range of 2 to 3 years, depending on usage. This created some problems for the project: how do you securely deliver a token to a very remote user? Particularly if they have lost one and need a replacement quickly. Many tokens are time-based, which can introduce synchronisation problems for remote users who cannot return to base to get it fixed. Also, many time-based systems avoid replays by only allowing one login within a time window (e.g. 1 minute), which could be frustrating for users.

They went on to discuss how any 2-factor system is going to introduce additional overheads. There will be issues with failures occurring at any point in the system. It needs to integrate well with existing infrastructure and preferably avoid the need to replace software.

They did not wish to trust 3rd parties or rely on a proprietary blackbox solution that could be compromised and lose secrets. To achieve total ownership of the system they created their own custom authentication server. This supports a multi-site approach with secure replication of data. They selected the yubikey device which we have looked at in Informatics. This is used via PAM as a second factor to Kerberos authentication.

This talk gave very good coverage of the whole 2-factor authentication problem. I look forward to reviewing the recording and the slides. I will have to find out if we can get the code for their custom authentication system and try it out in Informatics.

One Year After the Meltdown: Now What?

This talk was given by Mikey Dickerson who was originally seconded from Google to the White House to help fix the website when it so spectacularly failed to deal with demand last year. Due to the very imminent deadline for the website to be ready for renewals he had to do the talk via video link from the White House. This worked much better than I feared it would and thankfully the network didn’t collapse. The main thing I took from this was how a DevOps approach can be applied to failing projects no matter how huge and weighed down with bureaucracy. There was a clear determination to save the project without resorting to a complete rewrite; the success came from restructuring teams and using better procedures. It was interesting to hear that they had been in contact with the GOV.UK people and considered the UK government to have better public facing IT services. They are now moving on to applying the same strategy to other US government IT services, in particular Veterans Affairs. The team are clearly very determined and driven; they are working stupid numbers of hours each week. Many of them have given up well paid private sector jobs so that they can make a real difference to the country. It will be interesting to see if they manage to achieve real permanent change which can cope with a change of president.

Gauges, Counters and Ratios, Oh My!

The aim of this talk was to explain how to design useful metrics which can be used for service monitoring and problem diagnosis. It started off with quite a technical discussion of the definition of “metric”. The definition given was “a named value at some specific time”. After discussing these 3 important points (name, value and time) the talk moved on to using high-dimension databases which can handle high-resolution time series data. The recommended Open-Source software for this purpose is OpenTSDB which works on hadoop.

There was also discussion about why gathering metrics is useful. In particular 4 broad themes were identified: operational health monitoring, quality assurance, capacity planning and product management. Currently we do health monitoring fairly well but we’re not really doing the others. I think it would certainly be very useful to have better monitoring of resources when planning for future capacity requirements.

The recommended software suite to cover all requirements is nagios (or equivalent) plus Graphite plus Sensu plus logstash plus ganglia.

Although an interesting talk I think I would have benefited more from the talk the speaker gave at LISA 2013 titled “A Working Theory of Monitoring” which he referenced a couple of times. The slides and video of that previous talk are now available online.

The Top 5 Things I Learned While Building Anomaly Detection Algorithms for IT Ops

This talk was given by Toufic Boubez who is clearly a smart chap who really knows his stuff. He gave lots of useful advice on how to analyse the metrics you have collected to detect anomalies.

His main point was that your data is almost certainly NOT gaussian. This is a problem because most analytic tools assume that parametric techniques are applicable.

There is also the issue that “yesterday’s anomaly is today’s normal“. He talked about how stationarity (sic) is not a realistic assumption with large complex systems. The term for this is “Concept Drift“.

He went on to discuss non-parametric techniques (such as the Kolmogorov-Smirnov (KS) test) which can be used to compare probability distributions.

As well as using the right statistical techniques it is very important to have good domain knowledge. You need to know your data and the general patterns. This will allow you to customise alerts appropriately so you don’t get paged unnecessarily.

He also noted that some data channels are inherently very quiet. It’s hard to deal with this type of data using time-series techniques. Sparse data is very hard to analyse but will still contain very important information.

The speaker posts interesting stuff on his twitter account.

LCFG Client Guide

March 31, 2014

As part of my work on updating the LCFG client I’ve written a guide to the inner workings of the LCFG client. This is intended to be fairly high-level so it doesn’t go into the details of which subroutine calls which subroutine. The aim is that this should cover all the main functionality and provide the information necessary to get started with altering and extending the client code base.

Sys Admins need to be extra careful

March 21, 2014

Recently there have been revelations that the NSA is explicitly targetting sys admins. This is because they see sys admins as a good way to gain access to the users and data on the networks they manage. It’s worried me for a while now that gaining access to a typical sys admin account provides an attacker with a really easy way to get root access (for instance, there are plenty of sites out there which allow anyone in group “wheel” to gain extra privileges). Also, as I blogged recently, even when you cannot directly gain full root access, anyone who is permitted to do privileged admin tasks using sudo probably has some sort of illicit way of gaining extra privilege.

Even if we ignore concerns about government surveillance, when you can trivially find a huge list of sys admins via you know that attackers are going to be focussing their efforts on that list of targets. It’s clear to me that we have reached a time where sys admins are going to have to accept more onerous access restrictions than a “normal” user because they have the ability to easily acquire a lot more power than a “normal” user. We’re going to be obliged to use technologies such as multi-factor authentication, we’re going to have to avoid insecure web sites that require accounts but don’t have an https option, we’re going to have to use a secure VPN just to do simple things.

sudo security issues

March 17, 2014

I’ve always been very wary of using sudo for anything more than the simplest cases. I quite like the Ubuntu approach of using sudo to gain root privileges instead of su: it’s nice and simple, it doesn’t give any suggestion of power being restricted, and all it really achieves is the avoidance of the root password. A complicated sudo configuration has always seemed like a great way to hand out complete root privilege whilst being under the false impression that everything is nice and secure. This recent blog article I spotted has confirmed in my mind that heavy reliance on sudo really is a recipe for disaster.

LCFG Client Refactor: New profile parser

January 24, 2014

Recently I’ve been working on developing a new framework which encapsulates all aspects of handling the LCFG profiles on the client-side. This framework is written in Perl and is named, appropriately enough, LCFG::Profile; I plan to blog about the various details in due course. The coding phase is almost complete and I’ve moved onto adding documentation for all the module APIs. I’ve found the documentation phase to be a very useful review process. It has helped me spot various dark corners of the code and methods which were added earlier in the development process which are no longer required. Removing this dead code now is a good idea as we may otherwise end up being committed to supporting the code if it forms part of a public API. I’ve also found it to be a very good way to spot inconsistencies between similar APIs implemented in the various modules. It’s definitely a good idea to follow the principle of least surprise whenever possible. If methods are named similarly and take a similar group of arguments they probably ought to return similar types of results.

LCFG Client Refactor: Phase Two

December 5, 2013

As the results of Phase One of the LCFG Client refactoring project are now in the beta-testing stage and approaching a roll-out date we have commenced work on Phase Two. The primary aim of this new work is to remove all dependencies on the W3C::SAX Perl modules which have been unmaintained for a very long time. We’re probably the last place in the world still using those modules so it’s definitely time to be moving on to something more modern. The project plan for this new work is available for anyone interested.

As a first step I’ve been prototyping some new XML parsing code based on the popular and well-maintained XML::LibXML module. I’ve also been thinking about ideas for an API for storing/accessing the information regarding components and resources. I’ve put together some useful notes on the LCFG XML profile structure to help me get my head around it all.
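As an illustration of the direction this is taking, here is a minimal XML::LibXML sketch (the file path and the idea of simply walking the top-level elements are just my example, not a description of the real profile structure or of the actual prototype):

use XML::LibXML ();

# Sketch only: load a profile and list its top-level elements.
# The file path here is just an example.
my $doc  = XML::LibXML->load_xml( location => '/var/lcfg/conf/profile/xml/example.xml' );
my $root = $doc->documentElement();

for my $node ( $root->childNodes() ) {
    next unless $node->isa('XML::LibXML::Element');
    printf "element: %s\n", $node->nodeName();
}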

Security: Using the human perimeter

December 5, 2013

I recently came across an interesting security blog article on the Dark Reading site – "Using The Human Perimeter To Detect Outside Attacks". This is particularly interesting because, as part of our ongoing efforts to improve the security of our network, earlier this year I developed a new "log cabin" service which allows users to review all their SSH and web authentications. As well as providing a web interface where you can peruse all your login activity for the last few months we also send out terse monthly summaries to everyone by email. These summaries list only the most "interesting" connection sources and help to encourage users to keep checking. I will be speaking about this project at the next FLOSS UK conference which will be held in Brighton in March 2014. The talk is titled "Crowd-Sourcing the Detection of Compromised User Accounts" and it will look at how users can become involved in the whole process of keeping their account secure. I particularly like the term "human perimeter"; I might have to borrow that one.

LCFG V3 Client – beta release

October 17, 2013

I am pleased to announce that the v3 update for the LCFG client has now reached the beta-release stage. As of stable release 2013101401a everything is in place to begin testing at your own site. Full details are available on the LCFG wiki.

If you come across any bugs or unexpected behaviour please file a bug at

LCFG Annual Review Meeting

October 16, 2013

On Thursday 5th December instead of our normal Deployers Meeting we are going to be holding an Annual Review meeting.

All users of LCFG are encouraged to attend this meeting to hear about what has been happening over the last year and what developments they can look forwards to in the next year. This is also an excellent opportunity to raise issues that are important to you, put forward ideas for future developments you would like to see and chat about all things LCFG!

This will start at 2pm and we aim to be finished by 5pm. It will be held in room 2.33 of the Informatics Forum (note that this is NOT the usual room).

Full details are available on the LCFG wiki.

Afterwards there will be an informal gathering in a local pub followed by some of us going somewhere for food.

I hope to see lots of people there!

LCFG Client Refactor: Further node name support

June 3, 2013

I remember once as a 12 year old playing rugby at school. I received the ball, saw the field ahead was clear and knew that this was the time to run like hell. For one joyous moment I was brushing aside the defending team and spotting my moment of glory – having never been a particularly sporty kid, was this my chance to join the cool crowd? Sadly, someone burst my bubble and pointed out that the main reason I wasn’t being flattened was because we were actually playing touch rugby…

Anyway, my general point is, it’s always good to know when, having been passed the ball, you should just run like hell and see what happens. It might also be good to remember which game you are playing but, hey, ho…

Having been given the chance to split the LCFG node name from the host name, I spotted a chance to really make it count. In short order the following code has been altered to extend this support to the whole of the LCFG client framework:

  • perl-LCFG-Utils 1.5.0
  • lcfg-ngeneric 1.4.0
  • lcfg-om 0.8.0
  • lcfg-file 1.2.0
  • lcfg-authorize 1.1.0
  • lcfg-hackparts 0.103.0
  • lcfg-logserver 1.4.0
  • lcfg-sysinfo 1.3.0
  • lcfg-installroot 0.103.0

None of this has (yet) been shipped to the stable tree since it needs more hacking of the current LCFG client (v2) code to fix a compatibility issue.

The big achievement here is that it makes it possible to specify the LCFG node name on the PXE installer kernel command-line via the lcfg.node parameter and get the whole way through to an installed, managed machine which is using an LCFG profile which is completely unrelated to the host name.

There are various big benefits to this change. It is now possible to have a fully roaming machine which is LCFG managed; there is no requirement for a static host name or static IP address. This means that no matter what host name or domain name settings are in place the LCFG client will continue to work as required. This also makes it possible to use a single “generic” profile to configure multiple machines. If you know you have a lab full of identical machines this could be very handy indeed.

The downside of this is that some things like spanning maps will not work the way you might expect. You also will not receive notifications from the server when a profile changes; you have to rely solely on the poll time (probably worth making the timeout shorter). You probably also cannot send acknowledgements to the server and the LCFG status pages will consequently be mostly useless for those clients. It is also difficult to configure networking to do anything other than use DHCP. You’re choosing to move some of the configuration information back out of LCFG (or at least out of a particular profile). You may end up saving effort one way and adding it in another.

At the moment although I have broken the conceptual link between node and host name for the client framework there are still lots of components which are confused by this change. Components have traditionally been able to rely on combining the profile.node and profile.domain resources to form the FQDN. This was probably always on slightly shaky ground but now there can be no guarantee whatsoever of a useful value in the profile.node resource. If a component really cares about the host name (rather than the node name) then it will have to ask the host directly (using hostname or Sys::Hostname from Perl).
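For example, in Perl that is just:

use Sys::Hostname ();

# Ask the host directly rather than deriving the name from LCFG resources
my $hostname = Sys::Hostname::hostname();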

LCFG Client Refactor: host name versus node name

May 23, 2013

A long-standing issue that we have had with the LCFG client is that it is not possible to use an LCFG profile with a name which does not match the host name. They have always been treated by rdxprof and the ngeneric framework as conceptually interchangeable. There is no particular reason for this limitation other than the traditional “it’s always been that way”; also, we’ve never had a requirement important enough to get this implemented or the opportunity to quickly make the change. As the refactoring project is drawing to a close it seemed like a good time to break this conceptual connection and rework the code to always use the LCFG node name. For the moment the actual behaviour won’t change, since the node name defaults to the host name as before, but we now have a mechanism to allow it to be altered. When the client enters daemon mode it now stashes the name of the LCFG node being used. Since you can only run one client daemon at a time this makes reasonable sense. The standalone one-shot behaviour remains unaltered: you can still access any profile you like.

python and string encodings

May 21, 2013

I’ve recently finished the User accessible login reports project. After the initial roll-out to users I had a few reports of people getting server errors when certain sets of data were viewed. This website is written in Python and uses the Django framework. During the template processing stage we were getting error messages like the following:

DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 30: invalid continuation byte.

It appears that not all data coming from the whois service is encoded in the same way (see RFC 3912 for a discussion of the issue). In this case it was using a latin1 encoding but whois is quite an old service which has no support for declaring the content encoding used so we can never know what we are going to have to handle in advance.

A bit of searching around revealed the chardet module which can be used to automatically detect the encoding used in a string. So, I just added the following code and the problem was solved.

import chardet

# Guess the encoding of the raw whois response (the protocol gives no way
# to declare it). chardet may return None if it cannot decide.
enc = chardet.detect(val)['encoding']
if enc is not None and enc != 'utf-8':
    val = val.decode(enc)
# Force a web-safe ASCII string, replacing unsupported characters with '?'
val = val.encode('ascii', 'replace')

The final result is that I am guaranteed to have the string from whois as an ascii string with any unsupported characters replaced by a question mark (?). It’s not a perfect representation but it is web safe and is good enough for my needs.

LCFG Client Refactoring: starting the daemon

May 21, 2013

Following on from my previous work on fixing the way in which the UDP socket is opened for receiving notification messages I have been looking at why the LCFG component just hangs when the rdxprof process fails to daemonise.

It turns out that the LCFG client component uses an obscure ngeneric feature of the Start function which is that the final step is to call a StartWait function if it has been defined. In the client component this StartWait function sits waiting forever for a client context change even when the rdxprof process failed to start…

I think the problem comes from an expectation that the call to the Daemon function, which starts and backgrounds the rdxprof process, will fail if rdxprof fails to run. It does not fail ($? is zero) and the PID of the rdxprof process is always accessible through the $! variable, even if it was very short-lived.

There is, thankfully, a very simple solution here. The client component already has an IsProcessRunning function which can be used to check if the process associated with a PID is still active. This has to be used carefully; I have put a short sleep after the daemonisation stage to ensure that the process is fully started before doing the check. The check is also fairly naive so there is the slight risk that if the system is under resource pressure the rdxprof process could fail and then the PID could be immediately reused. For now I think it’s reasonable to just accept the risks attached and revisit the issue later if it causes us problems. Associated with this, clearly the StartWait function really ought to eventually give up.

LCFG Client Refactor: notification handling

May 21, 2013

One long-standing issue with running the LCFG client (rdxprof) in daemon mode has been that if another process has already acquired the UDP socket which it wants (port 732) then it does not fail at startup but just hangs. This is clearly rather undesirable behaviour as it leaves the machine in an unmanageable state but because the client process appears to be running it’s difficult to notice that anything is actually wrong.

Yesterday I spent a while looking at this problem. I reduced it to the most simple case of a script with a while-loop listening for messages on a UDP socket and then printing the messages to the screen. Running multiple processes at the same time revealed that there was nothing preventing multiple binds on the socket. I eventually discovered that this is caused by the SO_REUSEADDR option being set on the socket.

When using TCP, setting this option is often necessary. It allows a process to reacquire access to a socket when it restarts. Sometimes processes may restart too quickly and the socket would otherwise not be ready. For UDP this option is only necessary if you want to listen on a broadcast or multicast address and have multiple listeners on the same machine; that’s a fairly unusual scenario.
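To illustrate the fix, here is a minimal sketch (not the actual rdxprof code) of a UDP listener created without SO_REUSEADDR; a second copy will then refuse to bind:

use IO::Socket::INET ();

# Sketch only: bind the LCFG notification port without SO_REUSEADDR so that
# a second copy of the process fails at startup rather than hanging.
# (Binding a port below 1024 requires root privileges.)
my $listener = IO::Socket::INET->new(
    LocalPort => 732,
    Proto     => 'udp',
    ReuseAddr => 0,
) or die "can't bind UDP socket: $!\n";

my $buf;
while ( defined $listener->recv( $buf, 1024 ) ) {
    print "notification received: $buf\n";
}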

Disabling the SO_REUSEADDR option does exactly what we want. Attempting to run two rdxprof processes now results in it exiting with status 1 and this message:

[FAIL] client: can't bind UDP socket
[FAIL] client: Address already in use

There is a further problem with the LCFG client component not returning control to the caller when it fails to start rdxprof and I will have to do some further investigations into that problem.

LCFG Client Refactor: splitting the project

May 14, 2013

I’ve recently been working on splitting the LCFG client code base from the LCFG component which is used to configure and manage the daemon. This allows the client Perl modules to be built in the style of a standard Perl module. The immediate benefit of this is enhanced portability: it makes it much easier to build the code on platforms other than DICE Linux if you can use standard Perl module building tools. We could also upload the code to CPAN which would make it even easier to download and install.

There are also benefits for maintainability: the standard Perl build tools make it easy to run unit tests and do other tasks such as checking test and API documentation coverage for your code. It is not impossible to do these things without a tool like Module::Build but it is a lot more awkward. Also, without the standard tools you have to know, or be able to discover, where certain files should be installed; we have some of this built into the LCFG build tools CMake framework but it only handles fairly simple scenarios.
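For illustration, a standard Perl module build is driven by a small Build.PL along these lines (the details here are just an example, not the actual LCFG-Client-Perl file):

use strict;
use warnings;

use Module::Build;

# Example only: a minimal Build.PL for a standard Perl module layout
my $build = Module::Build->new(
    module_name => 'LCFG::Client',
    license     => 'gpl',
    requires    => { 'perl' => '5.10.0' },
);

$build->create_build_script();

With that in place the usual perl Build.PL, ./Build and ./Build test sequence works as normal, which is what makes running the unit tests so convenient.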

The new project which contains all the Perl modules for the client is named LCFG-Client-Perl in subversion and the component continues to be named lcfg-client in the standard LCFG component naming style. This completes stage 9 of the project plan.

LCFG Client Refactor: Comparing files

May 14, 2013

One thing that we need to do very frequently in the LCFG client, and also in many LCFG components, is comparing files. Typically we want to see if the new file we have just generated is any different from the previous version, in which case we will need to replace old with new and possibly carry out some action afterwards.

There are clearly many ways to solve this problem. We could read in the two files and do a simple string comparison (conceptually simple but tends to be messy, particularly if you want to minimise the memory requirements). It is also possible to calculate checksums for a file (MD5, SHA1, etc). I quite like this approach and it is nice and fast for small files. Up until now I’ve been using a mix of methods based on Text::Diff (which wastes time since I don’t actually need to know what the differences are) or calculating checksums, neither of which is an ideal approach in most cases.

What I really want though is a standard API which can simply answer the question of “are these two files the same?”. Some of the older LCFG code shows its shell heritage by using the cmp command. This command does exactly what we want and does it in a fairly efficient manner. The downside is that we have to execute another process every time we want to compare two files.

Step forwards, File::Compare. I’m not sure why I hadn’t spotted this module in the past. It works in a very similar way to the good old cmp command. It is also part of the set of core Perl modules which means it is available everywhere and it has a nice simple interface. I think I shall be converting various modules over to this approach in the future.
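For reference, the interface is as simple as this (the file paths are just an illustration):

use File::Compare ();

# compare() returns 0 when the two files have identical contents,
# 1 when they differ and -1 if an error occurred
my $result = File::Compare::compare( '/tmp/profile.new', '/tmp/profile.old' );

if ( $result == 0 ) {
    print "files are identical, nothing to do\n";
}
elsif ( $result == 1 ) {
    print "files differ, replace old with new\n";
}
else {
    die "comparison failed: $!\n";
}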

LCFG Client Refactor: New om interface

May 13, 2013

The LCFG client uses the om command line tool to call the configure method for an LCFG component when the resources change. Up until now this has been done using backticks which is not the best approach, particularly given that this involves building a command string and launching a full shell. I’ve now added a new Perl module to help with running om commands from Perl. It’s in version 0.7.1 of lcfg-om; you can use it like this:

use LCFG::Om::Command;

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "updaterpms", "run", "-Dv", "-t" );

The parameters are: component, method, ngeneric args, component args. You only need to specify the component and method names, the other two are optional. The argument options can either be simple strings or references to lists.

The status will be true/false to show the success of the command. You also get any output to stdout and stderr separately.

If you’re concerned that some method might not complete in a reasonable amount of time you can specify a timeout for the command:

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );

If the timeout is reached then the Run command dies; you need to use eval or a module like Try::Tiny to catch that exception.
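For example, a hedged sketch with Try::Tiny (reusing the openafs call from above) might look like this:

use Try::Tiny;
use LCFG::Om::Command ();

my $timeout = 60;    # seconds, just for illustration

# Sketch only: trap the exception thrown if the timeout is reached
my ( $status, $stdout, $stderr );
try {
    ( $status, $stdout, $stderr ) =
        LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );
}
catch {
    warn "om command failed or timed out: $_";
};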

Nicely this will also close file-descriptors 11 and 12 which are used internally by the LCFG ngeneric framework for logging. This will avoid daemons becoming associated with those files when they are restarted (and consequently tying up the rdxprof process).

This is one of those nice situations where fixing a problem for one project has additional benefits for others. The trick here was in realising that the code should be added to the lcfg-om project rather than it just being in the LCFG client code base.

LCFG Client Refactor: File and Directory paths

May 9, 2013

The way in which the LCFG client handles paths to files and directories has never been pleasant. The old code contains a lot of hardwired paths inserted at package build-time by CMake using autoconf-style macros (e.g. @FOO@). This makes the code very inflexible; in particular, there is no way for a user to run the rdxprof script as a non-root user unless they are given write access to all directories and files in the /var/lcfg/conf/profile tree. There is no good reason to prevent running of rdxprof as a normal user: if they are authorized to access the XML profile then they should be allowed to run the script which parses the file and generates the DBM and RPM configuration files. They may not be able to run in full daemon mode and control the various components but one-shot mode certainly should be functional.

There are a couple of other things added into the mix which complicate matters further. Especially, there is some support for altering the root prefix for the file-system. This is used during install time where we are running from a file-system based in / as normal but installing to a file-system based in /root. I say some support since it seems that only certain essential code paths were modified.

I needed to come up with a universal solution for these two problems which could provide a fairly straightforward interface for locating files and directories. It had to neatly encapsulate the handling of any root prefix and allow non-root users to be able to store files. To this end I’ve introduced a new module, named LCFG::Client::FileLocator, which provides a class from which a locator object can be instantiated. There are instance attributes for the root prefix and the configuration data directory path (confdir) which can be set using rdxprof command line options. This object can be used to look up the correct path for any file which the LCFG client requires. There are basic methods for finding various standard LCFG paths and also useful higher-level methods for finding files for specific hosts or particular components. It’s got comprehensive documentation too so hopefully it will be a lot easier to understand in 10 years time than the previous code.

I’ve now completed stage 8 but will have to go back and finish stage 7, "Improve option handling"; I would still like to try to add in configuration file handling. It’s a lot easier now that I’ve worked out the best way to deal with the various file paths. Having a single option for altering the configuration data directory was particularly useful.

So far I reckon I’ve spent just under 13 days of effort on the project. The allocation up to this point was 11 days (I have done the bulk of stage 7 though which takes it up to 12 days allocated). So, it’s still drifting away from the target a bit but not substantially.

LCFG Client Refactor: The joy of tests

May 9, 2013

I’m currently working on stage 8 of my project plan – "Eradicate hard wired paths". I’ll blog about the gory details later but for now I just wanted to show how the small number of tests I already have in place have proved to be very useful. As part of this work I have introduced a new module – LCFG::Client::FileLocator – which is nearly all new code. Having created this module I started porting over the rest of the client code to using it for file and directory path lookups. As I already had some tests I was able to gauge my progress by regularly running the test suite. As well as showing up the chunks of old client code which still needed to be ported it revealed bugs in 8 separate lines of code in the new FileLocator code. Finding these bugs didn’t require me to write a whole new set of tests for the new code (although that is on my todo list to ensure better coverage). For me that really shows the true value of writing some tests at the beginning of a refactoring process. It definitely produced higher quality code and the porting took much less time than it would have otherwise done.

LCFG Client Refactor: Logging

May 8, 2013

The next stage of untangling the LCFG client code was to improve the logging system. Up till now it has just been done using a set of subroutines which are all declared in the LCFG::Client module. Using the logging code in any other module then requires the loading of that module; this accounts for the bulk of all the inter-dependencies between the main LCFG::Client module and all the others. With a single purpose the logging code is an obvious target for separation into a distinct sub-system.

With the logging code I felt that the best approach was to convert it into an object-oriented style. The typical way that logging is done in various Perl logging modules (e.g. something like Log::Log4perl) is to have a singleton logging object which can be accessed anywhere in the code base. The advantage of this is that it is not necessary to pass around the logging object to every subroutine where it might be needed but we can still avoid creating a new object every time it is required. If the code base were fully object-oriented we might be better served having it as an instance attribute (this is what MooseX::Log::Log4perl provides) but we don’t have that option here. The logging object can be configured once and then used wherever necessary. For simplicity of porting, for now, I have made it a global variable in each Perl module; that’s not ideal but it’s a pragmatic decision to help with the speed of porting from the old procedural approach.

The new LCFG::Client::Log module does not have a new method. To make it clear that we are not creating a new object every time it instead has a GetLogger method. If no object has previously been instantiated then one is created, otherwise the previous object is returned. Again this can be done easily using the new state feature in Perl 5.10, like this:

sub GetLogger {
    my ($class) = @_;

    use feature 'state';

    # The object is created on the first call only; subsequent calls
    # return the same instance.
    state $self = bless {
        daemon_mode => 0,
        verbose     => 0,
        abort       => 0,
        debug_flags => {%debug_defaults},
        warn_flags  => {%warn_defaults},
    }, $class;

    return $self;
}

This new OO-style API neatly encapsulates all the logging behaviour we require. Previously a few variables in the LCFG::Client module had to be made universally accessible so that they could be queried. The new module provides accessor methods instead to completely hide the internals. This all helps to make it possible to simply extend or switch to a more standard framework at some point in the future if we so desire.
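As a quick illustration of the singleton behaviour described above (a sketch which assumes GetLogger is called as a class method, given the lack of a new method):

use LCFG::Client::Log ();

# Both calls hand back the same object so the configuration is shared
my $log_a = LCFG::Client::Log->GetLogger();
my $log_b = LCFG::Client::Log->GetLogger();

print "same logger object\n" if $log_a == $log_b;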

LCFG Client Refactor: context handling

May 3, 2013

Now that the basic tidying is complete the code is in a much better condition for safely making larger scale changes. One of the particular issues that need to be tackled is the coupling between modules. The current code is a tangled web with modules calling subroutines from each other. This makes the code harder to understand and much more fragile than we would like. There has been some attempt to separate functionality (e.g. Build, Daemon, Fetch) but it hasn’t entirely worked. For instance, in some cases subroutines are declared in one module but only used in another. The two main areas I wanted to concentrate on improving are context handling and logging.

Various client Perl modules have an interest in handling LCFG contexts and there is also a standalone script (setctx) used for setting context values. (For a good description of what contexts are and how they can be used see section 5.2.5 of the LCFG Guide). The context code was spread across rdxprof and LCFG::Client but in a previous round of tidying it was all merged into LCFG::Client. Ideally it should be kept in a separate module; this allows the creation of a standard API which improves code isolation by hiding implementation details. This in turn provides much greater flexibility for the code maintainer who can more easily make changes when desired.

The easiest part of this work was the shuffling of the context code into a new module (named LCFG::Client::Contexts). Once this was done I then took the chance to split down the code into lots of smaller chunks and remove duplication of functionality wherever possible (the code has gone from 4 big subroutines to 24 much smaller ones). This has resulted in a rather big increase in the amount of code (884 insertions versus 514 deletions) which is normally seen as a bad thing when refactoring but I felt in this case it was genuinely justified. Each chunk is now easier to understand, test and document – we now have a low-level API as well as the previous high-level functions. Also most of the subroutines are now short enough to view in a single screen of your favourite editor, which hugely helps the maintainer.

An immediate benefit of this refactoring work was seen when I came to look at the setctx script. There had been a substantial amount of duplication of code between this and the rdxprof script. As the context code was previously embedded in another script it was effectively totally inaccessible – the mere act of moving it into a separate module made it reusable. Breaking down the high-level subroutines into smaller chunks also made it much easier to call the code from setctx and remove further duplication. Overall setctx has dropped from 213 lines of code to 140 (including whitespace in both cases). Functionality which is implemented in scripts is very hard to unit test compared with that stored in separate modules. So it’s now much easier to work with the context code and know that setctx won’t suddenly break.

LCFG Client Refactor: Initial tidying done

April 25, 2013

A quick dash through the code in the LCFG::Client::Fetch module (which is relatively small and fairly straightforward) means that all the LCFG client code has been checked using perlcritic and improved where necessary/possible. This completes the work for stages 4 and 5 of the project plan.

Some of the work involved in this stage has been rather more complex than anticipated. Mostly that was not related directly to resolving issues highlighted by perlcritic. In the main it was because whilst investigating the issues raised I spotted other, big problems with sections of code that I felt needed to be resolved. Those could have been kept separate and done as an additional stage in the project plan but I thought it was better to just do them. In particular, I have made large improvements to the sending and receiving of UDP messages for notifications and acknowledgements. I’ve also improved the logic involved with handling the “secure mode” and split out lots of sections of code into smaller, more easily testable, chunks.

This takes the effort expended up to about 7 days. That’s about 1 day over what I had expected but that was accounted for in stages 1 and 2, the gap between predicted and actual effort requirements has not worsened.

LCFG Client Refactor: Storing state

April 24, 2013

Having spent a while looking at the LCFG client code now it is clear that much of it would benefit from being totally restructured as a set of Object-Oriented classes (probably Moose-based). Making such a big change is beyond the scope of this project but there is still a need to store state in a sensible fashion. Currently the code has a heavy dependence on global variables which are scoped at the module level. In many ways the modules are being used like singleton objects and most of the globals are not accessible from outside of the parent module so it’s not as bad as it could be. The biggest issue with these globals is initialisation: where multiple subroutines need to use a global they all depend on one of them having initialised the variable first. We are thus in a situation where the order in which the subroutines are called is important. This is bad news for anyone wanting to be able to fully understand the code; it also makes it impossible to test each subroutine in an isolated fashion (i.e. given this input, do I get the right output?).

With the move to SL6 we got an upgrade of Perl to 5.10; this is still utterly ancient but it does provide a few handy new features. The one I’ve begun using a fair bit is the state function which is used similarly to my. The difference is that these variables will never be reinitialised when a scope is re-entered (whereas my would reinitialise the value every time). This makes it possible to write subroutines which act a bit like Object-Oriented accessors with the values being set to a sensible default value where necessary. I’ve used this to nicely handle the acknowledgement and notification port global variables. Here’s an example:

use feature 'state';

sub AckPort {
    my ($value) = @_;

    # Default: Either from /etc/services or hardwired backup value
    state $ack_port = getservbyname( 'lcfgack', 'udp' ) 
                         // $DEFAULT_PORT_ACK;

    # Allow user to override
    if ( defined $value ) {
        $ack_port = $value;
    }

    return $ack_port;
}

Note that the state feature needs to be specifically enabled to use this approach. On the first call to the AckPort function the $ack_port variable will be initialised. If the getservbyname function returns an undefined value (i.e. the named service was not found) then the default value will be used. If the caller specifies a value then that will override the port number. On subsequent calls the initialisation is not done and the current value will be returned. This provides a public API for getting and setting the port number with simple handling of the default value. There is no issue of needing to know in what sequence of subroutines this method will be called; all functionality is neatly encapsulated. The method is also easily testable. Overall an Object-Oriented approach would be better but this is a good halfway house.

LCFG Client Refactor: Sending acks

April 24, 2013

Part of the functionality in the LCFG::Client::Daemon code is to send acknowledgement messages to the LCFG servers whenever a new profile has been applied. The ack is sent via UDP using the SendAck method. The original code to do this took the traditional C-style approach:

  return ::TransientError("can't open UDP socket",$!)
    unless (socket(ACKSOCK,PF_INET,SOCK_DGRAM,$proto));

  return ::TransientError("can't bind UDP socket",$!)
    unless (bind(ACKSOCK,sockaddr_in(0,INADDR_ANY)));
  my $addr = inet_aton($name);
  return ::DataError("can't determine host address: $name") unless ($addr);
  my $res = send(ACKSOCK,$msg,0,sockaddr_in($aport,$addr));
  return ::TransientError("can't send notification: $name",$!)
    unless ($res == length($msg));

with a smattering of weirdness and unless thrown in for good measure. Things have moved on a bit since the days when this was the recommended approach. There is now a very handy suite of modules in the IO::Socket namespace which can handle the dirty work for us. The replacement code looks like this:

    my $port = AckPort();

    my $socket = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => $port,
        Proto    => 'udp',
    ) or return LCFG::Client::TransientError(
             "can't connect to $server:$port", $! );

    my $res = $socket->send($message);

    if ( $res != length $message ) {
        return LCFG::Client::TransientError(
                  "can't send notification: $server", $! );
    }
That is, without a doubt, much easier to read and maintain. We are now relying on someone else to do the socket handling but that’s fine as this is a core Perl module which should be totally reliable.

LCFG Client Refactor: Daemon state tables

April 17, 2013

Having finished the tidying of the LCFG::Client::Build module I have now moved onto LCFG::Client::Daemon. The first thing which caught my eye was the handling of the state tables. These state tables are used to control how the daemon handles notifications from the server, timeouts and signals from the LCFG client component. I pretty much totally rewrote the MakeTable function so that it processed the input text and built the data structures for the tables in a much cleaner and more comprehensible manner. As with previous changes, my first step was to write some tests which checked the old function then ran them again with the new code to ensure I had not altered the API. I also introduced a new NextStateInTable function which contained code which was previously duplicated inside NextState. Finally I introduced an InitStateTables function which is called from within ServerLoop which hides the initialisation of the global variables used to hold the state tables. This means we now have a much cleaner API for handling all the state transitions based around smaller, testable functions.

LCFG Client Refactor: tidying LCFG::Client::Build

April 17, 2013

The LCFG::Client::Build module is the largest part of the LCFG client code. It weighs in at 1800 lines which is nearly 50% of all the code in the project. It contains a lot of functionality related to processing the data from the XML profile into the format stored in the local DB file and triggering components to reconfigure as necessary. Improving this code was always going to be a big task but at least once this module is done the remainder will seem easy.

The main changes which stand out are, like with LCFG::Client, related to noticing repeated coding of the same functionality. The first larger change came from noticing that in many places the value of an attribute (for example, the LCFG resource value) is decoded using the HTML::Entities module but only for LCFG profile version 1.1 and newer. Now we probably haven’t supported anything older than this for a very long time but it occurred to me that rather than just drop the version check it would be better to completely enhance the attribute decoding. So, rather than have calls to HTML::Entities::decode all over the place we now pass the value through a new DecodeAttrValue function which in turn calls a new NeedsDecode function to check if decoding is required. These are both small easily testable functions so I added a few tests along the way. The big benefit here is that if we now ever need to change the encoding/decoding of values and increment the profile version we are already prepared for the necessary code modifications.
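To make that concrete, here is a hedged sketch of roughly how such helpers might look; the 1.1 threshold comes from the description above, but the real implementations in LCFG::Client::Build may well differ:

use HTML::Entities ();

# Sketch only: decide whether entity decoding applies for this profile version
sub NeedsDecode {
    my ($version) = @_;
    return ( defined $version && $version >= 1.1 );
}

# Sketch only: decode an attribute value when the profile version requires it
sub DecodeAttrValue {
    my ( $value, $version ) = @_;
    return NeedsDecode($version) ? HTML::Entities::decode($value) : $value;
}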

The second big change was to improve the code of the InstallDBM function. This had two copies of a complex regular expression used to parse a fully-qualified resource name (e.g. host.component.resource_name) so I moved this code into a new function named ParseResourceName. Again this is now easily reusable and testable whereas before it was buried in the midst of other complex code. This led to some other improvements in how the debugging was done. I noticed there were many calls to KeyType which was just returning a prettified name for the underlying attribute type indicators which are all single characters (in the set [#%=^]). Each debug statement was very similar but handled a slightly different case; these were all merged into a ResourceChangesDebug function. This new function massively improves code readability and also improves efficiency since it only actually does something when the "changes" debug option is enabled. By reworking the debugging it is now possible to use the KeyType function in a totally generic manner. Anything which needs to know about the type of the attribute can work with the named versions rather than the almost-meaningless single character indicators.

There is still a lot more to do on this module to really improve the code standards but much of that might well be beyond the scope of this initial code cleanup project. The XML profile parsing and the DB handling are particularly in need of attention.

LCFG Client Refactor: tidying LCFG::Client

April 8, 2013

The first round of tidying code to satisfy perlcritic was focussed on the central LCFG::Client module which contains code used by all the other modules.

As well as the tidying there were a couple of slightly larger changes. I had spotted that several routines (RPMFile, DBMFile and ProfileFile) were each doing their own mangling of the host FQDN and then doing similar work based on the results. To reduce duplication I introduced a SplitFQDN function which contains an improved version of the hostname splitting functionality (and which can now be used in other places). I then also introduced another new function (named HostFile) which contains all the other duplicated functionality between the 3 functions. Each of the 3 functions is now pretty much reduced to a single line call to HostFile with the relevant parameters set. At the same time as adding these new functions I added tests for them and also the higher-level functions which use them. As each has now been reduced in complexity it is much easier to test them. This gives me a good guarantee that if I have to make changes in the future they will continue to work as expected.
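As a rough illustration of the idea (not the actual LCFG::Client code), the hostname splitting boils down to something like this:

# Sketch only: split a fully-qualified name into host and domain parts,
# e.g. 'host.example.org' becomes ('host', 'example.org')
sub SplitFQDN {
    my ($fqdn) = @_;
    my ( $host, $domain ) = split /\./, $fqdn, 2;
    return ( $host, $domain );
}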

Beyond tidying the code to resolve the worst of the perlcritic issues I also applied a number of changes which come from the lower levels. In particular I removed a lot of unnecessary brackets associated with calls to built-in functions and conditionals. This might seem like a tiny thing but it does reduce the “noise” quite considerably and vastly improves the code readability. I also removed all uses of the unless conditional; this is something which drives me crazy, as anything more than an utterly simple condition is very hard to comprehend when used in conjunction with unless. That is one feature I really wish was not in Perl! I’ve seen unless-conditions which are so complicated that only a truth table can fathom out what is going on…

Another code smell which was eradicated was the heavy usage of the default scalar variable ($_). In my opinion there is no place for using this in large code bases outside of situations like code blocks for map and grep. Using it pretty much guarantees that there will be the potential for weird, inexplicable action-at-a-distance side-effects in your code.

One thing I would like to spend more time on at some point is improving the naming of variables. There is frequent use of single-letter variable names ($c, $t, etc) which is mostly meaningless. This might not be a problem in a very short (couple of lines) block where the context is clear but in chunks longer than a screen-full it’s really hard to track the purpose of all the variables. There is also quite regular reuse of variable names within a subroutine which again makes mentally tracking the purpose of each variable very difficult.

LCFG Client Refactor: perltidy and perlcritic

April 5, 2013

The next phase of the project to clean up the LCFG client (goals 3, 4 and 5) is to run everything through the perltidy tool and then fix the code to satisfy the perlcritic code checker down to level 4. Having all the code consistently indented with an automated tool may seem like a trivial thing to do but it makes such a difference to the maintainability of code. I realise that Python coders have been going on about this for years so it’s nothing new… We chose a coding style for the LCFG server refactoring project and I am using the same for the LCFG client. At the time we added a few notes on the LCFG wiki PerlCodingStyle page. I guess I probably ought to upload my .perltidyrc configuration file to that page so that it can be easily reused.

The use of perlcritic to check code is probably slightly more controversial for some people. It checks your code against a set of rules and recommendations originally laid out in Damian Conway’s book Perl Best Practices. If you don’t like some of those rules you are going to find it very annoying. We’ve found that aiming to satisfy levels 4 and 5 (the most critical issues) results in a vast improvement in code quality. Below that you very rapidly get into a lot of tedious transformations not all of which give you any great benefit. Knowing when to just ignore the moans of the tool is a very useful skill.

LCFG Client Refactor: rdxprof finished

April 5, 2013

The work to cleanup rdxprof is now pretty much finished. All the functionality has been moved out into the LCFG::Client module so that all that happens in the rdxprof code is 3 simple calls to subroutines in the core module:

  1. SetOptions – Parses the command line parameters and sets LCFG::Client variables
  2. Init – Initialises the environment (mostly just ensuring certain directories exist)
  3. Run – This does the actual work (either OneShot or ServerLoop)
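Putting that together, the shape of the new rdxprof is roughly the following (a sketch; exactly how the command line is handed to SetOptions is an assumption on my part rather than a copy of the real script):

#!/usr/bin/perl
use strict;
use warnings;

use LCFG::Client ();

# Sketch only: the three top-level calls described above
LCFG::Client::SetOptions(@ARGV);   # parse command line, set LCFG::Client variables
LCFG::Client::Init();              # ensure required directories exist
LCFG::Client::Run();               # one-shot or daemon (ServerLoop) mode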

There is still a small number of dependencies on global variables that would be nice to remove in the future but nothing critical for now.

This concludes goals 1 and 2 on the project plan. The hope was that this would only take one day of work but it ended up needing 2 days. That is due to my not having initially spotted the real degree of peculiarity of the coding style. The rdxprof code was definitely much more complex in terms of how it approached the “structure” of the entire program than anything I had encountered in the LCFG server code refactoring project. Hopefully now that particular intricate unpicking job is complete the rest will be more straightforward.

LCFG Client Refactor: rdxprof cleanup

April 2, 2013

The refactoring of the LCFG client has been continuing at good pace. I have now managed to move all the subroutines from the rdxprof script into the LCFG::Client module. This means that it is now possible to add unit tests for the functionality in these subs and I spent a bit of time yesterday adding the first simple tests. There are a lot more to go but it’s a good start. Adding the tests really helped me load more of the code into my brain so there are more benefits than just having testable code.

The big job for yesterday was really improving the sanity of the global variables. Some of the module code relied on the fact that it was being called from rdxprof to be able to access global variables declared in that script. Thus those modules wouldn’t work properly when loaded individually. In one case (the $root variable) it was declared as a global in two places and used as a local variable in many subroutines when passed in as an argument; that’s just a recipe for utter confusion. I’ve now removed one global but there is clearly a need to improve the situation further.

I also moved all the uses of values which are hardwired at build-time using cpp-style macros (e.g. @FOO@) into readonly variable declarations at the top of the modules. This makes it much more obvious which hardwired strings are required by each module. This is a first step towards replacing this approach with a configuration module (e.g. LCFG::Client::Config) which is how we handled the situation for the LCFG server.

SQL, ipython and pandas

March 30, 2013

I recently came across a really handy module which makes it easy to access data stored in an SQL DB from the ipython shell. Turns out that then going the next step and moving the data into pandas is very easy. All very cool, I love how easy it is to hack out code for quick data inspection and manipulation using ipython.

Refactoring the LCFG client

March 29, 2013

The time has come to start work on refactoring the code base of the LCFG client. This has been overdue for a while now as the current state of the code is preventing us from doing much in the way of new developments for fear of breaking something important. The aim is to tidy and clean the code to bring it up to modern Perl standards and generally make it much more readable and maintainable. The intention is to avoid altering the functionality if at all possible, although a number of small bug fixes will be tackled if time allows. The full project plan is available for reading on the devproj site. This project incorporates many of the lessons we learnt when we refactored the LCFG server code last year; again, see the devproj site for details.

I made an initial start on the project today. As with all refactoring the best first move is to ensure you have some tests in place. In this case I just added simple compilation tests for each of the 5 Perl modules involved. Interestingly this immediately flagged up a genuine bug which existed in the code; this was related to the use of subroutine prototypes. Now anyone who has a reasonable amount of experience with Perl programming will tell you that subroutine prototypes are evil, full of gotchas and rarely do what you expect. One of the tasks in the plan is to rid the code of them entirely but that’s not for today. Thankfully this was a simple error where the prototype stated that 4 scalars were required when, in actual fact, only 3 were needed (and only 3 were provided when the subroutine was called). I’m surprised the code actually worked at all with that bug, which shows how useful even simple testing can be for improving code quality.

The whole code base is basically 5 Perl modules and a script which uses them all. An interesting strategy was taken with the module loading: all subroutines from the modules were imported into the “main” namespace of the script (which is effectively global) and then all calls to them anywhere in the code base were referred to the version in that namespace. So, all subroutine calls were done with unqualified, short names. I guess this makes it quick to hack out, but coming at the code without a huge amount of prior knowledge it is almost impossible to quickly reckon the source location for each subroutine. So, my second step was to work through all the code and replace the calls with fully-qualified names. To make it doubly clear that the old way wasn’t readable or maintainable I also ripped out (the now unnecessary) support for exporting subroutines into another namespace and ensured that when these modules are loaded there is no attempt to import anything.

This sort of change should be zero impact, right? Turns out, not entirely; nothing is ever simple… I had to shuffle a few subroutines out of the script into the helper modules, which in turn meant fixing a few references to global variables. This in turn required passing another parameter to a couple of subroutines which meant hacking out a few evil subroutine prototypes. I think that shows up a few code smells which will have to be tackled very soon.

Before I can really get stuck in though a few more tests are going to be necessary. At the very least there is going to have to be a test of the client’s ability to download an XML profile and convert it into the locally stored file format. At this stage I don’t know enough about the code to create tests for each subroutine so a large-scale test of functionality is the only option. Without that test it won’t be safe to make any bigger code changes.

SSH honeypots

March 27, 2013

I’ve never been brave enough to run an SSH honeypot myself. Anything which is even pretending to be “open” to the world to attract bad guys is probably just too much of a risk for a network. Having said that, it’s clear there is a lot of interesting data which could be gathered and a lot we could learn about the standard approaches to system compromise attempts. I recently came across a fascinating blog article which reviews the data captured using a honeypot. It gives some insight into how these attacks are carried out and clearly shows that most of them are “script kiddies” without much clue. As my recent talk (Do bad guys work weekends?) presented at the FLOSS UK Spring Conference in Newcastle showed, there are some very simple strategies for completely blocking most of these attacks.

Using Python and Pandas to process data

March 16, 2013

I’ve recently been doing some data analysis for a presentation I will be giving at the FLOSS UK Spring Conference in Newcastle next week. This involved processing a lot of data gathered from our syslogs related to SSH authentications. As part of my ongoing effort to learn Python properly I decided to do all the work in that language. Whilst hunting around for useful modules for processing data and calculating various statistics I came across the very clever Pandas library which provides some impressive tools for processing tabulated data (such as that in CSV style files). It’s a bit of a steep learning curve but I’ve just come across a neat blog article which summarises the main functionality quite well. I’ve only used a few of the features so far (I particularly found the groupby functionality very handy) and I shall definitely be exploring this library further in the future.

DBI, Postgresql and binding values for intervals

January 31, 2013

I have spent most of the day scratching my head over this one. I have a PostgreSQL database which has an “events” table, within which there is a “logdate” column (which is a simple date field). I need to be able to run an SQL query which gives me all events older than a certain length of time (e.g. an interval of so many days, weeks or months). The standard way to do this is to prepare a query with a “placeholder” into which a string supplied by the user can be safely inserted.

I am using Perl for this project and initially I was trying this with the excellent DBIx::Class module (which uses the SQL::Abstract module internally). The weirdness of the problem had me convinced there was a bug somewhere so I spent a while upgrading everything to the latest versions in the hope that the issue would disappear. When this did not help I reduced the problem to the simplest version possible using the standard DBI interface, like below:

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=buzzsaw;");
my $sth = $dbh->prepare(
     "select count(*) from event
         where ( logdate < current_date - interval ?)");
$sth->execute("26 weeks");

but this still produces an error message like this:

DBD::Pg::st execute failed: ERROR:  syntax error at or near "$1"
LINE 1: ...(*) from event where ( logdate < current_date - interval $1)
                                                                    ^ at -e line 1.

At this point it became clear that this is a “feature” or, at the very least, the lack of a feature rather than a bug. A bit of hunting around revealed this ancient Debian bug report from September 2006. Thankfully the bug report does contain a work-around which is to use the placeholder as a string and then cast it to an interval like this:

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=buzzsaw;");
my $sth = $dbh->prepare(
     "select count(*) from event
         where ( logdate < current_date - ?::interval)");
$sth->execute("26 weeks");

Given the grief this caused me I thought it worth committing to my blog in the hope that the next person to hit this issue will find the solution quicker than I did. This is definitely a lesson in why the best approach is to reduce a problem to its simplest form rather than just assuming something has a bug.

Is it worth running fail2ban?

October 12, 2012

Part of the standard security advice for anyone running a machine with an SSH daemon which is open to the world is to install the fail2ban software to block brute-force attacks.

In Informatics we use it to monitor various log files for login failures. When more than a certain number of failures are seen from a single source address within a short period of time we deny access to that address for a while. This is done using basic tcpwrappers rules (i.e. hosts.deny). Since we do this on all hosts which have holes in the firewall that allow incoming SSH connections it’s not easy to tell exactly how much good this is doing. The question is, without these blocks would the attackers go away after a few failures anyway?

Recently we had an opportunity to see exactly what does happen when you open SSH to the world for a machine for the first time and then do not run fail2ban. At about 10:50 on 4th October a new firewall hole was opened to allow incoming SSH connections to a machine. At 15:12 we see the first login failure in the logs, by the end of that day we had 478 login failures. Here are the stats for the following days:

Day                      Failure count   Total failures (all hosts)
Thursday 4th October     478             2048
Friday 5th October       2015            3510
Saturday 6th October     36              1473
Sunday 7th October       1323            2810
Monday 8th October       100             1702
Tuesday 9th October      36542           38296
Wednesday 10th October   20093           21714
Thursday 11th October    3455            5033

We do regular monitoring of the failure counts for all our hosts so the sudden increase, by an order of magnitude, in failure counts set the alarm bells ringing fairly quickly.

An interesting question is whether all these failures are coming from single hosts or a wide range of addresses, i.e. are the attacks coming from botnets? Here’s the counts for each different source address for the two peak days:

Tuesday 9th October:

  1. 30415
  2. 3211
  3. 2434
  4. 320
  5. 87
  6. 48
  7. 21
  8. 4
  9. 2

Wednesday 10th October:

  1. 9510
  2. 6852
  3. 3171
  4. 488
  5. 50
  6. 12
  7. 10

So, the attacks are coming in large numbers from just a few specific machines.

It’s also interesting to look at the top user names which all these attacks are trying to compromise; here are all the user names with more than 150 attempts.

userid count
root 29065
test 1054
oracle 887
nagios 743
admin 634
user 449
mysql 398
guest 384
postgres 329
www 318
testuser 298
temp 273
backup 256
support 251
tomcat 239
web 234
ftpuser 234
mythtv 188
webmaster 185
teste 169
apache 159
bin 155

This demonstrates two particular issues.

Firstly, you should never allow root SSH logins; in fact 45% of all login failures were for the root account. With OpenSSH you should always set the PermitRootLogin option to no.

Secondly, most attacks were against “system” accounts. All of those with 150 or more failures were for accounts which are not used by real live users – they are for daemons, system utilities or testing accounts. To avoid any of these accounts being compromised you should restrict login access to some group which only contains real users; this is done using the AllowGroups option in openssh.
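
As a hedged illustration (the group name here is hypothetical and would be whatever group contains your real users), the two sshd_config settings mentioned above look like this:

# /etc/ssh/sshd_config (excerpt)
PermitRootLogin no
# only members of this group may log in
AllowGroups realusers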

This clearly shows that running fail2ban does result in a big reduction in the number of attacks we see each day. With fewer opportunities to attempt to log in, the chances of successfully cracking a password by brute force are seriously reduced. Also, a few simple tweaks to the openssh daemon configuration which will not affect the experience of normal users result in a great improvement in security.

Securing the network

December 20, 2011

I recently came across a blog article on “9 security controls you should add to your network right now”. I think this neatly summarises some good ways to enhance the security of a network and provide the ability to detect when an intrusion has occurred. I particularly think the idea of blocking outgoing traffic from many servers (e.g. web servers), preferably with alarms being sounded automatically when something attempts to make external connections, makes a huge amount of sense.

New SSH trojan?

November 16, 2011

I’ve seen some suggestions that there is a new SSH trojan doing the rounds, see this blog article for some details.

F13 end-of-life

June 15, 2011

The end-of-life for F13 comes on the 24th June 2011 (see this reminder email for details). As of today we have a “stable” DICE SL6 platform so we can now start replacing all those F13 machines. I’m looking forward to only having two supported platforms again (SL5 and SL6), especially with the Fedora approach to updating a “stable” release being mainly to just chuck in everything and the kitchen sink.

Traits of a Unix admin

February 19, 2011

I recently came across an excellent article on the “Nine traits of the veteran Unix admin”, it’s all so true…

Moose role hackery

February 8, 2011

For quite a while now I have wanted to have the ability to apply a Moose role without actually loading any of the associated code. I’ve finally come up with a solution which seems to do exactly what I need.

For a bit of background, Moose roles are what is often referred to in object-oriented programming as “interfaces”. They are used to declare that a class is guaranteed to implement a particular behaviour (attributes and/or methods) as part of its API.

A commonly used role which is available on CPAN is MooseX::ConfigFromFile which is used to declare that a class has the new_with_config and get_config_from_file methods. These are used to set the values of attributes from a configuration file. This works well in conjunction with other roles, such as MooseX::Getopt, which can detect when the ConfigFromFile role is implemented and load the values for attributes from either the command-line or a specified configuration file.

The problem is that the MooseX::ConfigFromFile code is a little bit peculiar and has a dependency on MooseX::Types::Path::Class (and thus MooseX::Types and Path::Class amongst others) which are not usually essential and lead to memory bloat for no good reason.

So, here is my solution:

my $extra_role = Moose::Meta::Role->initialize('MooseX::ConfigFromFile');

I can use this to state that my own configuration loader module does everything that the MooseX::ConfigFromFile role requires but I do not need to load (or even have installed) the MooseX::ConfigFromFile module file itself. This seems to work equally well when applied to a role or a class.
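
To make this concrete, here is a minimal sketch of the trick; the package name is hypothetical and the apply() call is my assumption about how the initialised role gets attached to the consuming class, rather than the exact original code:

package My::ConfigLoader;
use Moose;

# Create a metaclass object for the role by name. This does NOT load the
# MooseX::ConfigFromFile module itself - it need not even be installed.
my $extra_role = Moose::Meta::Role->initialize('MooseX::ConfigFromFile');

# Assumption: apply the (empty) role so that
# My::ConfigLoader->does('MooseX::ConfigFromFile') returns true.
# The class itself is still expected to provide new_with_config and
# get_config_from_file, as promised by the role.
$extra_role->apply( __PACKAGE__->meta );

1;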

Sending email from Perl

January 28, 2011

#include <long-time-no-blog.h>

I often need to send emails from Perl scripts and over the years I’ve tried all sorts of modules which are supposedly the current “best practice” but rarely do they get even close to living up to the hype. Recently, however, I came across this article which introduced me to the MIME::Lite::TT module which genuinely does seem to be very nice and easy to use. Here is a snippet of how I am using it in the Package Forge build daemons to send status messages:

    my $msg = MIME::Lite::TT->new(
        From        => $from,
        To          => $to,
        Cc          => $cc,
        Subject     => "PkgForge Report - $builder - $job",
        Template    => $template,
        TmplParams  => \%params,
    );

The templating is done with the Perl Template Toolkit (TT). Handily, the template parameter can be either a reference to a scalar (i.e. the text is embedded in the code) or a filename. Also TT can be configured to work in whatever way you require by passing in a reference to an options hash as well as the params hash reference.

Most of the power is in the MIME::Lite Perl module which can easily handle all types of attachments and can send mail by various different methods if local sendmail is not appropriate. I can’t immediately spot anything it cannot do to meet my needs, particularly when extended to include the TT templating support.
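
As a hedged sketch of how the pieces fit together (the addresses, template file name, include path and SMTP host are placeholders rather than anything from the real PkgForge code):

use MIME::Lite::TT;

my %params  = ( builder => 'dev-builder', results => 'all fine' );  # template variables (placeholders)
my %options = ( INCLUDE_PATH => '/path/to/templates' );             # passed straight to Template Toolkit

my $msg = MIME::Lite::TT->new(
    From        => 'pkgforge@example.org',
    To          => 'someone@example.org',
    Subject     => 'PkgForge Report',
    Template    => 'report.tt',   # or a reference to a scalar holding the template text
    TmplOptions => \%options,
    TmplParams  => \%params,
);

# send() comes from MIME::Lite; the default is local sendmail but an
# alternative such as SMTP can be selected, e.g.
# MIME::Lite->send( 'smtp', 'smtp.example.org' );
$msg->send;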


April 16, 2010

Recently we’ve been looking at ways to improve the security of our ssh login machines. From looking at our logs one thing that has become apparent is that we get endless brute force attacks searching for accounts with weak passwords. Now in an ideal world those weak passwords wouldn’t exist, of course, but we have to live with the fact that some people are better than others with their password choices. Also, we certainly shouldn’t be making it easy for the attackers by servicing all their requests; if nothing else it raises the load on the servers, which potentially disrupts our real users.

I’ve used fail2ban on my personal servers for a number of years without any major problems so I decided to take a look at using it for our ssh servers. The basic concept is that you have a filter which is applied to a set of log files and some actions which are carried out when failures over a certain threshold are found. This can work with any service (e.g. ssh or apache); fail2ban comes with a set of prepared filters and actions but it is relatively easy to add more.

For Informatics we are going to use one of the simplest configurations, which is to watch the auth log and then put entries into /etc/hosts.deny when potential attacks are spotted. In the current configuration we consider an attack to be 5 failures to log in within a 10 minute period; that IP address then gets blocked for an hour. All addresses in the 129.215 block are whitelisted to reduce the chances of us locking out our own machines. Hopefully this won’t cause too many problems for real users who’ve forgotten their password; it’s easy enough to drop an IP block when done this way. An alternative approach might have been to use iptables but that adds complexity and management overhead.
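
As a rough sketch, the corresponding jail in the fail2ban configuration would look something like this (the jail name, filter and log path are assumptions for a Red Hat style system rather than our exact setup):

[ssh-hostsdeny]
enabled  = true
filter   = sshd
action   = hostsdeny
logpath  = /var/log/secure
maxretry = 5
findtime = 600
bantime  = 3600
ignoreip = 129.215.0.0/16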

To help with the management of fail2ban I’ve written an LCFG component. At the moment this can just set up the main configuration file and basic jails; I doubt it would be too much work to extend it to handle new actions and filters if necessary.

Petabyte Scale Storage

March 20, 2010

I’ve never come across the Ceph filesystem project before but it is Open Source and it has just been merged in for the 2.6.34 linux kernel. It claims “Ceph is an open source distributed file system capable of managing many petabytes of storage with ease.” It looks like there is still quite a bit of development work going on but it could be very interesting in the future if it manages to fulfill all its goals.

Configuration Languages

March 19, 2010

I spotted this blog post about using (or not) domain specific languages to customise programs. I can’t help feeling there is an interesting overlap here with the way we configure entire systems; we all face similar problems, just at different levels. Just because a piece of software can be configured using the full power of Perl doesn’t make it a good thing (yes, I’m looking at you, RT…). LCFG deliberately has a minimal “language” for this very reason; it offers far fewer ways in which people can shoot themselves in the foot (as long as we ignore cpp).

F12 ntp

March 19, 2010

To keep kerberos happy you need your client machines to have their clocks fairly well synchronised with the KDCs. The easiest way to achieve this is to use ntp. I’ve added an LCFG header, inf/options/ntp.h which uses the file component to do a simple setup on F12. The file /etc/ntp.conf now just contains:

driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict -6 ::1
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys

As usual, one of the problems with the file component is that it cannot restart services after a configuration file has changed. So, once this is in place it is necessary to do /etc/init.d/ntpd restart.

If ntpd was not previously running (you can check first) then it is necessary to use chkconfig to activate the service:

# chkconfig --list ntpd
ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
# chkconfig --level 2345 ntpd on
# chkconfig --list ntpd
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off

openafs on F12

March 10, 2010

Moving straight on from getting kerberized logins working it’s time to get openafs running. The packages for F12 are all pre-compiled and the official repository supports yum so that’s the easiest approach. Here is the yum repository config file (openafs.repo) for 1.4.11:

name=Openafs 1.4.11 for F12
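
For context, a complete yum .repo file follows the usual layout, roughly like this; the section name and URL are placeholders rather than the real repository location:

[openafs]
name=Openafs 1.4.11 for F12
baseurl=<URL of the openafs yum repository>
enabled=1
gpgcheck=0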

Once that is in /etc/yum.repos.d, it is possible to do:

yum install openafs-authlibs openafs-client openafs-server openafs-krb5 openafs-docs

It is also necessary to grab a kmod-openafs package; for some reason I have experienced problems with the automatic support for this on F12, so it’s best to grab the correct version of the RPM for the running kernel from the openafs website and install it manually.

Once that is done:

echo > /usr/vice/etc/ThisCell
/etc/init.d/openafs-client start

It might also be necessary to edit /etc/sysconfig/openafs. I made it contain:

AFSD_ARGS="-dynroot -afsdb -fakestat -daemons 5 -volumes 200 -chunksize 20  -nosettime"

Network, Kerberos and openssh on F12

March 10, 2010

Firstly we need to deactivate the nastiness that is NetworkManager and switch to configuring the network interface so that it comes up at boot time and uses DHCP to get an address and DNS configuration. This is done by using the system-config-network tool as root and doing an “Edit” on the eth0 device. After finishing the alterations the networking needs restarting with /etc/init.d/network restart.

To make sure this continues to work after a reboot, as root, do:

chkconfig --levels 2345 NetworkManager off
chkconfig --levels 2345 network on

As well as this, to get the machine to have the correct hostname and domain name, I had to edit /etc/hosts to look like:

127.0.0.1   localhost
::1         localhost bowmore

and set the domainname like:


The next step is to start using kerberos for authentication and LDAP for user info. That is done using the system-config-authentication tool, again as root. The LDAP base DN is dc=inf,dc=ed,dc=ac,dc=uk and I used ldap:// for the server.

For kerberos authentication the realm is INF.ED.AC.UK; I didn’t list any KDCs but rather ticked both options to use DNS. The admin server is

If you don’t have AFS available then on the “Options” tab you probably want to select “Create home directories on first login”.

I can never be bothered with typing in my password all the time so the next step is to get kerberos up and running and then configure openssh appropriately. Nicely Fedora finally includes all the patches provided by Simon which we have been applying locally for years so no rebuilding is necessary.

The next step is to grab the hostclient and host principals for the specific machine. If it is a new machine you will need to create it first; if it already exists then (as root) you can do something like:

kadmin -p squinney/admin \
            -q 'ktadd -k /etc/krb5.keytab host/'
kadmin -p squinney/admin \
            -q 'ktadd -k /etc/krb5.keytab hostclient/'

You can now configure openssh to work like a normal DICE machine. You will need to copy over /etc/ssh/ssh_config and /etc/ssh/sshd_config to your f12 machine. Note that the daemon config file is only readable by root. After reconfiguration restart the sshd.

It should now be possible to ssh in without a password!

Starting on F12/x86_64

March 10, 2010

I’ve made a start on the F12/x86_64 port. The first thing I did was to install from the F12 CD and make a base packages list:

rpm -qa --queryformat '%{NAME}-%{VERSION}-%{RELEASE}/%{ARCH}\n' \
  | perl -pe 's{/x86_64$}{};' > lcfg_f12_64_base.rpms

On this platform, at this stage, there are no packages with architectures other than x86_64 and noarch so I did not have to worry any more about getting the formatting correct.

The next stage was to get yum working with our local repositories:

su -
perl -pi -e 's/enabled=1/enabled=0/' /etc/yum.repos.d/* /etc/yum/pluginconf.d/presto.conf
cd /etc/yum.repos.d/
yum check-update

This deactivates any existing repositories in use and turns off the presto plugin which does the delta-rpm stuff which we do not need.

I have put together a yum configuration file for our Informatics F12 repository. Note that, by default, only the base directory is enabled. This makes it possible to easily install extra base packages with yum and know that the changes are directly applicable to the LCFG F12 base package lists. At a later point when updaterpms is installed and being run the updates can be applied.

openafs project work

February 25, 2010

This month has seen a flurry of finishing touches added to the LCFG openafs component to make it even more useful and robust and it finally reached the point where it was ready for full service. This morning I took great satisfaction in ripping out the heart of the final manually-configured AFS DB server which was still running on an ancient desktop machine. All 3 of the Informatics AFS DB servers have now been moved onto modern server hardware and are fully LCFG managed. This basically brings to an end my involvement with the current AFS project. Over the last few days nearly all our AFS clients have switched to the new component and half of the file servers have now been done. Some time soon I’ll write up a full report on the work I have done over the last 10 months to deliver a new LCFG openafs component. I love it when a plan comes together…

F12 network configuration

February 17, 2010

I’ve been struggling to get an F12 machine installed and working with network logins enabled for a couple of days. I think I have finally worked out what is causing my troubles. F12 uses NetworkManager to start networking but that only happens after a user has logged in. That is clearly going to cause problems when you need the network up first to authenticate/authorize the user… You would think that when an installer offers the opportunity to configure network logins it would have the intelligence to switch to starting networking in the boot process. The solution is to boot single-user and run system-config-network and configure eth0 to use dhcp. This means hitting the tab key as soon as grub starts and then editing the boot command to add a -s to the boot prompt. You might also want to remove some of the graphical boot gubbins to see what is happening. A quick reboot and it should all be working.

List::MoreUtils unexpected behaviour

February 16, 2010

I’ve long been a fan of the List::MoreUtils perl module so I don’t know why this “feature” has never bitten me before. The module provides a number of functions for manipulating lists, in particular I was using any() and none().

any() returns a true value if any item in LIST meets the criterion given through BLOCK, for example:

 print "At least one value undefined"
               if any { !defined($_) } @list;

none() is logically the negation of any. It returns a true value if no item in LIST meets the criterion given through BLOCK, for example:

 print "No value defined"
               if none { defined($_) } @list;

The gotcha here is that both of them will return an undef value when the list is empty. It’s not such an issue with any() but this particularly caught me out when using none() as I was expecting it to return true when the list was empty. To my mind it really should return true as an empty list definitely doesn’t contain any elements which match the condition. Surely other people have had the same experience. In future I think I will stick to using the standard grep() and just get the size of the list returned.
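
For completeness, here is what the grep() alternative mentioned above looks like; it gives the intuitive answer for an empty list:

 # grep returns the number of matching elements when used in a boolean
 # or scalar context, so an empty @list gives zero matches and the
 # message is printed as expected.
 print "No value defined"
               unless grep { defined $_ } @list;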


February 10, 2010

I guess a lot of people have gmail accounts and thus nearly everyone now has access to Google Buzz. At a first glance it seems quite nice and involves a lot less rubbish than Facebook. The question is whether yet another social interaction time-waster is going to get people particularly excited.

openssh and kerberos

February 9, 2010

At home I use Ubuntu for my various machines and I’ve now hit this problem a couple of times so it is probably worth detailing it here in case it affects anyone else. Before logging in to an Informatics machine with ssh I prefer to kinit to get my INF.ED.AC.UK principal into the ticket cache. Amongst other things this means I don’t need to keep typing in my password whenever I use ssh. This all works nicely, but if you do not have an ssh client configuration file (.ssh/config) you can log in but will then have no kerberos tickets or AFS tokens on the Informatics machine. This results in a very weird experience where you think the server is bust but everyone else can use it just fine. The solution is to add something like this:

Host *
  User squinney
  GSSAPIAuthentication yes
  GSSAPIDelegateCredentials yes

Obviously you will need to change your username appropriately.

LCFG Server updates

February 8, 2010

It’s been a while since I touched the stable release of the LCFG server code but we’ve accumulated a few bug reports that need some attention so I’ve taken the chance to deal with them. In particular, "Allow dumpdeps to be run by non-root" is fixed and "servername resource not reliable" is partially fixed. I’ve also taken the chance to backport a patch from the development tree which improves the release name handling; previously, if a source profile did not have a release specified then it would be set to "default" but that value did not make it into the XML profile. This didn’t seem to cause any major problems but it resulted in odd resource values as any resource mapping (e.g. <%profile.release%>) did not get the required value embedded. The test suite also needed a bit of attention as it turned out that the profile.def schema being used for the tests hadn’t been updated since before the profile.release resource was added. If all goes well with the testing there will be a new release – 2.2.55 – out fairly soon. As well as the old test suite being used, this will be the first stable release which also has to pass the XML profile comparison tests from the new development tree.


January 25, 2010

This is as much for my own reference as anything else. Occasionally we have problems with NFS servers going AWOL and this can leave broken mounts which are impossible to properly remove. This would not be too much of a problem if it wasn’t for the fact that rpm checks every file system before it starts doing anything. I guess this is to look for disk space or read-only partitions; I’m not convinced this is the job of a package manager but… If an NFS mount is broken then rpm will hang indefinitely, and if you run rpm every night then you soon get lots of hung rpm commands that need clearing out. The trick is to kill all the running rpm commands, then use the ‘-f’ and ‘-l’ options on umount (this makes the broken mount invisible to rpm), then run rpm again and all is well.
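
The recovery steps boil down to something like this (the mount point is a placeholder):

# kill off any rpm processes hung on the dead mount
pkill -9 rpm

# force (-f) plus lazy (-l) unmount so the broken mount disappears
# from the mount table that rpm walks at startup
umount -f -l /mnt/broken-nfs-share

# rpm should now run normally again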

UKUUG Spring Conference

January 14, 2010

I’m now all booked for the UKUUG Spring Conference which will be held on 23-25 March in Manchester this year. The schedule is not up yet but the list of talks suggests that it might be quite interesting.

One down, two to go

December 14, 2009

This morning the first of three Informatics AFS DB servers was successfully switched to a new machine. It’s great to finally get it off the ancient Dell GX270 we were using but more importantly this is using the new LCFG openafs component. We now have pretty much full control of that DB server using the LCFG component and we have nagios monitoring!

New openafs component

December 3, 2009

As part of the AFS project I have been working on a new openafs component which can manage the configuration of our AFS clients and servers. The development part of this work is now almost complete. The final stages remaining involve switching to new DB servers and then converting the file servers and all the clients to the new component. This should all happen in the next couple of months.

As a testing phase I am planning to convert all the Informatics ‘develop’ machines to the new component before Christmas. I’ve been using it for quite a while on some of my desktops without problems. If you want to get ahead of the game you can put this macro at the top (before any #include) of an LCFG source profile:


Once the new profile has been processed by the LCFG client on your machine you should then reboot. If all goes well you won’t really notice any differences! Please check /var/lcfg/log/openafs and report anything in there which you think might be odd/wrong.

For the interested there is thorough documentation of the resources in the lcfg-openafs manual pages.

I’ve also put together some documentation on how to use the new component to configure AFS servers:

I’ve also put together a plan for the first switchover of a DB server, which is planned to happen at 8am on Monday 14th December.

All comments/questions/suggestions on the new component are welcome.

Limiting operational work

November 23, 2009

In MPU we have found that our day-to-day operational work seems to consistently take up about a third of our available effort. We would really like to reduce this and spend the time on more development work and possibly even fit in some personal development… We’ve tried various approaches but as yet there has been no big reduction. Our new idea is to limit the days each week on which we can do operational work. I will be doing Monday and Tuesday, Chris will be doing Thursday and Friday, and we will have emergency cover only on Wednesday. This puts a limit on the time available in any week for operational work of 40%; that is clearly more than we are aiming for, but hopefully we won’t spend all of our 2 days on just the operational work. Wednesday morning tends to be a meeting most weeks so that leaves 2.5 days to be devoted to nothing but development work. The hope is that this will help us prioritise our operational work and avoid lots of distractions when we are developing new stuff. We will have to see how it pans out.

LCFG Server tests

September 2, 2009

I’m still working on testing the output from development versions of the LCFG server by taking a known input and generating XML profiles which can be compared against known “good” output. The test suite seemed to be mostly running fairly well on my test server, telford, but I’ve been seeing a few oddities related to the last-modified-file which I couldn’t explain. This afternoon I decided to try running an experimental version of the server on my desktop instead and I couldn’t get it to do anything sensible at all for ages. Eventually I tracked this down to a couple of symlinks in the input “releases” directory for the develop and default releases which were absolute paths. This worked on telford as that is acting as a full test LCFG slave server and has all the data directories installed but, of course, didn’t on my desktop. Changing those links to relative links now gives nice predictable output. I’ve updated my input data collection scripts so it won’t happen again. I think there’s a lesson here about being careful over where tests are run to be sure they are really doing something useful.

Comparing LCFG XML profiles

August 31, 2009

Recently I have been having lots of “fun” working out how to compare LCFG XML profiles generated by different LCFG servers to see if they are functionally equivalent. As a first step I removed the contents of all the nodes which are obviously server-dependent: published_at, published_by and server_version. This really is only the beginning of the job though; nearly every component and package node has a derivation attribute which holds a list of paths that are dependent on the paths to the server input directories. I came up with a cunning scheme to reduce these paths to the shortest form: it takes in all the lists of directories involved, sorts them by depth so that the most specific is removed, and converts them into regular expressions to handle the release and host name format strings.

After this I really thought I had cracked the problem but this turned up some issues with the LCFG server which led to a code change. It turned out that the LCFG servers generated lists of nodes for spanning maps in an unpredictable order which varied between hosts. This doesn’t really bother the clients using the data but it doesn’t fit well alongside normal LCFG taglists where the order is considered important and is intended to be maintained. The result is that we now sort the taglists generated from spanning maps when they are added to a subscribing profile.

Again, having thought I had the problem solved, more issues have turned up today. It turns out that I also need to canonicalise the file path in the last_modified_file node in a similar fashion to the derivation attribute values. A more annoying issue, though, is that when the value for this node could come from one of several files with identical timestamps it doesn’t seem to be possible to predict which file will be selected. I feel more code changes in the LCFG server are now required…

Testing the LCFG Server

July 2, 2009

The first stage of the LCFG server refactoring project is to develop a test suite to ensure we don’t introduce any bugs or changes to the overall behaviour. This will be based around comparing the generated output from the new and old server code given the same input and configuration data. I’ve put together a wiki page which collects the various ideas and thoughts.

Blocking user poweroff from gdm and gnome

June 5, 2009

We recently had a request from the User Support Unit to block users from doing a shutdown on machines in meeting rooms. The reason behind this is that the machines themselves are stored in locked cupboards, once they are powered off you need a key to open the cupboard and press the power button which is rather inconvenient.

At the same time we still wanted to allow users to be able to do reboots as a last resort when things go wrong so we could not just block all access to the shutdown command.

There are command line tools named “poweroff” and “reboot” for which access is controlled through consolehelper and thus PAM. I modified the PAM config for poweroff to block everyone who does not have system administrator privileges. However, this does not prevent users doing a shutdown from the gnome system menu. I hunted around the web for quite a while for any sort of solution to this or hint as to how gnome is actually sending that poweroff request. Eventually I discovered the little known fact that if you remove the gdm system menu, to prevent reboot and shutdown requests from the login screen, the shutdown option magically disappears from the gnome system menu. This probably does not prevent the determined user who really wants to shutdown the machine but it will stop all the people who select shutdown when they meant to just logout.
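
As a hedged sketch of the PAM side of this (the module stack and group name are my assumptions rather than the exact change we made), the consolehelper PAM file for poweroff can be restricted to administrators like this:

# /etc/pam.d/poweroff (sketch)
auth       sufficient   pam_rootok.so
# only members of the nominated admin group get past this point
auth       requisite    pam_succeed_if.so user ingroup sysadmin
auth       include      system-auth
account    required     pam_permit.so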

websvn diffs

June 5, 2009

I spent a while this morning trying to work out how to get websvn to show me specific diffs for a file. For the record you need a URL like this one.

Replace /trunk/lcfg-om/om.cin with your chosen path in both places and then put your two different revision numbers.

om Improvements

June 5, 2009

Recently I have been working on extending the functionality of om, the tool used to invoke methods on LCFG components. Before adding new features I did some code review and tidying, which I have now documented.


April 2, 2009

To quote directly from the remctl documentation: “remctl (the client) and remctld (the server) implement a client/server protocol for running single commands on a remote host using Kerberos v5 authentication and returning the output”

I have been intending to find the chance to try out remctl for a while now as it looks like it could be very useful. In particular it should allow us to run nagios passive checks (e.g. for disk space usage) in a secure manner. It could also provide an improved method for remotely executing commands, compared to the current approach where “om” just does a login using ssh.

Simon had already written an LCFG component which supported a lot of the necessary configuration so I took this work and finished it by adding support for command ACLs. To install it onto a server you now just need:

#include <lcfg/options/remctld.h>

On the client you will need at least the remctl package; you might also want the perl module but it’s not essential:

!profile.packages                       mEXTRA(remctl-2.13-1.inf\

Once you have installed the new packages on the server you will need to start (or restart) the LCFG xinetd component. To get it to do something useful you then need to add some commands, for example:

!remctld.aclgroups               mADD(foo)
!remctld.aclmembers_foo     mSET(squinney@INF.ED.AC.UK sxw@INF.ED.AC.UK)

!remctld.types                      mADD(om)
!remctld.services_om           mSET(ALL)
!remctld.exec_om_ALL         mSET(/usr/bin/om)
!remctld.aclfile_om_ALL        mSET(foo)

It’s not necessary to use groups of ACLs; you can define lists of allowed and denied users for each command. This approach just allows you to use the same ACL file for multiple commands.

To understand all of this requires some reading of the LCFG component docs and the remctl documentation but it’s hopefully fairly clear that this example would allow Simon and me to run om on that machine. Of particular benefit is the ability to allow specific users to run commands on a machine without giving them full shell access while still controlling the access in a secure manner. For example, a user could be allowed to restart a webserver (via om) although not allowed to log in.
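
Assuming the configuration above, running a component method from a client machine would then look something like this; the host name is hypothetical and the exact arguments handed on to /usr/bin/om depend on how remctld passes through the command:

remctl lcfgserver.example.org om apache restart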

Userfriendly it ain’t

March 17, 2009

It has long been accepted wisdom that these mouse-driven office “productivity” applications are in some way intrinsically more userfriendly than the command line or text file driven applications we are more accustomed to in Unix. Today I had the opportunity to prove to myself that this just isn’t the case.

I had a file containing two columns of data: the first column is a date, the second is a number. All I wanted to do was convert this into a nice graph with the dates on the X-axis and the numbers on the Y-axis. It seemed to me that this is exactly the sort of thing spreadsheet applications like those provided in openoffice and gnumeric were designed for. When it comes to office applications I am a novice but I do expect them to be usable with a bit of common sense and intuition. But, oh no, no matter what I tried I just could not get it to do what I wanted. First I had a huge battle to persuade it that the date column contained dates. Then I couldn’t get the chart style I wanted; once I decided to opt for what it thought best I then couldn’t easily label the axes, and then I couldn’t easily add a title. After a long while of cursing and mouse-clicking I had a chart which more or less resembled what I wanted, but could I print it? Nope. It was spread horizontally across two pages so it would not print out on one A4 sheet without a rotation, and there was no obvious way to do a rotation. At this point I gave up…

A long time ago in a career far, far away I was an astronomer. You might think that most of what astronomers do involves looking through telescopes. The reality is a bit more dull: they spend their days sifting and analysing enormous amounts of data looking for the proverbial needle in a haystack. In those days the application of choice for plotting graphs was Super Mongo (or SM for short). This did a good job but it was a pig to use, so I got into using a newer project named gnuplot; it was a bit limited in some places but the interface was so much nicer than SM that it was generally the tool I used. Knowing what many astronomers are like they are probably still battling on with SM rather than try anything different and new, but then they like using Fortran as well…

I’ve not used gnuplot in years so I had forgotten anything I ever knew about the syntax. However, having given up on the not-so-friendly spreadsheet I decided to give gnuplot a try. 5 minutes with a tutorial from IBM and the gnuplot manual and I had exactly the graph I had originally wanted with no cursing and swearing involved. Here’s the whole script:

set terminal png         # gnuplot recommends setting terminal before output
set output "report.png"  # The output filename; to be set after the terminal

set xlabel "Week"
set xdata time
set timefmt "%d/%m/%Y"
set format x "%d/%m/%y"
set xrange ["01/11/2007":"31/03/2009"]

set ylabel "Hours"
set yrange [0:25]

set boxwidth 3

plot "report.txt" using 1:2  with boxes lw 3 title 'Build Tools Project Effort'

It could be a lot simpler depending on the data; some of that gubbins is associated with making it parse and print dates correctly. This is stored in a file (e.g. report.gnuplot) and then passed to gnuplot on the command line:

$ gnuplot report.gnuplot

You now have a file named report.png. gnuplot also has a very nice interactive interface where you can type in your commands and try out stuff. It has also gained many features over the years and has some impressive abilities; it will save to pretty much any file format you like, which is perfect for inclusion in LaTeX docs or use in web pages. What’s not to like? It’s simple, straightforward and well documented. It’ll be a long time before I make another attempt to use a spreadsheet.

LCFG Updates

March 13, 2009

It’s been a very busy week for updates to the LCFG web services. We have finally moved to svn over webdav for our headers & package lists repository and we now have the start of a move from CVS to subversion for the source code repository. All this means that we can finally involve external contributors, we can allow them read and write access to any of our projects and we can host their projects so that everything is in one central location. Accompanying this is the opening of access to the LCFG bug tracker so that external contributors can file bugs and manage bugs on their own projects. All this is possible due to the wonders of cosign and iFriend.

Splitting lcfg-utils

March 4, 2009

Recently I have been working on converting the MPU managed LCFG components to being built via the new build tools. I’m down to the very last few now; one that I have been avoiding for ages is lcfg-utils since the current package build scripts are a bit crazy and it is all in a bit of a tangled mess. Yesterday I finally bit the bullet and started ripping it apart with the aim of separating it out into related units – one for the C code, one for the core Perl utilities and one for the Perl package utilities. Along the way I also had in mind enabling the building of a shared library for lcfgutils and a few other niceties.

I was pleased to find that the new build tools really did make the job much easier than I had expected. The two packages of Perl code, LCFG::Utils and LCFG::PkgUtils, use Module::Build so could be uploaded to CPAN. The newly pared-down lcfg-utils package, which provides the shared library and a few basic utilities, uses CMake. There is also an lcfg-utils-devel package on Redhat systems which holds the header file and static library for any build processes where that is required.

I now have it all nicely organised and ready for testing. I believe it all works, it certainly appears to on my machine but it will need further testing to check that I haven’t introduced any nasty bugs. These are fairly important libraries and utilities so a certain amount of cautious checking is required. If you want to give it a go you can do so with the following lines added to an LCFG source profile:

!profile.packages       mEXTRA(+lcfg-utils-1.3.0-1\

If you are feeling really brave you can also try out a new version of updaterpms which uses the lcfgutils shared library, you just need:

!profile.packages  mEXTRA(+updaterpms-3.1.5-1)

UKUUG Advanced Perl Workshop

March 2, 2009

Last Thursday I was in London to attend an "Advanced Perl Techniques" workshop organised by the UKUUG. The tutor was Dave Cross, who has written a couple of Perl books. He has a good style of delivery, he was generally very knowledgeable, the presentation was well structured and amazingly it all ran to time (that takes real talent). Given the title and the list of topics I had high hopes of learning some really cool new things to do with Perl. Here’s the list of subjects which were covered:

  • What’s new in Perl 5.10
  • Dates and times
  • Testing (including coverage analysis)
  • Database access with DBIx::Class
  • Profiling
  • Object oriented programming with Moose
  • Templates
  • Web application development with Catalyst

Specifically, I wanted to learn more about DBIx::Class and Catalyst and find out whether I am using Moose in the right/expected way. I guess, looking at that list again now, I should have realised that it is a lot to get through in one day and necessarily it was only going to be a shallow coverage of each topic. Other than the Catalyst stuff at the end I thought it was all pretty good (if lacking in the really deep detail I wanted) and I did get some useful bits and pieces from the day. I felt the Catalyst section was done very lazily though; it had the feeling of being added as an after-thought and I wondered if it was actually just copied from the standard Catalyst documentation.

I was interested to learn that "DateTime" is considered the "best" module to be using for all time and date manipulation. It certainly has capabilities way beyond those I was previously aware of. I also found the profiling section interesting; I will definitely be looking at "Devel::NYTProf" in more detail sometime soon. The "What’s new in Perl 5.10" section was also particularly good and has encouraged me to start looking at the new features in more detail and, at least, start using them in any scripts I write for personal use. It’s a shame we won’t see 5.10 in RHEL5 but that’s the price we pay for system stability. By the time we get RHEL6 it will at least have had any bugs found and fixed by users of Fedora, Debian and Ubuntu.

All in all, it was worth going to the workshop. At some point in the future I’d love to see a "Really Advanced Perl" workshop which really goes beyond the beginners guide for DBIx::Class, Moose and Catalyst and demonstrates some of the more complex possibilities.