LCFG Client Refactor: host name versus node name

May 23, 2013

A long-standing issue with the LCFG client is that it has not been possible to use an LCFG profile with a name which does not match the host name. The two have always been treated by rdxprof and the ngeneric framework as conceptually interchangeable. There is no particular reason for this limitation other than the traditional “it’s always been that way”; we have also never had a requirement important enough to get it changed, or the opportunity to make the change quickly. As the refactoring project is drawing to a close it seemed like a good time to break this conceptual connection and rework the code to always use the LCFG node name. For the moment the actual behaviour won’t change, since the node name defaults to the host name as before, but we now have a mechanism to allow it to be altered. When the client enters daemon mode it now stashes the name of the LCFG node being used; since you can only run one client daemon at a time this makes reasonable sense. The standalone one-shot behaviour remains unaltered, so you can still access any profile you like.


python and string encodings

May 21, 2013

I’ve recently finished the User accessible login reports project. After the initial roll-out to users I had a few reports of people getting server errors when certain sets of data were viewed. This website is written in Python and uses the Django framework. During the template processing stage we were getting error messages like the following:

DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 30: invalid continuation byte.

It appears that not all data coming from the whois service is encoded in the same way (see RFC 3912 for a discussion of the issue). In this case the data was encoded as latin1, but whois is quite an old service with no support for declaring the content encoding used, so we can never know in advance what we are going to have to handle.

A bit of searching around revealed the chardet module which can be used to automatically detect the encoding used in a string. So, I just added the following code and the problem was solved.

import chardet

# whois hands us raw bytes with no declared encoding, so detect it first
enc = chardet.detect(val)['encoding'] or 'utf-8'

# decode to unicode using the detected encoding, then re-encode as
# ASCII with any unsupported characters replaced by '?'
val = val.decode(enc, 'replace')
val = val.encode('ascii', 'replace')

The final result is that I am guaranteed to have the string from whois as an ASCII string with any unsupported characters replaced by a question mark (?). It’s not a perfect representation but it is web safe and good enough for my needs.


LCFG Client Refactoring: starting the daemon

May 21, 2013

Following on from my previous work on fixing the way in which the UDP socket is opened for receiving notification messages, I have been looking at why the LCFG component just hangs when the rdxprof process fails to daemonise.

It turns out that the LCFG client component uses an obscure feature of the ngeneric Start function: as its final step it will call a StartWait function if one has been defined. In the client component this StartWait function sits waiting forever for a client context change, even when the rdxprof process has failed to start…

I think the problem comes from an expectation that the call to the Daemon function, which starts and backgrounds the rdxprof process, will fail if rdxprof fails to run. It does not fail ($? is zero) and the PID of the rdxprof process is always accessible through the $! variable, even if the process was very short-lived.

There is, thankfully, a very simple solution here. The client component already has an IsProcessRunning function which can be used to check if the process associated with a PID is still active. This has to be used carefully: I have put a short sleep after the daemonisation stage to ensure that the process is fully started before doing the check. The check is also fairly naive, so there is a slight risk that if the system is under resource pressure the rdxprof process could fail and the PID could then be immediately reused. For now I think it’s reasonable to accept the risks attached and revisit the issue later if it causes us problems. Related to this, the StartWait function clearly ought to give up eventually.
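The check itself is essentially the classic “signal 0” trick. Here is a minimal sketch of the idea in Perl; the real component is a shell script, so this is purely illustrative, with 'sleep 30' standing in for the backgrounded rdxprof and an arbitrary settling delay:

#!/usr/bin/perl
use strict;
use warnings;

# Signal 0 sends nothing but reports whether the process could be
# signalled, i.e. whether it is still around.
sub IsProcessRunning {
    my ($pid) = @_;
    return kill( 0, $pid ) ? 1 : 0;
}

# Background a child process ('sleep 30' stands in for rdxprof here).
my $pid = fork();
die "fork failed: $!\n" if !defined $pid;
if ( $pid == 0 ) {
    exec 'sleep', '30';
    exit 1;    # only reached if exec itself fails
}

# Give the child a moment to either settle or die, then check the PID.
sleep 2;
die "daemon failed to start\n" if !IsProcessRunning($pid);
print "daemon still running with PID $pid\n";

kill 'TERM', $pid;    # tidy up the stand-in child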


LCFG Client Refactor: notification handling

May 21, 2013

One long-standing issue with running the LCFG client (rdxprof) in daemon mode has been that if another process has already acquired the UDP socket which it wants (port 732) then it does not fail at startup but just hangs. This is clearly rather undesirable behaviour as it leaves the machine in an unmanageable state, but because the client process appears to be running it’s difficult to notice that anything is actually wrong.

Yesterday I spent a while looking at this problem. I reduced it to the simplest case of a script with a while-loop listening for messages on a UDP socket and printing them to the screen. Running multiple copies at the same time revealed that there was nothing preventing multiple binds on the socket. I eventually discovered that this is caused by the SO_REUSEADDR option being set on the socket.
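The test script was along these lines; this is a sketch rather than the exact script, and the port number is just a stand-in since binding the real notification port (732) needs root:

#!/usr/bin/perl
use strict;
use warnings;

use IO::Socket::INET;

# ReuseAddr controls SO_REUSEADDR. With it set to 1, a second copy of
# this script binds the same port quite happily; with it set to 0, the
# second copy fails with "Address already in use".
my $sock = IO::Socket::INET->new(
    Proto     => 'udp',
    LocalPort => 7732,    # stand-in for the real notification port (732)
    ReuseAddr => 0,
) or die "can't bind UDP socket: $!\n";

while (1) {
    my $msg;
    $sock->recv( $msg, 1024 );
    print "received: $msg\n";
}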

When using TCP, setting this option is often necessary: it allows a restarting process to reacquire a socket which would otherwise still be tied up in the TIME_WAIT state. For UDP this option is only needed if you want to listen on a broadcast or multicast address with multiple listeners on the same machine, which is a fairly unusual scenario.

Disabling the SO_REUSEADDR option does exactly what we want. Attempting to run a second rdxprof process now results in it exiting with status 1 and this message:

[FAIL] client: can't bind UDP socket
[FAIL] client: Address already in use

There is a further problem with the LCFG client component not returning control to the caller when it fails to start rdxprof, and I will have to do some further investigation into that.


LCFG Client Refactor: splitting the project

May 14, 2013

I’ve recently been working on splitting the LCFG client code base from the LCFG component which is used to configure and manage the daemon. This allows the client Perl modules to be built in the style of a standard Perl module. The immediate benefit of this is enhanced portability: it is much easier to build the code on platforms other than DICE Linux if you can use the standard Perl module building tools. We could also upload the code to CPAN, which would make it even easier to download and install.

There are also benefits for maintainability: the standard Perl build tools make it easy to run unit tests and do other tasks such as checking test and API documentation coverage for your code. It is not impossible to do these things without a tool like Module::Build but it is a lot more awkward. Also, without the standard tools you have to know, or be able to discover, where certain files should be installed. We have some of this built into the LCFG build tools CMake framework but it only handles fairly simple scenarios.
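For anyone who hasn't seen the standard tools, a Build.PL is only a few lines. The sketch below is purely illustrative rather than the real metadata for the client modules:

use strict;
use warnings;

use Module::Build;

# Illustrative metadata only: the real module name, licence and
# dependency list for the client will differ.
my $build = Module::Build->new(
    module_name    => 'LCFG::Client',
    license        => 'gpl',
    requires       => { 'perl' => '5.10.0' },
    build_requires => { 'Test::More' => 0 },
);

$build->create_build_script();

Building, testing and installing then follow the usual pattern of perl Build.PL, ./Build, ./Build test and ./Build install.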

The new project which contains all the Perl modules for the client is named LCFG-Client-Perl in subversion and the component continues to be named lcfg-client in the standard LCFG component naming style. This completes stage 9 of the project plan.


LCFG Client Refactor: Comparing files

May 14, 2013

One thing that we need to do very frequently in the LCFG client, and also in many LCFG components, is comparing files. Typically we want to see if the new file we have just generated is any different from the previous version, in which case we will need to replace old with new and possibly carry out some action afterwards.

There are clearly many ways to solve this problem. We could read in the two files and do a simple string comparison (conceptually simple, but it tends to be messy, particularly if you want to minimise the memory requirements). It is also possible to compare checksums for the files (MD5, SHA1, etc.). I quite like this approach and it is nice and fast for small files. Up until now I’ve been using a mix of methods based on Text::Diff (which wastes time since I don’t actually care what the differences are) or calculating checksums, neither of which is an ideal approach in most cases.

What I really want though is a standard API which can simply answer the question of “are these two files the same?”. Some of the older LCFG code shows its shell heritage by using the cmp command. This command does exactly what we want and does it in a fairly efficient manner. The downside is that we have to execute another process every time we want to compare two files.

Step forward, File::Compare. I’m not sure why I hadn’t spotted this module in the past. It works in a very similar way to the good old cmp command. It is also part of the core Perl distribution, which means it is available everywhere, and it has a nice simple interface. I think I shall be converting various modules over to this approach in the future.
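Usage is about as simple as it gets; a small sketch (the file paths here are just placeholders):

use strict;
use warnings;

use File::Compare ();

my $new_file = '/tmp/profile.new';    # placeholder paths
my $old_file = '/tmp/profile.old';

# compare() returns 0 if the contents are identical, 1 if they differ
# and -1 if an error occurs (e.g. a file cannot be opened).
my $result = File::Compare::compare( $new_file, $old_file );

if ( $result == 0 ) {
    # nothing to do, the new file is the same as the old one
} elsif ( $result == 1 ) {
    # replace old with new and carry out any follow-up action
} else {
    die "Failed to compare $new_file and $old_file: $!\n";
}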


LCFG Client Refactor: New om interface

May 13, 2013

The LCFG client uses the om command line tool to call the configure method for an LCFG component when its resources change. Up until now this has been done using backticks, which is not the best approach, particularly given that it involves building a command string and launching a full shell. I’ve now added a new Perl module to help with running om commands from Perl. It’s in version 0.7.1 of lcfg-om and you can use it like this:

use LCFG::Om::Command;

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "updaterpms", "run", "-Dv", "-t" );

The parameters are: component, method, ngeneric args, component args. You only need to specify the component and method names; the other two are optional. The argument options can be either simple strings or references to lists.

The status will be true/false to show the success of the command. You also get any output to stdout and stderr separately.

If you’re concerned that some method might not complete in a reasonable amount of time you can specify a timeout for the command:

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );

If the timeout is reached then the Run command dies, so you need to use eval or a module like Try::Tiny to catch that exception.
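With Try::Tiny, for example, that might look something like this (the error handling shown is just a sketch):

use Try::Tiny;
use LCFG::Om::Command;

my ( $status, $stdout, $stderr );

try {
    ( $status, $stdout, $stderr ) =
        LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );
} catch {
    # Run() dies when the timeout is hit; the exception ends up in $_
    warn "om command failed: $_";
    $status = 0;
};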

Nicely, this will also close file descriptors 11 and 12 which are used internally by the LCFG ngeneric framework for logging. This avoids daemons becoming associated with those files when they are restarted (and consequently tying up the rdxprof process).

This is one of those nice situations where fixing a problem for one project has additional benefits for others. The trick here was in realising that the code should be added to the lcfg-om project rather than it just being in the LCFG client code base.


LCFG Client Refactor: File and Directory paths

May 9, 2013

The way in which the LCFG client handles paths to files and directories has never been pleasant. The old code contains a lot of hardwired paths inserted at package build time by CMake using autoconf-style macros (e.g. @FOO@). This makes the code very inflexible; in particular, there is no way to run the rdxprof script as a non-root user unless that user is given write access to all directories and files in the /var/lcfg/conf/profile tree. There is no good reason to prevent running rdxprof as a normal user: if they are authorized to access the XML profile then they should be allowed to run the script which parses the file and generates the DBM and RPM configuration files. They may not be able to run in full daemon mode and control the various components, but one-shot mode certainly should be functional.

There are a couple of other things added into the mix which complicate matters further. In particular, there is some support for altering the root prefix for the file-system. This is used at install time, where we are running from a file-system based in / as normal but installing to a file-system based in /root. I say some support since it seems that only certain essential code paths were modified.

I needed to come up with a universal solution to these two problems which could provide a fairly straightforward interface for locating files and directories. It had to neatly encapsulate the handling of any root prefix and allow non-root users to store files. To this end I’ve introduced a new module, named LCFG::Client::FileLocator, which provides a class from which a locator object can be instantiated. There are instance attributes for the root prefix and the configuration data directory path (confdir) which can be set using rdxprof command line options. This object can be used to look up the correct path for any file which the LCFG client requires. There are basic methods for finding various standard LCFG paths and also useful higher-level methods for finding files for specific hosts or particular components. It’s got comprehensive documentation too, so hopefully it will be a lot easier to understand in 10 years’ time than the previous code.
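In use it looks roughly like the following. Note that this is only a sketch: the confdir attribute is mentioned above, but the root attribute name and the lookup method shown here are illustrative guesses rather than the real API:

use LCFG::Client::FileLocator;

# Example values only; a non-root user can point confdir at a
# directory which they can actually write to.
my $locator = LCFG::Client::FileLocator->new(
    root    => '',                      # or '/root' at install time
    confdir => "$ENV{HOME}/lcfg/conf",
);

# Hypothetical higher-level lookup, shown only to illustrate the
# shape of the interface.
my $rpmcfg = $locator->rpmcfg_file('myhost');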

I’ve now completed stage 8 but will have to go back and finish stage 7, "Improve option handling", as I would still like to add in configuration file handling. It’s a lot easier now that I’ve worked out the best way to deal with the various file paths. Having a single option for altering the configuration data directory was particularly useful.

So far I reckon I’ve spent just under 13 days of effort on the project. The allocation up to this point was 11 days (I have done the bulk of stage 7 though which takes it up to 12 days allocated). So, it’s still drifting away from the target a bit but not substantially.


LCFG Client Refactor: The joy of tests

May 9, 2013

I’m currently working on stage 8 of my project plan – "Eradicate hard wired paths". I’ll blog about the gory details later, but for now I just wanted to show how the small number of tests I already have in place have proved to be very useful. As part of this work I have introduced a new module – LCFG::Client::FileLocator – which is nearly all new code. Having created this module I started porting the rest of the client code over to using it for file and directory path lookups. As I already had some tests I was able to gauge my progress by regularly running the test suite. As well as showing up the chunks of old client code which still needed to be ported, it revealed bugs in 8 separate lines of the new FileLocator code. Finding these bugs didn’t require me to write a whole new set of tests for the new code (although that is on my todo list to ensure better coverage). For me that really shows the true value of writing some tests at the beginning of a refactoring process. It definitely produced higher quality code and the porting took much less time than it would otherwise have done.


LCFG Client Refactor: Logging

May 8, 2013

The next stage of untangling the LCFG client code was to improve the logging system. Up till now it has been done using a set of subroutines which are all declared in the LCFG::Client module. Using the logging code in any other module therefore requires loading that module, and this accounts for the bulk of the inter-dependencies between the main LCFG::Client module and all the others. With a single clear purpose, the logging code is an obvious target for separation into a distinct sub-system.

With the logging code I felt that the best approach was to convert it into an object-oriented style. The typical way that logging is done in various Perl logging modules (e.g. Log::Log4perl) is to have a singleton logging object which can be accessed anywhere in the code base. The advantage of this is that it is not necessary to pass the logging object around to every subroutine where it might be needed, but we can still avoid creating a new object every time it is required. If the code base were fully object-oriented we might be better served by having it as an instance attribute (this is what MooseX::Log::Log4perl provides) but we don’t have that option here. The logging object can be configured once and then used wherever necessary. For simplicity of porting I have, for now, made it a global variable in each Perl module; that’s not ideal, but it’s a pragmatic decision to help with the speed of porting from the old procedural approach.

The new LCFG::Client::Log module does not have a new method. To make it clear that we are not creating a new object every time, it instead has a GetLogger method. If no object has previously been instantiated then one is created; otherwise the previous object is returned. Again this can be done easily using the state feature introduced in Perl 5.10, like this:


sub GetLogger {
    my ($class) = @_;

    use feature 'state';

    state $self = bless {
        daemon_mode => 0,
        verbose     => 0,
        abort       => 0,
        debug_flags => {%debug_defaults},
        warn_flags  => {%warn_defaults},
    }, $class;

    return $self;
}

This new OO-style API neatly encapsulates all the logging behaviour we require. Previously a few variables in the LCFG::Client module had to be made universally accessible so that they could be queried. The new module provides accessor methods instead to completely hide the internals. This all helps to make it possible to simply extend or switch to a more standard framework at some point in the future if we so desire.
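A quick illustration of the singleton behaviour (just a sketch, with the accessor methods omitted):

use LCFG::Client::Log;

# GetLogger always hands back the same object, however many times and
# from however many modules it is called.
my $log  = LCFG::Client::Log->GetLogger();
my $log2 = LCFG::Client::Log->GetLogger();

print "same logger object\n" if $log == $log2;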