LCFG Client Refactoring: starting the daemon

May 21, 2013

Following on from my previous work on fixing the way in which the UDP socket is opened for receiving notification messages, I have been looking at why the LCFG component just hangs when the rdxprof process fails to daemonise.

It turns out that the LCFG client component uses an obscure feature of the ngeneric Start function: as its final step, Start calls a StartWait function if one has been defined. In the client component this StartWait function sits waiting forever for a client context change, even when the rdxprof process has failed to start…

I think the problem comes from an expectation that the call to the Daemon function, which starts and backgrounds the rdxprof process, will fail if rdxprof fails to run. It does not fail ($? is zero) and the PID of the rdxprof process is always accessible through the $! variable, even if it was very short-lived.

There is, thankfully, a very simple solution here. The client component already has an IsProcessRunning function which can be used to check whether the process associated with a PID is still active. This has to be used carefully; I have put a short sleep after the daemonisation stage to ensure that the process is fully started before doing the check. The check is also fairly naive, so there is a slight risk that, if the system is under resource pressure, the rdxprof process could fail and its PID be immediately reused. For now I think it’s reasonable to accept the risks attached and revisit the issue later if it causes us problems. Relatedly, the StartWait function really ought to give up eventually.
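The essence of the check can be sketched in Perl like this (a simplified illustration, not the component’s actual code; the sleep length and the way the PID is obtained are assumptions):

```perl
use strict;
use warnings;

my $pid = shift @ARGV;    # PID reported after backgrounding rdxprof

sleep 2;    # give the daemon a moment to finish starting (or failing)

# Sending signal 0 delivers nothing but reports whether the process
# exists and we are permitted to signal it - a cheap liveness test.
if ( kill 0, $pid ) {
    print "process $pid appears to be running\n";
}
else {
    die "process $pid is not running, daemonisation failed\n";
}
```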

LCFG Client Refactor: notification handling

May 21, 2013

One long-standing issue with running the LCFG client (rdxprof) in daemon mode has been that if another process has already acquired the UDP socket which it wants (port 732) then it does not fail at startup but just hangs. This is clearly rather undesirable behaviour as it leaves the machine in an unmanageable state but because the client process appears to be running it’s difficult to notice that anything is actually wrong.

Yesterday I spent a while looking at this problem. I reduced it to the simplest case: a script with a while-loop listening for messages on a UDP socket and printing them to the screen. Running multiple processes at the same time revealed that there was nothing preventing multiple binds on the socket. I eventually discovered that this is caused by the SO_REUSEADDR option being set on the socket.

When using TCP, setting this option is often necessary: it allows a process to reacquire a socket when it restarts, since a process which restarts quickly might otherwise find the socket not yet released. For UDP this option is only needed if you want to listen on a broadcast or multicast address with multiple listeners on the same machine, which is a fairly unusual scenario.
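The behaviour is easy to demonstrate with a minimal listener (a sketch along the lines of my test script, not the actual client code; port 732 is privileged, so any unused high port works just as well for testing):

```perl
use strict;
use warnings;
use IO::Socket::INET;

# With ReuseAddr => 0 a second instance of this script fails to bind,
# reporting "Address already in use", instead of silently sharing the
# port. Setting ReuseAddr => 1 reproduces the original hanging problem.
my $sock = IO::Socket::INET->new(
    LocalPort => 732,
    Proto     => 'udp',
    ReuseAddr => 0,
) or die "can't bind UDP socket: $!\n";

my $msg;
while ( defined $sock->recv( $msg, 1024 ) ) {
    print "received: $msg\n";
}
```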

Disabling the SO_REUSEADDR option does exactly what we want. Attempting to run two rdxprof processes now results in the second exiting with status 1 and this message:

[FAIL] client: can't bind UDP socket
[FAIL] client: Address already in use

There is a further problem with the LCFG client component not returning control to the caller when it fails to start rdxprof; I will have to investigate that separately.

LCFG Client Refactor: splitting the project

May 14, 2013

I’ve recently been working on splitting the LCFG client code base from the LCFG component which is used to configure and manage the daemon. This allows the client Perl modules to be built in the style of a standard Perl module. The immediate benefit is enhanced portability: being able to use the standard Perl module building tools makes it much easier to build the code on platforms other than DICE Linux. We could also upload the code to CPAN, which would make it even easier to download and install.

There are also benefits for maintainability: the standard Perl build tools make it easy to run unit tests and to do other tasks such as checking test and API documentation coverage for your code. It is not impossible to do these things without a tool like Module::Build, but it is a lot more awkward. Also, without the standard tools you have to know, or be able to discover, where certain files should be installed. We have some of this built into the LCFG build tools CMake framework but it only handles fairly simple scenarios.

The new project which contains all the Perl modules for the client is named LCFG-Client-Perl in subversion and the component continues to be named lcfg-client in the standard LCFG component naming style. This completes stage 9 of the project plan.

LCFG Client Refactor: Comparing files

May 14, 2013

One thing that we need to do very frequently in the LCFG client, and also in many LCFG components, is comparing files. Typically we want to see if the new file we have just generated is any different from the previous version, in which case we will need to replace old with new and possibly carry out some action afterwards.

There are clearly many ways to solve this problem. We could read in the two files and do a simple string comparison (conceptually simple but it tends to be messy, particularly if you want to minimise the memory requirements). It is also possible to compare checksums of the files (MD5, SHA1, etc.); I quite like this approach and it is nice and fast for small files. Up until now I’ve been using a mix of methods based on either Text::Diff (which wastes time, since I don’t actually care what the differences are) or calculating checksums, neither of which is ideal in most cases.
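The checksum approach can be done with the core Digest::MD5 module, something like this (a sketch, not the client’s actual code; the file names are hypothetical):

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 hex digest of a file's contents.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical file names, purely for illustration.
my $same = file_md5('profile.new') eq file_md5('profile.old');
print $same ? "files match\n" : "files differ\n";
```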

What I really want though is a standard API which can simply answer the question of “are these two files the same?”. Some of the older LCFG code shows its shell heritage by using the cmp command. This command does exactly what we want and does it in a fairly efficient manner. The downside is that we have to execute another process every time we want to compare two files.

Step forwards, File::Compare. I’m not sure why I hadn’t spotted this module in the past. It works in a very similar way to the good old cmp command. It is also part of the set of core Perl modules which means it is available everywhere and it has a nice simple interface. I think I shall be converting various modules over to this approach in the future.
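Usage is about as simple as it gets: compare returns 0 when the files are identical, 1 when they differ and -1 on error. A minimal sketch, again with hypothetical file names:

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical file names for illustration.
my $result = compare( 'profile.new', 'profile.old' );

if ( $result == 0 ) {
    print "files are identical, nothing to do\n";
}
elsif ( $result == 1 ) {
    print "files differ, replacing old with new\n";
}
else {
    die "comparison failed: $!\n";
}
```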

LCFG Client Refactor: New om interface

May 13, 2013

The LCFG client uses the om command line tool to call the configure method for an LCFG component when its resources change. Up until now this has been done using backticks, which is not the best approach, particularly given that it involves building a command string and launching a full shell. I’ve now added a new Perl module to help with running om commands from Perl. It’s in version 0.7.1 of lcfg-om; you can use it like this:

use LCFG::Om::Command;

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "updaterpms", "run", "-Dv", "-t" );

The parameters are: component, method, ngeneric args, component args. You only need to specify the component and method names, the other two are optional. The argument options can either be simple strings or references to lists.

The status will be true/false to show the success of the command. You also get any output to stdout and stderr separately.

If you’re concerned that some method might not complete in a reasonable amount of time you can specify a timeout for the command:

my ( $status, $stdout, $stderr ) =
    LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );

If the timeout is reached then the Run command dies; you need to use eval or a module like Try::Tiny to catch that exception.
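Catching the timeout could look something like this (a sketch; the exact contents of the exception message are an assumption):

```perl
use strict;
use warnings;
use Try::Tiny;
use LCFG::Om::Command;

my $timeout = 60;    # seconds (illustrative value)

try {
    my ( $status, $stdout, $stderr ) =
        LCFG::Om::Command::Run( "openafs", "restart", "-Dv", "", $timeout );

    warn "om restart failed: $stderr\n" if !$status;
}
catch {
    # $_ holds the exception, e.g. when the timeout was exceeded
    warn "om command did not complete: $_";
};
```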

Nicely, this will also close file descriptors 11 and 12 which are used internally by the LCFG ngeneric framework for logging. This will avoid daemons becoming associated with those files when they are restarted (and consequently tying up the rdxprof process).

This is one of those nice situations where fixing a problem for one project has additional benefits for others. The trick here was in realising that the code should be added to the lcfg-om project rather than it just being in the LCFG client code base.

LCFG Client Refactor: File and Directory paths

May 9, 2013

The way in which the LCFG client handles paths to files and directories has never been pleasant. The old code contains a lot of hardwired paths inserted at package build time by CMake using autoconf-style macros (e.g. @FOO@). This makes the code very inflexible; in particular, there is no way for a user to run the rdxprof script as a non-root user unless they are given write access to all directories and files in the /var/lcfg/conf/profile tree. There is no good reason to prevent running rdxprof as a normal user: if they are authorized to access the XML profile then they should be allowed to run the script which parses the file and generates the DBM and RPM configuration files. They may not be able to run in full daemon mode and control the various components, but one-shot mode certainly should be functional.

There are a couple of other things added into the mix which complicate matters further. In particular, there is some support for altering the root prefix for the file-system. This is used at install time, where we are running from a file-system rooted at / as normal but installing to a file-system rooted at /root. I say some support since it seems that only certain essential code paths were modified.

I needed to come up with a universal solution for these two problems which could provide a fairly straightforward interface for locating files and directories. It had to neatly encapsulate the handling of any root prefix and allow non-root users to store files. To this end I’ve introduced a new module, named LCFG::Client::FileLocator, which provides a class from which a locator object can be instantiated. There are instance attributes for the root prefix and the configuration data directory path (confdir) which can be set using rdxprof command line options. This object can be used to look up the correct path for any file which the LCFG client requires. There are basic methods for finding various standard LCFG paths and also useful higher-level methods for finding files for specific hosts or particular components. It’s got comprehensive documentation too, so hopefully it will be a lot easier to understand in 10 years’ time than the previous code.
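As a rough illustration of the intent (the attribute and method names here are hypothetical guesses, not confirmed against the actual module):

```perl
use strict;
use warnings;
use LCFG::Client::FileLocator;

# Both attributes would default to the standard system locations;
# overriding them lets a non-root user (or the installer) redirect
# all path lookups. NOTE: names below are illustrative only.
my $locator = LCFG::Client::FileLocator->new(
    root    => '/root',                 # install-time root prefix
    confdir => "$ENV{HOME}/lcfg/conf",  # per-user configuration data
);

# Hypothetical higher-level method for a host-specific file.
my $profile = $locator->profile_file('myhost');
print "profile for myhost lives at $profile\n";
```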

I’ve now completed stage 8 but will have to go back and finish stage 7, "Improve option handling": I would still like to add in configuration file handling. It’s a lot easier now that I’ve worked out the best way to deal with the various file paths. Having a single option for altering the configuration data directory was particularly useful.

So far I reckon I’ve spent just under 13 days of effort on the project. The allocation up to this point was 11 days (though I have done the bulk of stage 7, which takes it up to 12 days allocated). So it’s still drifting away from the target a bit, but not substantially.

LCFG Client Refactor: The joy of tests

May 9, 2013

I’m currently working on stage 8 of my project plan – "Eradicate hard wired paths". I’ll blog about the gory details later but for now I just wanted to show how the small number of tests I already have in place have proved to be very useful. As part of this work I have introduced a new module – LCFG::Client::FileLocator – which is nearly all new code. Having created this module I started porting over the rest of the client code to using it for file and directory path lookups. As I already had some tests I was able to gauge my progress by regularly running the test suite. As well as showing up the chunks of old client code which still needed to be ported it revealed bugs in 8 separate lines of code in the new FileLocator code. Finding these bugs didn’t require me to write a whole new set of tests for the new code (although that is on my todo list to ensure better coverage). For me that really shows the true value of writing some tests at the beginning of a refactoring process. It definitely produced higher quality code and the porting took much less time than it would have otherwise done.

LCFG Client Refactor: Logging

May 8, 2013

The next stage of untangling the LCFG client code was to improve the logging system. Up till now it has just been done using a set of subroutines which are all declared in the LCFG::Client module. Using the logging code in any other module therefore requires loading that module, and this accounts for the bulk of the inter-dependencies between the main LCFG::Client module and all the others. With its single purpose, the logging code is an obvious target for separation into a distinct sub-system.

With the logging code I felt that the best approach was to convert it into an object-oriented style. The typical way that logging is done in various Perl logging modules (e.g. Log::Log4perl) is to have a singleton logging object which can be accessed anywhere in the code base. The advantage is that it is not necessary to pass the logging object around to every subroutine where it might be needed, yet we still avoid creating a new object every time one is required. If the code base were fully object-oriented we might be better served by having it as an instance attribute (this is what MooseX::Log::Log4perl provides), but we don’t have that option here. The logging object can be configured once and then used wherever necessary. For simplicity of porting I have, for now, made it a global variable in each Perl module; that’s not ideal, but it’s a pragmatic decision to help with the speed of porting from the old procedural approach.

The new LCFG::Client::Log module does not have a new method. To make it clear that we are not creating a new object every time it instead has a GetLogger method. If no object has previously been instantiated then one is created, otherwise the previous object is returned. Again this can be done easily using the new state feature in Perl 5.10, like this:

sub GetLogger {
    my ($class) = @_;

    use feature 'state';

    state $self = bless {
        daemon_mode => 0,
        verbose     => 0,
        abort       => 0,
        debug_flags => {%debug_defaults},
        warn_flags  => {%warn_defaults},
    }, $class;

    return $self;
}

This new OO-style API neatly encapsulates all the logging behaviour we require. Previously a few variables in the LCFG::Client module had to be made universally accessible so that they could be queried. The new module provides accessor methods instead to completely hide the internals. This all helps to make it possible to simply extend or switch to a more standard framework at some point in the future if we so desire.
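Every caller then gets the same object back, which is easy to verify (a small sketch; comparing two blessed references numerically is true only when they point at the same object):

```perl
use strict;
use warnings;
use LCFG::Client::Log;

my $log1 = LCFG::Client::Log->GetLogger();
my $log2 = LCFG::Client::Log->GetLogger();

# Both variables refer to the one shared logger instance.
print "same logger object\n" if $log1 == $log2;
```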

LCFG Client Refactor: context handling

May 3, 2013

Now that the basic tidying is complete, the code is in a much better condition for safely making larger-scale changes. One of the particular issues that needs to be tackled is the coupling between modules. The current code is a tangled web, with modules calling subroutines from each other. This makes the code harder to understand and much more fragile than we would like. There has been some attempt to separate functionality (e.g. Build, Daemon, Fetch) but it hasn’t entirely worked; for instance, in some cases subroutines are declared in one module but only used in another. The two main areas I wanted to concentrate on improving are context handling and logging.

Various client Perl modules have an interest in handling LCFG contexts and there is also a standalone script (setctx) used for setting context values. (For a good description of what contexts are and how they can be used, see section 5.2.5 of the LCFG Guide.) The context code was spread across rdxprof and LCFG::Client, but in a previous round of tidying it was all merged into LCFG::Client. Ideally it should be kept in a separate module: this allows the creation of a standard API which improves code isolation by hiding implementation details, which in turn gives the code maintainer much greater flexibility to make changes when desired.

The easiest part of this work was the shuffling of the context code into a new module (named LCFG::Client::Contexts). Once this was done I took the chance to split the code down into lots of smaller chunks and remove duplication of functionality wherever possible (the code has gone from 4 big subroutines to 24 much smaller ones). This has resulted in a rather big increase in the amount of code (884 insertions versus 514 deletions), which is normally seen as a bad thing when refactoring, but I felt in this case it was genuinely justified. Each chunk is now easier to understand, test and document – we now have a low-level API as well as the previous high-level functions. Also, most of the subroutines are now short enough to view in a single screen of your favourite editor, which hugely helps the maintainer.

An immediate benefit of this refactoring work was seen when I came to look at the setctx script. There had been a substantial amount of duplication of code between this and the rdxprof script. As the context code was previously embedded in another script it was effectively totally inaccessible – the mere act of moving it into a separate module made it reusable. Breaking down the high-level subroutines into smaller chunks also made it much easier to call the code from setctx and remove further duplication. Overall setctx has dropped from 213 lines of code to 140 (including whitespace in both cases). Functionality which is implemented in scripts is very hard to unit test compared with that stored in separate modules. So it’s now much easier to work with the context code and know that setctx won’t suddenly break.

LCFG Client Refactor: Initial tidying done

April 25, 2013

A quick dash through the code in the LCFG::Client::Fetch module (which is relatively small and fairly straightforward) means that all the LCFG client code has been checked using perlcritic and improved where necessary/possible. This completes the work for stages 4 and 5 of the project plan.

Some of the work involved in this stage has been rather more complex than anticipated. Mostly that was not related directly to resolving issues highlighted by perlcritic; in the main it was because, whilst investigating the issues raised, I spotted other big problems with sections of code that I felt needed to be resolved. Those could have been kept separate and done as an additional stage in the project plan, but I thought it was better to just do them. In particular, I have made large improvements to the sending and receiving of UDP messages for notifications and acknowledgements. I’ve also improved the logic involved with handling the “secure mode” and split out lots of sections of code into smaller, more easily testable, chunks.

This takes the effort expended up to about 7 days. That’s about 1 day over what I had expected, but the overrun was accounted for in stages 1 and 2, so the gap between predicted and actual effort requirements has not worsened.