As part of my work on updating the LCFG client I’ve written a guide to the inner workings of the LCFG client. This is intended to be fairly high-level so it doesn’t go into the details of which subroutine calls which subroutine. The aim is that this should cover all the main functionality and provide the information necessary to get started with altering and extending the client code base.
Recently I’ve been working on developing a new framework which encapsulates all aspects of handling the LCFG profiles on the client-side. This framework is written in Perl and is named, appropriately enough, LCFG::Profile. I plan to blog about the various details in due course. The coding phase is almost complete and I’ve moved on to adding documentation for all the module APIs. I’ve found the documentation phase to be a very useful review process. It has helped me spot various dark corners of the code, and methods which were added earlier in the development process but are no longer required. Removing this dead code now is a good idea, as we may otherwise end up committed to supporting it if it forms part of a public API. I’ve also found it to be a very good way to spot inconsistencies between similar APIs implemented in the various modules. It’s definitely a good idea to follow the principle of least surprise whenever possible. If methods are named similarly and take a similar group of arguments they probably ought to return similar types of results.
As the results of Phase One of the LCFG Client refactoring project are now in the beta-testing stage and approaching a roll-out date we have commenced work on Phase Two. The primary aim of this new work is to remove all dependencies on the W3C::SAX Perl modules which have been unmaintained for a very long time. We’re probably the last place in the world still using those modules so it’s definitely time to be moving on to something more modern. The project plan for this new work is available for anyone interested.
As a first step I’ve been prototyping some new XML parsing code based on the popular and well-maintained XML::LibXML module. I’ve also been thinking about ideas for an API for storing/accessing the information regarding components and resources. I’ve put together some useful notes on the LCFG XML profile structure to help me get my head around it all.
I am pleased to announce that the v3 update for the LCFG client has now reached the beta-release stage. As of stable release 2013101401a everything is in place to begin testing at your own site. Full details are available on the LCFG wiki.
If you come across any bugs or unexpected behaviour please file a bug at bugs.lcfg.org.
I remember once as a 12 year old playing rugby at school. I received the ball, saw the field ahead was clear and knew that this was the time to run like hell. For one joyous moment I was brushing aside the defending team and spotting my moment of glory. Having never been a particularly sporty kid, was this my chance to join the cool crowd? Sadly, someone burst my bubble and pointed out that the main reason I wasn’t being flattened was because we were actually playing touch rugby…
Anyway, my general point is, it’s always good to know when, having been passed the ball, you should just run like hell and see what happens. It might also be good to remember which game you are playing but, hey, ho…
Having been given the chance to split the LCFG node name from the host name, I spotted a chance to really make it count. In short order the following code has been altered to extend this support to the whole of the LCFG client framework:
- perl-LCFG-Utils 1.5.0
- lcfg-ngeneric 1.4.0
- lcfg-om 0.8.0
- lcfg-file 1.2.0
- lcfg-authorize 1.1.0
- lcfg-hackparts 0.103.0
- lcfg-logserver 1.4.0
- lcfg-sysinfo 1.3.0
- lcfg-installroot 0.103.0
None of this has (yet) been shipped to the stable tree since it needs more hacking of the current LCFG client (v2) code to fix a compatibility issue.
The big achievement here is that it makes it possible to specify the LCFG node name on the PXE installer kernel command-line via the lcfg.node parameter and get the whole way through to an installed managed machine which is using an LCFG profile which is completely unrelated to the host name.
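To make the idea concrete, here is a minimal sketch of picking an lcfg.node value out of a kernel command-line string. The helper name node_from_cmdline and the sample command line are made up for illustration; this is not the installer's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of lcfg.node from a kernel
# command-line string, or undef if the parameter is not present.
sub node_from_cmdline {
    my ($cmdline) = @_;
    for my $word ( split ' ', $cmdline ) {
        return $1 if $word =~ m/\Alcfg\.node=(\S+)\z/;
    }
    return;
}

# Invented example; on a running Linux system the real string could be
# read from /proc/cmdline.
my $example = 'ro quiet lcfg.node=labmachine ip=dhcp';
my $node    = node_from_cmdline($example);

print defined $node ? "node name: $node\n" : "no lcfg.node given\n";
```

When the parameter is absent the helper returns undef, which leaves the caller free to fall back to the host name.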
There are various big benefits to this change. It is now possible to have a fully roaming machine which is LCFG managed, with no requirement for a static host name or static IP address. This means that no matter what host name or domain name settings are in place the LCFG client will continue to work as required. This also makes it possible to use a single “generic” profile to configure multiple machines. If you know you have a lab full of identical machines this could be very handy indeed.
The downside of this is that some things like spanning maps will not work the way you might expect. You also will not receive notifications from the server when a profile changes, so you have to rely solely on the poll time (it is probably worth making the timeout shorter). You probably also cannot send acknowledgements to the server and the LCFG status pages will consequently be mostly useless for those clients. It is also difficult to configure networking to do anything other than use DHCP. You’re choosing to move some of the configuration information back out of LCFG (or at least out of a particular profile). You may end up saving effort one way and adding it in another.
At the moment, although I have broken the conceptual link between node and host name for the client framework, there are still lots of components which are confused by this change. Components have traditionally been able to rely on combining the profile.node and profile.domain resources to form the FQDN. This was probably always on slightly shaky ground but now there can be no guarantee whatsoever of a useful value in the profile.node resource. If a component really cares about the host name (rather than the node name) then it will have to ask the host directly (using Sys::Hostname from Perl).
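For reference, asking the host directly from Perl is a one-liner with the core Sys::Hostname module. This is only a minimal sketch; whether the returned name is fully qualified depends on how the machine itself is configured.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Ask the operating system for the host name rather than trusting
# any value held in the LCFG profile.
my $host = hostname();

print "host name: $host\n";
```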
A long-standing issue that we have had with the LCFG client is that it is not possible to use an LCFG profile with a name which does not match the host name. The two names have always been treated by rdxprof and the ngeneric framework as conceptually interchangeable. There is no particular reason for this limitation other than the traditional “it’s always been that way”; we’ve also never had a requirement important enough to get this implemented, or the opportunity to quickly make the change. As the refactoring project is drawing to a close it seemed like a good time to break this conceptual connection and rework the code to always use the LCFG node name. For the moment the actual behaviour won’t change, since the node name defaults to the host name as before, but we now have a mechanism to allow it to be altered. When the client enters daemon mode it now stashes the name of the LCFG node being used. Since you can only run one client daemon at a time this makes reasonable sense. The standalone one-shot behaviour remains unaltered: you can still access any profile you like.
Following on from my previous work on fixing the way in which the UDP socket is opened for receiving notification messages I have been looking at why the LCFG component just hangs when the rdxprof process fails to daemonise.
It turns out that the LCFG client component uses an obscure ngeneric feature of the Start function which is that the final step is to call a StartWait function if it has been defined. In the client component this StartWait function sits waiting forever for a client context change even when the rdxprof process failed to start…
I think the problem comes from an expectation that the call to the Daemon function, which starts and backgrounds the rdxprof process, will fail if rdxprof fails to run. It does not fail ($? is zero) and the PID of the rdxprof process is always accessible through the $! variable, even if it was very short-lived.
There is, thankfully, a very simple solution here. The client component already has an IsProcessRunning function which can be used to check if the process associated with a PID is still active. This has to be used carefully: I have put a short sleep after the daemonisation stage to ensure that the process is fully started before doing the check. The check is also fairly naive so there is the slight risk that if the system is under resource pressure the rdxprof process could fail and then the PID could be immediately reused. For now I think it’s reasonable to just accept the risks attached and revisit the issue later if it causes us problems. Associated with this, clearly the StartWait function really ought to eventually give up.
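The component itself is written in shell, but the two ideas here (checking that a PID is still alive, and waiting with a deadline rather than forever) can be sketched in Perl. The helper names is_process_running and wait_until are invented for this illustration and are not the component's functions; the forked child merely stands in for an rdxprof that dies at startup.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper: true if a process with this PID currently exists.
# Signal 0 delivers nothing; it only tests whether signalling is possible.
# Like the check described above it is naive: a reused PID fools it.
sub is_process_running {
    my ($pid) = @_;
    return 1 if kill 0, $pid;
    return $!{EPERM} ? 1 : 0;    # exists but belongs to another user
}

# Hypothetical helper: poll $check once a second and give up after
# $timeout seconds instead of waiting forever.
sub wait_until {
    my ( $check, $timeout ) = @_;
    my $deadline = time + $timeout;
    until ( $check->() ) {
        return 0 if time >= $deadline;
        sleep 1;
    }
    return 1;
}

# Stand-in for a daemon which fails immediately after being started.
my $pid = fork();
die "fork failed: $!\n" unless defined $pid;
_exit(1) if $pid == 0;

waitpid $pid, 0;    # reap the child so that the PID really is gone

print is_process_running($pid)
    ? "process $pid is running\n"
    : "process $pid failed to start\n";
```

A StartWait equivalent could then call something like wait_until with a sensible timeout and report failure when it returns false.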
One long-standing issue with running the LCFG client (rdxprof) in daemon mode has been that if another process has already acquired the UDP socket which it wants (port 732) then it does not fail at startup but just hangs. This is clearly rather undesirable behaviour as it leaves the machine in an unmanageable state but because the client process appears to be running it’s difficult to notice that anything is actually wrong.
Yesterday I spent a while looking at this problem. I reduced it to the simplest case of a script with a while-loop listening for messages on a UDP socket and then printing the messages to the screen. Running multiple processes at the same time revealed that there was nothing preventing multiple binds on the socket. I eventually discovered that this is caused by the SO_REUSEADDR option being set on the socket.
When using TCP, setting this option is often necessary. It allows a process to reacquire access to a socket when it restarts. Sometimes processes may restart too quickly and the socket would otherwise not be ready. For UDP this option is only necessary if you want to listen on a broadcast or multicast address and have multiple listeners on the same machine, which is a fairly unusual scenario.
Removing the SO_REUSEADDR option does exactly what we want. Attempting to run a second rdxprof process now results in it exiting with status 1 and these messages:
[FAIL] client: can't bind UDP socket
[FAIL] client: Address already in use
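The effect can be reproduced with a few lines of Perl using the core IO::Socket::INET module. This is a sketch rather than the rdxprof code: the helper name open_udp_listener is invented, and it binds to an ephemeral port on the loopback address instead of port 732 so that it can be run without root access.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: bind a UDP socket on the loopback address WITHOUT
# setting SO_REUSEADDR (no ReuseAddr option is passed).
# Returns the socket, or undef if the bind fails.
sub open_udp_listener {
    my ($port) = @_;
    return IO::Socket::INET->new(
        Proto     => 'udp',
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
    );
}

# Port 0 asks the kernel to choose a free port for the first listener.
my $first = open_udp_listener(0)
    or die "can't bind UDP socket: $@\n";
my $port = $first->sockport;

# A second bind to the same address and port should now be refused.
my $second = open_udp_listener($port);

if ( defined $second ) {
    print "second bind on port $port unexpectedly succeeded\n";
}
else {
    print "second bind on port $port failed: $@\n";
}
```

Passing ReuseAddr => 1 to both constructor calls is the way to experiment with the old behaviour, although the result of that is platform dependent.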
There is a further problem with the LCFG client component not returning control to the caller when it fails to start rdxprof and I will have to do some further investigations into that problem.
I’ve recently been working on splitting the LCFG client code base from the LCFG component which is used to configure and manage the daemon. This allows the client Perl modules to be built in the style of a standard Perl module. The immediate benefit of this is enhanced portability: it is much easier to build the code on platforms other than DICE Linux if you can use standard Perl module building tools. We could also upload the code to CPAN which would make it even easier to download and install.
There are also benefits for maintainability: the standard Perl build tools make it easy to run unit tests and do other tasks such as checking test and API documentation coverage for your code. It is not impossible to do these things without a tool like Module::Build but it is a lot more awkward. Also, without the standard tools you have to know, or be able to discover, where certain files should be installed. We have some of this built into the CMake framework of the LCFG build tools but it only handles fairly simple scenarios.
The new project which contains all the Perl modules for the client is named LCFG-Client-Perl in Subversion and the component continues to be named lcfg-client in the standard LCFG component naming style. This completes stage 9 of the project plan.
One thing that we need to do very frequently in the LCFG client, and also in many LCFG components, is comparing files. Typically we want to see if the new file we have just generated is any different from the previous version, in which case we will need to replace old with new and possibly carry out some action afterwards.
There are clearly many ways to solve this problem. We could read in the two files and do a simple string comparison (conceptually simple but tends to be messy, particularly if you want to minimise the memory requirements). It is also possible to calculate checksums for a file (MD5, SHA1, etc). I quite like this approach and it is nice and fast for small files. Up until now I’ve been using a mix of methods based on Text::Diff (which wastes time since I don’t actually want the differences) or calculating checksums, neither of which is an ideal approach in most cases.
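For comparison, the checksum approach looks roughly like the following sketch. It uses the core Digest::SHA module with SHA-256 rather than the MD5 or SHA1 mentioned above, and the helper names file_digest and temp_file_with are invented for the example.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical helper: hex SHA-256 digest of the contents of a file.
sub file_digest {
    my ($path) = @_;
    return Digest::SHA->new(256)->addfile( $path, 'b' )->hexdigest;
}

# Demonstration helper: write $content into a new temporary file.
sub temp_file_with {
    my ($content) = @_;
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh or die "cannot close $path: $!\n";
    return $path;
}

my $old  = temp_file_with("resource=1\n");
my $same = temp_file_with("resource=1\n");
my $new  = temp_file_with("resource=2\n");

for my $pair ( [ 'old and same', $old, $same ], [ 'old and new', $old, $new ] ) {
    my ( $label, $first, $second ) = @{$pair};
    my $verdict = file_digest($first) eq file_digest($second) ? 'identical' : 'different';
    print "$label: $verdict\n";
}
```

Note that both files are always read in full, even when they differ in the very first byte.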
What I really want though is a standard API which can simply answer the question of “are these two files the same?”. Some of the older LCFG code shows its shell heritage by using the cmp command. This command does exactly what we want and does it in a fairly efficient manner. The downside is that we have to execute another process every time we want to compare two files.
The answer is the File::Compare module. I’m not sure why I hadn’t spotted this module in the past. It works in a very similar way to the good old cmp command. It is also part of the set of core Perl modules, which means it is available everywhere, and it has a nice simple interface. I think I shall be converting various modules over to this approach in the future.
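As a minimal sketch of the “replace old with new only if it changed” pattern using File::Compare: the compare function returns 0 when the files are equal, 1 when they differ and -1 on error. The helper names replace_if_changed and write_file, and the file names, are invented for this example and are not taken from the LCFG code.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Hypothetical helper: install $new over $old only when the contents
# differ. Returns true if $old was replaced (or did not previously exist).
sub replace_if_changed {
    my ( $new, $old ) = @_;
    if ( -e $old ) {
        my $result = compare( $new, $old );
        die "cannot compare $new and $old: $!\n" if $result < 0;
        if ( $result == 0 ) {
            unlink $new;    # nothing changed, discard the new copy
            return 0;
        }
    }
    move( $new, $old ) or die "cannot replace $old: $!\n";
    return 1;
}

# Demonstration helper: write $content to $path.
sub write_file {
    my ( $path, $content ) = @_;
    open my $fh, '>', $path or die "cannot open $path: $!\n";
    print {$fh} $content;
    close $fh or die "cannot close $path: $!\n";
    return;
}

my $dir     = tempdir( CLEANUP => 1 );
my $current = "$dir/example.conf";

write_file( $current,       "resource=1\n" );
write_file( "$current.new", "resource=1\n" );
my $unchanged = replace_if_changed( "$current.new", $current );

write_file( "$current.new", "resource=2\n" );
my $changed = replace_if_changed( "$current.new", $current );

print "first attempt replaced the file: ",  ( $unchanged ? 'yes' : 'no' ), "\n";
print "second attempt replaced the file: ", ( $changed   ? 'yes' : 'no' ), "\n";
```

The true return value is the natural place to hang any follow-up action, such as restarting a daemon.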