Simon's Musings

April 4, 2008

Thoughts from the train 2: References and Mutations

Filed under: Uncategorized — sxw @ 10:49 am

Another thought from the train journey back from UKUUG. The real work here is Stephen’s, I’m just trying to jot down some background so we remember how we got there!

The LCFG compiler currently supports a number of data operations, which were independently developed, and which don’t necessarily nicely fit together. For the purposes of this discussion, these are:

Mutations: Operations (like mADD, mREMOVE, mEXTRA, mSET) which take an existing value and change it in a way that depends upon their parameters.

Early references: A reference to the value of another resource. This is evaluated as soon as it is encountered, and is set to the current value of the resource.

Late references: Also a reference to the value of another resource. However, a late reference is evaluated once compilation is complete (and after all mutations have been computed), and is set to the final value of the resource.

Stephen suggested that we handle mutations by holding a list of mutations, rather than the current value, within the parse tree. Then all of the mutations are applied in the final linking step (which is also responsible for reference evaluation). This allows us to optimise our mutation handling, as well as permitting the production of more specific derivation information.

In order to handle early references, we need to store an additional piece of information. When an early reference is encountered, we must store both the resource being referenced, and the current depth of that resource’s mutation list. This means we can mimic the ‘early’ behaviour and still leave reference processing to the linker.
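As a rough illustration of the idea (not the LCFG compiler's actual data structures), the sketch below stores a list of mutations per resource and records an early reference as a (resource, depth) pair. The names Resource, m_set and m_add are hypothetical stand-ins, and values are modelled as plain space-separated strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A mutation is modelled as a function from the old value to the new value.
Mutation = Callable[[str], str]


@dataclass
class Resource:
    """Hypothetical resource that records its mutations, not a current value."""
    name: str
    mutations: List[Mutation] = field(default_factory=list)

    def mutate(self, mutation: Mutation) -> None:
        self.mutations.append(mutation)

    def early_ref(self) -> Tuple["Resource", int]:
        # Remember the resource and the depth of its mutation list right now.
        return (self, len(self.mutations))

    def value(self, depth: Optional[int] = None) -> str:
        # Replay the first `depth` mutations (all of them when depth is None).
        result = ""
        for mutation in self.mutations[:depth]:
            result = mutation(result)
        return result


def m_set(new: str) -> Mutation:
    # Stand-in for mSET: replace the value outright.
    return lambda _old: new


def m_add(item: str) -> Mutation:
    # Stand-in for mADD: append an item to a space-separated list.
    return lambda old: (old + " " + item).strip()


pkgs = Resource("packages")
pkgs.mutate(m_set("vim"))
early = pkgs.early_ref()        # taken while only one mutation exists
pkgs.mutate(m_add("emacs"))

# "Linking" step: every reference is resolved only now.
target, depth = early
early_value = target.value(depth)   # replays mutations up to the saved depth
late_value = pkgs.value()           # replays every mutation
```

Because nothing is evaluated until the end, the early reference still sees only the mutations that existed when it was encountered, while a late reference sees them all.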

April 3, 2008

Thoughts from the train: LCFG Timestamps

Filed under: Uncategorized — sxw @ 1:30 pm

Whilst on the way back from the UKUUG conference (see the last post for details), Paul, Stephen, Gihan and I had a long talk about some of the structural issues we’ve encountered in the LCFG server. Some of those thoughts will unfortunately be lost in the mists of memory, however I thought it was worth jotting down some notes from the very long chat we had about timestamps.

We have an LCFG architecture where a central source repository contains all of the data which the compiler uses to build profiles. Multiple (in our case 2, but theoretically an unlimited number of) machines pull data from the source repository and compile XML profiles from it, which they then serve to clients. Each XML profile must be accompanied by a unique timestamp, so that a client knows whether the XML profile it has just fetched is newer than the one it is currently using. An XML profile is created by compiling multiple source files, starting from a single, per-XML-profile file which is also, confusingly, called a profile.

The problem is how this timestamp can be calculated in a robust fashion. The requirements for robustness are:

  • Any change in a profile’s source data should always result in an increase in the timestamp (the timestamp must be increasing)
  • The same XML profile must have an equivalent timestamp when served by multiple machines (it must be possible to compare XML profiles fetched from either server, and determine which is newer)
  • These guarantees must apply regardless of downtime on any of the LCFG servers, and allow for LCFG servers with radically different compilation speeds

In addition, we’ve historically imposed a number of constraints upon the solution:

  • There can be no direct communication between the multiple LCFG servers.
  • The LCFG servers cannot ‘talk back’ to the source repository
  • There is no guarantee that all sources come from the same location (there may be multiple SCMs, for example)
  • The LCFG servers cannot maintain state. This is a constraint that flows from our downtime guarantee – if the servers have state, and one goes down, there’s no guarantee that it will have the same state as the other one when it comes back up.

The Current Solution

Currently, we have a solution based on the timestamps of all of the source files that contribute to a particular profile. When a profile is compiled its timestamp is set to that of the most recent source file that contributed to that profile. There are two problems with this system:

  • Deleting an entry from a spanning map doesn’t result in a change of profile. If machine D publishes information into a spanning map that machine A subscribes to, and then machine D leaves the spanning map, there will be no change to the timestamp of A’s profile (in fact, it may go back in time). This is because the server does not maintain state. It never knows that D was in A’s profile, and so doesn’t know that the change in D’s configuration should affect A’s timestamp.
  • Timestamp correctness is critical. This interferes with SCM systems, and with using tools such as rsync to copy profile data around. Both SCMs, and rsync set the timestamp of a file to its timestamp on the originating system. If other source files already have timestamps newer than these, then the files you have just copied in will not result in a change to the timestamp of the generated profile, even if they have changed, and the client won’t notice the changes.
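Both failure modes are easy to reproduce in a toy model. The sketch below assumes a profile's timestamp is simply the maximum modification time of its sources; the file names and numeric times are invented for illustration.

```python
def profile_timestamp(source_mtimes):
    """Timestamp of a profile: the newest mtime among its source files."""
    return max(source_mtimes.values())


# Hypothetical sources for one profile, with made-up modification times.
before = {"A.profile": 1000, "a.h": 1200, "b.h": 1100}

# b.h is replaced by a changed copy (for example via rsync) whose preserved
# mtime of 1150 is still older than a.h, so the change is invisible.
after_copy = {"A.profile": 1000, "a.h": 1200, "b.h": 1150}
change_missed = profile_timestamp(before) == profile_timestamp(after_copy)

# The newest contributor disappears (as with a spanning map entry being
# removed), so the computed timestamp moves backwards.
after_removal = {"A.profile": 1000, "b.h": 1100}
went_backwards = profile_timestamp(after_removal) < profile_timestamp(before)
```

In both cases a client comparing timestamps would conclude that nothing newer is on offer.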

CSNs not timestamps

During our discussion it became obvious that thinking of these identifiers as timestamps was counterproductive. We aren’t actually interested in the time that the profile was built (or edited or …) at all, we just need an increasing number by which we can order the profiles which we receive. This change sequence number (CSN) can be any object to which an ordering relation can be assigned.

Client Promiscuity

(I’m sure that heading will get me some odd search engine hits)

The requirement that the timestamp must be increasing is critical due to the current server selection algorithm used by the client. This means that which server a client uses will change with every request that client makes – so a situation where timestamps are not in lockstep between all of the servers will cause the client’s state to flap repeatedly as it switches servers.

It would be possible to partially solve this problem by making clients faithful, and having them only switch servers when one goes down. This means that the timestamp problem shifts from being a constant one to one which only occurs occasionally. However, it obviously removes all of the load balancing characteristics of the current system and would have to be carefully analysed before deployment.

This change also isn’t sufficient to permit removal of the ‘timestamp equivalence between servers’ robustness guarantee. When a machine switches from server A to server B it must be able to tell if the profile server B is offering it is newer, older, or the same as that on server A. However, there’s no guarantee that B has built all of the profiles that A built – it may have been running slower than A. So, we still need a way of assigning ordering on these occasions.

Include tree CSNs

We therefore started thinking about ways in which we could create CSNs using purely the data given to us from the source repository. We can’t use timestamps, as there’s no guaranteed correlation between the timestamp and a change. We decided that it was acceptable to require that every file within the repository have an increasing revision number associated with it, that that revision number must be increased every time the file is changed, and that that revision number be available to the LCFG compiler.

We then started thinking about mechanisms for composing these to produce CSNs. The issue here is that we have to be able to deal with deletion – a CSN which works by (for example) adding all of the revision numbers in the tree will fail in the face of a file being deleted – in this example the CSN would actually shoot backwards at that point.

Paul came up with the idea of modelling this as an inclusion relationship, where the revisions closer to the top of the tree are given more weight than those at the bottom. Given that removing a node from the tree requires modifying (and thus incrementing the revision of) the node above it, giving higher nodes more weight ensures that this deletion always results in a changed CSN. Hopefully an example will make this clearer.

[Figure: Example tree] The figure shows an inclusion tree for the profile for machine A. A includes the headers a and b which in turn include c, d and e. The numbers outside the circles are the revision numbers for each of these files.

If we say that the weight of each level is 10 times that of the level below it, we could define a CSN for this file that looks like: 156 (the top level has a single node with a revision of 1, the second level has 2+3 = 5, and the bottom level 1+2+3 = 6). If we were to remove node b, then we would do so by changing A (so it now has a revision of 2). Our new CSN is then 226 – which is obviously larger, despite a section of the tree being removed.

This scheme unfortunately falls down when our summed revisions at each level become larger than the weighting factor we’re applying. However, Gihan asked why we need to treat this as a number at all. Instead, why not represent it as 1.5.6 (using the . as the level separator)? It’s trivial to define an ordering, and we have the ability to grow as large as we like at any given level.
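To make the comparison concrete, here is a small sketch using the level sums from the example above. The function names are made up for illustration; the only real machinery is Python's built-in lexicographic ordering of tuples, which stands in for the ordering on dotted CSNs.

```python
def csn(levels):
    """Build a CSN from per-level revision numbers, top of the tree first."""
    return tuple(sum(revisions) for revisions in levels)


def format_csn(value):
    """Render a CSN in the dotted form, e.g. (1, 5, 6) -> '1.5.6'."""
    return ".".join(str(part) for part in value)


def weighted(value, factor=10):
    """The single-number scheme: each level weighs `factor` times the next."""
    total = 0
    for part in value:
        total = total * factor + part
    return total


before = csn([[1], [2, 3], [1, 2, 3]])   # level sums 1, 5, 6
after = csn([[2], [2], [1, 2, 3]])       # level sums 2, 2, 6 (as in the text)

# Simply adding every revision goes backwards when part of the tree is removed.
plain_sum_before = sum(before)
plain_sum_after = sum(after)

# Tuples compare level by level, so the dotted form orders correctly.
dotted_ok = before < after

# The weighted number collides once a level sum exceeds the factor:
# 1.12.6 and 2.2.6 both flatten to the same value.
collision = weighted((1, 12, 6)) == weighted((2, 2, 6))
still_ordered = (1, 12, 6) < (2, 2, 6)
```

The dotted form keeps each level separate, so there is no overflow from one level into the next.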

This scheme pretty convincingly solves the first problem – we are no longer reliant on timestamps, and we have a way of producing a unique CSN from the source data. However …

Spanning Maps

The second part of our dilemma rears its ugly head.

In addition to incorporating data gleaned from files it includes, a profile may also contain data produced from spanning maps. Each of these spanning map contributions comes from a machine, and so may be versioned in the same way as that machine – for example if machine B has a CSN of 2.5.7, its spanning map contribution may be included as shown in the diagram below.

[Figure: Inclusion tree with spanning maps]

But note that we no longer have a revisioned entry for the node which contains these contributions. This is the crux of the problem.

Without a revision number on this node, we can’t deal with D disappearing. If we maintain the number locally, then we can’t keep our two servers in lock step.

The presence of spanning maps breaks what looks like an elegant CSN maintenance scheme. In fact, we came to the conclusion on the train that it is the very nature of the way that spanning maps work that means we can’t easily time stamp them. In the including relationship entries are pulled from the top down (that is, file a includes files c and d). In order to remove file d, we have to modify file a, and that modification will always contribute to our new CSN.

The spanning map case is different, in that it may well be an entry in k (for example) that results in D’s inclusion. If D is no longer included due to a change in k, then k is no longer included in the CSN computation, and so things break. It is the direction of this inclusion order, a fundamental part of the power of spanning maps, which makes composing any kind of CSN (be it from timestamps, or using this scheme) impossible.

Possible directions

The only way to work round this is if you have a scheme where the revision of every file from which a profile may potentially be built (including deleted files) is included in the CSN of that profile. One way of achieving this is to make the source code repository create a unique CSN for every change in the repository. You get this for free if everything in the repository comes from an SCM system like SVN, but as soon as even one file comes from an external source, you need an external mechanism. This external mechanism must be applied at a common point in the process (that is, on the source server, rather than on the compilation machines).

UKUUG spring conference

Filed under: Uncategorized — sxw @ 11:04 am

I’ve just got back from the UKUUG Spring Conference, where a group of us from Informatics (myself, Stephen, Paul and Gihan from Flexiscale) were giving talks. I talked on two subjects – the LCFG based monitoring system framework I developed last year, and the new account management system I’m currently writing. Slides from both of these talks are available on the DICE publications page, which also has Stephen’s slides from his “An end to hacky scripts” talk about the LCFG system.

Despite gaining a scripting language track, and the addition of a parallel one-day PostgreSQL conference, the event seemed smaller this year, with many of the familiar faces missing. Some unfortunate scheduling meant that switching between tracks wasn’t as easy as it could have been, with 45 minute sessions in one room scheduled against 30 minute sessions in another one. However, the event was still productive, useful and stimulating, with a number of interesting talks – slides and audio from which should hopefully be up on the conference website shortly.

Some highlights were the talk from Mark Gledhill from the BBC on “Feeding the BBC Homepage“, which provided a fascinating insight into perl and Catalyst usage at a large organisation, as well as giving a useful background on their project management techniques, and test and deployment issues. Gavin Henry’s talk on OpenLDAP 2.4 provided a valuable summary of the changes in the latest version of OpenLDAP, as well as giving some examples of practical uses for these new features. Randy Appleton’s “Today’s Software … Is It Really Bloated?” talk took a very humorous tour through a number of code size and performance statistics he and his students have been collecting over the years – a perfect start to the day after the conference dinner!

The Transitive (which I ended up seeing because it was swapped with the talk I wanted to hear – one peril of last minute schedule changes!), and ZFS talks pretty much repeated material I’d heard at other conferences, but the ZFS one, in particular, was a helpful reminder of a system I’d really like to have time to look at in more detail. Whilst I wasn’t specifically interested in the scripting language talks, I did manage to catch “USENET Gems” which provided details of a number of interesting perl quirks, which are now firmly filed as things to watch out for.

Paul and Stephen arranged a well attended LCFG BOF on the Tuesday afternoon, and Paul, Stephen and I took some time to chat on Wednesday about possible designs for the new LCFG compiler. As with all UKUUG conferences, it tends to be these unscheduled events, and impromptu corridor conversations where the real value lies. There was a large amount of interest in prometheus, both from people in the commercial sector who have deployed similar systems, and had insights to share, and those who are interested in similar systems for their own sites. Hopefully we’ll be able to build some kind of a community around this technology.

There was continued interest in OpenAFS and Kerberos, with a number of people asking questions both about the technology, and our deployment experiences. Access to the source code for the monitoring system was also in demand – I really should arrange to publish this somewhere less ad hoc.

March 27, 2008

Using packages from upstream

Filed under: Uncategorized — sxw @ 1:39 pm

One of the most time consuming things about building new software is working out whether a package has already been packaged ‘upstream’ in Fedora. Doing so is important, because if it’s available in EPEL we should be using that, and if it’s available in any version of Fedora it’s quite likely that that version can be ported to Scientific Linux with less effort than building a package from scratch. Besides, we can’t submit a package upstream if it’s already there :)

Fedora has a couple of databases which store package information. The Fedora Package Database provides the definitive information on packages, along with which systems they have been built for. Unfortunately, the package database doesn’t currently seem to have a searchable interface, meaning that finding a package is a matter of appending likely names to the end of a URL. koji, the Fedora build farm, also provides packaging information – but only for those systems which koji builds for – it will not indicate whether packages are available within EPEL.

Fedora Package Database

To get information from this database, append the likely name of your package to For instance, if you’re interested in perl-HTML-Tree, then navigate to From this page, you can see which architectures the package should be available for. If ‘Fedora EPEL 5’ is listed, then run, don’t walk, to your nearest Extras repository.

If an EPEL package isn’t available, then clicking on the Build Status link will take you to the koji page for this package. Of which, more below.


Koji

Koji is the Fedora Project’s build system. It currently only builds Fedora operating systems (EPEL is built using an older build system called plague). Koji provides a powerful search interface that lets you find packages, but it won’t tell you whether they exist in EPEL or not. koji can be found at

Once you’ve found the koji page for your package, clicking on a package in the ‘Builds’ section will let you download the SRPM which the build system spat out. From this SRPM, it is simply a matter of adding a ‘.1.inf’ to the Release field, and a suitable changelog comment, to produce an SL5 package suitable for our ‘world’ repository. I have a script which automates this step, and automatically builds and submits both i386 and x86_64 versions of the package.
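The Release change itself is mechanical. The following is only a guess at what that step might look like, not the script mentioned above: it appends the suffix to the end of the first Release: line of a spec file's text, and leaves the changelog entry out entirely. The function name and the decision to place the suffix after any %{?dist} macro are assumptions.

```python
import re


def add_inf_suffix(spec_text, suffix=".1.inf"):
    """Append `suffix` to the value on the first 'Release:' line of a spec."""
    pattern = r"(?m)^(Release:[ \t]*\S+)[ \t]*$"
    return re.sub(pattern, r"\g<1>" + suffix, spec_text, count=1)


# A tiny made-up spec fragment to show the effect.
example_spec = "Name: perl-HTML-Tree\nRelease: 3%{?dist}\n"
bumped = add_inf_suffix(example_spec)
```

A real script would also need to add the changelog comment and handle spec files that define the release through macros.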

Of course, in the long term we should be contacting the maintainers of those packages in Fedora, but not EPEL, and asking them if they’d mind us looking after an EPEL branch of their package. This would require more of us to be Fedora developers, though.

March 15, 2008

New apacheconf and monitoring thoughts

Filed under: Uncategorized — sxw @ 5:05 pm

Yesterday, I shipped a new apacheconf component, with some significant changes to its monitoring support.

Apache is a complicated beast, with many different mechanisms for configuring it. Apacheconf doesn’t necessarily handle all of these different options, and sometimes workarounds are necessary. For example, apache supports providing multiple ip:port combinations to a VirtualHost directive. Apacheconf only supports providing one. For this reason, Neil had configured a service with two VirtualHosts, both with the same server name. Unfortunately, apacheconf assumed that all of the server names would be unique on a given host, and so built its Nagios service descriptions (which must be unique) based on these server names. The upshot of this is that we ended up with a monitoring configuration that wouldn’t load.

I’ve made two changes to help mitigate this. Firstly, every apacheconf virtualhost now has a vhostnagiosmonitor directive, which can be set to false to disable monitoring for that virtual host. Secondly, the apacheconf translator now keeps a list of all of the service descriptions it has created, and adds uniquifiers to any duplicates (initially the IP address and, if that isn’t sufficient, a number).
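The uniquifying step can be pictured roughly as follows. This is a sketch of the approach described, not the translator's real code: the function name, the input shape (server name and IP pairs) and the exact format of the generated descriptions are all assumptions.

```python
def unique_descriptions(vhosts):
    """Turn (server_name, ip) pairs into unique service descriptions."""
    seen = set()
    result = []
    for server_name, ip in vhosts:
        description = server_name
        if description in seen:
            # First uniquifier: the IP address of the virtual host.
            description = f"{server_name} {ip}"
        base = description
        counter = 2
        while description in seen:
            # Still a duplicate: fall back to appending a number.
            description = f"{base} {counter}"
            counter += 1
        seen.add(description)
        result.append(description)
    return result


# Hypothetical virtual hosts sharing one server name.
descriptions = unique_descriptions([
    ("www.example.org", "10.0.0.1"),
    ("www.example.org", "10.0.0.2"),
    ("www.example.org", "10.0.0.2"),
])
```

Whatever the input, every description handed to Nagios is distinct, so one awkward virtual host can no longer stop the whole monitoring configuration from loading.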

In addition to this, a new lcfg-monitor has shipped containing a number of bug fixes.

In the long run, we need to give lcfg-monitor the ability to take a list of machines and components for which monitoring is disabled – so that, if this happens again, we don’t end up having to rush to fix broken configurations, or components, just to keep monitoring running for everyone else.

March 14, 2008


Filed under: Uncategorized — sxw @ 9:56 am

I’ve been experimenting with Mercurial, as a means of streamlining the way I work with the OpenAFS CVS repository. In particular, I’m trying to improve the management of my disconnected operation code, as well as better controlling the large number of patches I’m producing as part of the prototyping and error removal exercise.

Because I tend to flit backwards and forwards between different pieces of code, I tend to find that with CVS I have a large number of different checked out sandboxes. For big projects, such as the disconnection work, there’s no history, or ability to revert changes without taking a snapshot of the sandbox, which is both time consuming and inefficient. For smaller projects, there’s either a huge number of different sandboxes (and the related ‘where did I do X?’ problem), or lots of code ends up being intermingled within the same sandbox, and lots of things have to be unpicked before patches can be sent upstream.

These actually end up being two different problems, and it looks like there are two different mercurial workflows that are best suited to handle them. For disconnection, what I really need is a way of tracking, and managing, my code changes, and I’m using mercurial as a normal SCM to achieve this. With the prototyping changes, what I really need are patch queues – I have a large number of changes which I’m trying to arrange into manageable chunks in order to send upstream. Depending on my testing schedule, I may have a large number of patches awaiting submission. In this case, mercurial’s patch queues seem like by far and away the best fit.

I’m intending on making my mercurial repositories for both of these tasks publicly available. For now, I can offer a mercurial import of the OpenAFS ‘upstream’ CVS at Other repositories are likely to appear there over time.

Local users may be interested to note that they can get mercurial on a DICE machine by including dice/options/mercurial.h in their profile.

February 26, 2008


Filed under: Uncategorized — sxw @ 2:45 pm

Over the last weekend, I attended FOSDEM, an absolutely mind blowing conference bringing together Free and Open Source developers from all over Europe. The scale of the conference, attracting as it does thousands of developers, and accommodating hundreds of different talks over 2 manic days, really can’t be described. You have to be there to experience it.

I made the journey to Brussels by train, a most civilised way to travel – especially given that Eurostar are quite happy to replace lost return tickets for a small fee! The weekend started with the infamous beer event on the Friday night (hence the lost ticket), before getting down to business on the Saturday. It’s hard to pick particular highlights from such a packed program, but the perl6 talk managed to be both fascinating and scary at the same time and the cmake talk was very useful given the way Stephen is going with build tools. In the dev rooms, Dan Mosedale unfortunately didn’t make it for the Thunderbird talk, but a productive discussion was had none-the-less, and Jens Kuehnel’s introduction to SELinux in the Fedora devroom helped overcome a lot of my fears (and, in fact, has succeeded in its goal, as I no longer just switch it off). The sight of 100+ folk all participating in a PGP keysigning had to be seen to be believed (eventually, we just had to go outside, as the lecture theatre just wasn’t big enough).

I signed up a few months ago to present a Lightning Talk on OpenAFS, in an attempt to grow awareness, and attract new developers. That talk certainly helped me with talking to other people at the conference, as well as being pretty well received. Both slides, and video, of the talk are available from the FOSDEM site.

February 21, 2008

UKUUG Files and Backups Seminar

Filed under: Uncategorized — sxw @ 12:00 am

As previously trailed, I presented as part of this year’s UKUUG Files and Backups Seminar on the 19 Feb. My talk, on OpenAFS, was a revised and extended version of the paper Craig and I wrote, and I presented at UKUUG’s Spring Conference the year before. Whilst that paper concentrated on Informatics’ experience in deploying OpenAFS, the seminar talk was far more outwards facing, discussing the pitfalls and benefits of any OpenAFS deployment across many different types of organisation. A copy of the slides is available from the UKUUG web site.

Both days of the seminar were a very interesting opportunity to take part in a number of focussed discussions about storage issues, as they affect a wide variety of different businesses. Charles Curran’s discussion of CERN’s data management issues (with LHC producing around 15PB of data every year) was a hilarious tour through the issues involved in managing vast amounts of experimental data, and Kern Sibbald’s talk on Bacula was a fascinating discussion of what must be the industry’s leading Open Source backup technology. Kern and I had a chat afterwards about the issues involved in making Bacula AFS aware, such that it could easily handle both backup, and restoration of files from AFS volume dumps.

February 15, 2008


Filed under: Uncategorized — sxw @ 6:14 pm

I’m giving a few talks over the next couple of months

  • UKUUG Files and Backup Seminar I’m giving a general overview of AFS from a user’s and administrator’s perspective, particularly focusing on features that will be of interest to new deployments
  • FOSDEM I’m giving a developer’s overview of OpenAFS as a lightning talk
  • UKUUG Spring Conference I’m currently scheduled to give two talks. The first is an overview of our monitoring system, talking in particular about the benefits (and challenges) of integrating it with LCFG. The second is about our in-development account management system, prometheus, and some of its unique features.
  • AFS & Kerberos Best Practices Workshop

I’m going to FOSDEM, the Free and Open Source Software Developers’ European Meeting

January 29, 2008

Integrating cosign with web sites

Filed under: Uncategorized — sxw @ 12:48 pm

I’ve made a couple of changes over the last few days with a view to making it easier to integrate cosign authentication with web applications, and web sites in general. These are trivially available to sites which are built with the LCFG apacheconf and cosign components, and will be available in the next stable release.

Standard Logout Mechanism

Firstly, a standard logout CGI script is generated by the cosign component, as /var/www/cosign-logout/logout.cgi. Sites built with apacheconf can include the cosign-logout configuration fragment in their host definition to map this to the /logout URI on their site.

Cosign requires a site-local logout mechanism due to the way in which it uses cookies to record user authentication. When a user is authenticated to cosign and accessing your site they have two cookies, one for your site, and one for the central cosign server. If your logout button only redirects to the central cosign logout page, then that site cookie will continue to exist – so users will be able to still access your site for a brief period of time after they have logged out. Needless to say, this tends to confuse people.

The local logout CGI will remove the local cookie, and then redirect them to the central login service. It should be linked (or redirected to) after your web application has performed whatever internal tidyup it requires on logout (for example, it may have its own cookies to remove).


For some services, it is desirable to check a user’s entitlements before allowing them access. Until the new account management technology is available, it is only possible to give local users entitlements, so the mechanism below cannot be used on services which allow access by iFriends.

Entitlements are accessible as LDAP groups, so can be checked using LDAP authorization. To enable this for your web server, you need to include dice/options/apacheconf-ldapauthz.h in the server’s profile. Then you should include the ldap-authz configuration fragment in the configuration of each site you wish to protect. The implementation details of this are different between the DICE Apache 1.3 build, and the Fedora Apache 2.2 system, which unfortunately changes the final configuration steps.

Apache 1.3

Individual sections of the site may then be protected by doing

<Location /my/secret/data>
CosignProtected On
AuthType Cosign
Require group my/entitlement/name
</Location>

(my/entitlement/name is the entitlement that you want to restrict access to)

Apache 2.2

<Location /my/secret/data>
CosignProtected On
AuthType Cosign
Require ldap-group cn=my/entitlement/name,ou=Capabilities,dc=inf,dc=ed,dc=ac,dc=uk
</Location>

(again, my/entitlement/name is the name of the entitlement you wish to restrict access to. Note that you must specify the full DN of the entitlement, rather than just the name)
