Simon's Musings

April 3, 2008

Thoughts from the train: LCFG Timestamps

Filed under: Uncategorized — sxw @ 1:30 pm
Tags: ,

Whilst on the way back from the UKUUG conference (see the last post for details), Paul, Stephen, Gihan and I had a long talk about some of the structural issues we’ve encountered in the LCFG server. Some of those thoughts will unfortunately be lost in the mists of memory, however I thought it was worthing jotting down some notes from the very long chat we had about timestamps.

We have an LCFG architecture where a central source repository contains all of the data which the compiler uses to build profiles. Multiple (in our case 2, but theoretically an unlimited number of) machines pull data from the source repository and compile XML profiles from it, which they then serve to clients. Each XML profile must be accompanied by a unique timestamp, so that a client knows whether the XML profile it has just fetched is newer than the one it is currently using. An XML profile is created by compile multiple source files, starting from a single, per-XML-profile file which is also, confusingly, called a profile.

The problem is how this timestamp can be calculated in a robust fashion. The requirements for robustness are:

  • Any change in a profile’s source data should always result in an increase in the timestamp (the timestamp must be increasing)
  • The same XML profile must have an equivalent timestamp when served by multiple machines (it must be possible to compare XML profiles fetched from either server, and determine which is newer)
  • These guarantees must apply regardless of downtime on any of the LCFG servers, and allow for LCFG servers with radically different compilation speeds

In addition, we’ve historically imposed a number of constraints upon the solution

  • There can be no direct communication between the multiple LCFG servers.
  • The LCFG servers cannot ‘talk back’ to the source repository
  • There is no guarantee that all sources come from the same location (there may be multiple SCMs, for example)
  • The LCFG servers cannot maintain state. This is a constraint that flows from our downtime guarantee – if the servers have state, and one goes down, there’s no guarantee that it will have the same state as the other one when it comes back up.

The Current Solution

Currently, we have a solution based on the timestamps of all of the source files that contribute to a particular profile. When a profile is compiled its timestamp is set to that of the most recent source file that contributed to that profile. There are two problems with this system

  • Deleting an entry from a spanning map doesn’t result in a change of profile. If machine D publishes information into a spanning map that machine A subscribes to, and then machine D leaves the spanning map, there will be no change to the timestamp of A’s profile (in fact, it may go back in time). This is because the server does not maintain state. It never knows that D was in A’s profile, and so doesn’t know that the change in D’s configuration should affect A’s timestamp.
  • Timestamp correctness is critical. This interferes with SCM systems, and with using tools such as rsync to copy profile data around. Both SCMs, and rsync set the timestamp of a file to its timestamp on the originating system. If other source files already have timestamps newer than these, then the files you have just copied in will not result in a change to the timestamp of the generated profile, even if they have changed, and the client won’t notice the changes.

CSNs not timestamps

During our discussion it became obvious that thinking of these identifiers as timestamps was counter productive. We aren’t actually interested in the time that the profile was built (or edited or …) at all, we just need an increasing number by which we can order the profiles which we receive. This change sequence number (CSN) can be any object to which an ordering relation can be assigned.

Client Promiscuity

(I’m sure that heading will get me some odd search engine hits)

The requirement that the timestamp must be increasing is critical due to the current server selection algorithm used by the client. This means that which server a client uses will change with every request that client makes – so a situation where timestamps are not in lockstep between all of the servers will cause the client’s state to flap repeatedly as it switches servers.

It would be possible to partially solve this problem by making clients faithful, and having them only switch servers when one goes down. This means that the timestamp problem shifts from being a constant one to one which only occurs occasionally. However, it obviously removes all of the load balancing characteristics of the current system and would have to be carefully analysed before deployment.

This change also isn’t sufficient to permit removal of the ‘timestamp equivalence between servers’ robustness guarantee. When a machine switches from server A to server B it must be able to tell if the profile server B is offering it is newer, older, or the same as that on server A. However, there’s no guarantee that B has built all of the profiles that A built – it may have been running slower than B. So, we still need a way of assigning ordering on these occasions.

Include tree CSNs

We therefore started thinking about ways in which we could create CSNs using purely the data given to us from the source repository. We can’t use timestamps, as there’s no guaranteed correlation between the timestamp and a change. We decided that it was acceptable to require that every file within the repository have an increasing revision number associated with it, that that revision number must be increased every time the file is changed, and that that revision number be available to the LCFG compiler.

We then started thinking about mechanisms for composing these to produce CSNs. The issue here is that we have to be able to deal with deletion – a CSN which works by (for example) adding all of the revision numbers in the tree will fail in the face of a file being deleted – in this example the CSN would actually shoot backwards at that point.

Paul came up with the idea of modelling this as an inclusion relationship, where the revisions closer to the top of the tree are given more weight than those at the bottom. Given that removing a node from the tree requires modifying (and thus incrementing the revision of) the node above it, giving higher nodes more weight ensures that this deletion always results in a changed CSN. Hopefully an example will make this clearer.

Example tree The image on the left shows an inclusion tree for the profile for machine A. A includes the headers a and b which in turn include c, d and e. The numbers outside the circle are the revision number for each of these files.

If we say that weight of each level is 10 times that of the level below it, we could define a CSN for this file that looks like: 156 (the top level has a single node with a revision of 1, the second level has 2+3 =5, and the bottom level 1+2+3 =6). If we were to remove node b, then we would do so by changing A (so it now has a revision of 2). Our new CSN is then 226 – which is obviously larger, despite a section of the tree being removed.

This scheme unfortunately falls down when our summed revisions at each level become larger than the weighting factor we’re applying. However, Gihan asked why we need to treat this as a number at all. Instead, why not represent it as 1.5.6 (using the . as the level seperator). It’s trivial to define an ordering, and we have the ability to grow as large as we like at any given level.

This scheme pretty convincingly solves the first problem – we are no longer reliant on timestamps, and we have a way of producing a unique CSN from the source data. However …

Spanning Maps

The second part of our dilemma rears its ugly head.

In addition to incorporating data gleaned from files it includes, a profile may also contain data produced from spanning maps. Each of these spanning map contributions comes from a machine, and so may be versioned in the same way as that machine – for example if machine B has a CSN of 2.5.7, it’s spanning map contribution may be included as shown in the diagram below.

Inclusion tree with spanning mapsBut, note that we no longer have a revisioned entry for the node which contains these contributions. This is the crux of the problem.

Without a revision number on this node, we can’t deal with D disappearing. If we maintain the number locally, then we can’t keep our two servers in lock step.

The presence of spanning maps breaks what looks like an elegant CSN maintainance scheme. In fact, we came to the conclusion on the train that it is the very nature of the way that spanning maps work that means we can’t easily time stamp them. In the including relationship entries are pulled from the top down (that is, file a includes files c and d) In order to remove file d, we have to modify file a, and that modification will always contribute to our new CSN.

The spanning map case is different, in that it may well be an entry in k (for example) that results in D’s inclusion. If D is no longer included due to a change in k, then k is no longer included in the CSN computation, and so things break. It is the direction of this inclusion order, a fundmental part of the power of spanning maps, which makes composing any kind of CSN (be it from timestamps, or using this scheme) impossible.

Possible directions

The only way to work round this is if you have a scheme where the revision of every file from which a profile may potentially be built (including deleted files) is included in the CSN of that profile. One way of acheiving this is to make the source code repository create a unique CSN for every change in the repository. You get this for free if everything in the repository comes from a SCM system like SVN, but as soon as even one file comes from an external source, you need an external mechanism. This external mechanism must be applied at a common point in the process (that is, on the source server, rather than on the compilation machines).

UKUUG spring conference

Filed under: Uncategorized — sxw @ 11:04 am
Tags: , , , ,

I’ve just got back from the UKUUG Spring Conference, where a group of us from Informatics (myself, Stephen, Paul and Gihan from Flexiscale) were giving talks. I talked on two subjects – the LCFG based monitoring system framework I developed last year, and the new account management system I’m currently writing. Slides from both of these talks are available on the DICE publications page, which also has Stephen’s slides from his “An end to hacky scripts” talk about the LCFG system.

Despite gaining a scripting language track, and the addition of a parallel one-day PostgreSQL conference, the event seemed smaller this year, with many of the familiar faces missing. Some unfortunate scheduling meant that switching between tracks wasn’t as easy as it could have been, with 45 minute sessions in one room scheduled against 30 minute sessions in another one. However, the event was still productive, useful and stimulating, with a number of interesting talks – slides and audio from which should hopefully be up on the conference website shortly.

Some highlights were the talk from Mark Gledhill from the BBC on “Feeding the BBC Homepage“, which provided a fascinating insight into perl and Catalyst usage at a large organisation, as well as giving a useful background on their project management techniques, and test and deployment issues. Gavin Henry’s talk on OpenLDAP 2.4 provided a valuable summary of the changes in the latest version of OpenLDAP, as well as giving some examples of practical uses for these new features. Randy Appleton’s “Today’s Software … Is It Really Bloated?” talk took a very humorous tour through a number of code size and performance statistics he and his students have been collecting over the years – a perfect start to the day after the conference dinner!

The Transitive (which I ended up seeing because it was swapped with the talk I wanted to hear – one peril of last minute schedule changes!), and ZFS talks pretty much repeated material I’d heard at other conferences, but the ZFS one, in particular, was a helpful reminder of a system I’d really like to have time to look at in more detail. Whilst I wasn’t specifically interested in the scripting language talks, I did manage to catch “USENET Gems” which provided details of a number of interesting perl quirks, which are now firmly filed as things to watch out for.

Paul and Stephen arranged a well attended LCFG BOF on the Tuesday afternoon, and Paul, Stephen and I took some time to chat on Wednesday about possible designs for the new LCFG compiler. As with all UKUUG conferences, it tends to be these unscheduled events, and impromptu corridor conversations where the real value lies. There was a large amount of interest in prometheus, both from people in the commercial sector who have deployed similar systems, and had insights to share, and those who are interested in similar systems for their own sites. Hopefully we’ll be able to build some kind of a community around this technology.

There was continued interest in OpenAFS and Kerberos, with a number of people asking questions both about the technology, and our deployment experiences. Access to the source code for the monitoring system was also in demand – I really should arrange to publish this somewhere less adhoc.

Theme: Rubric.