Whilst on the way back from the UKUUG conference (see the last post for details), Paul, Stephen, Gihan and I had a long talk about some of the structural issues we’ve encountered in the LCFG server. Some of those thoughts will unfortunately be lost in the mists of memory, but I thought it was worth jotting down some notes from the very long chat we had about timestamps.
We have an LCFG architecture where a central source repository contains all of the data which the compiler uses to build profiles. Multiple (in our case 2, but theoretically an unlimited number of) machines pull data from the source repository and compile XML profiles from it, which they then serve to clients. Each XML profile must be accompanied by a unique timestamp, so that a client knows whether the XML profile it has just fetched is newer than the one it is currently using. An XML profile is created by compiling multiple source files, starting from a single, per-XML-profile file which is also, confusingly, called a profile.
The problem is how this timestamp can be calculated in a robust fashion. The requirements for robustness are:
- Any change in a profile’s source data should always result in an increase in the timestamp (the timestamp must be increasing)
- The same XML profile must have an equivalent timestamp when served by multiple machines (it must be possible to compare XML profiles fetched from either server, and determine which is newer)
- These guarantees must apply regardless of downtime on any of the LCFG servers, and allow for LCFG servers with radically different compilation speeds
In addition, we’ve historically imposed a number of constraints upon the solution:
- There can be no direct communication between the multiple LCFG servers.
- The LCFG servers cannot ‘talk back’ to the source repository
- There is no guarantee that all sources come from the same location (there may be multiple SCMs, for example)
- The LCFG servers cannot maintain state. This is a constraint that flows from our downtime guarantee – if the servers have state, and one goes down, there’s no guarantee that it will have the same state as the other one when it comes back up.
The Current Solution
Currently, we have a solution based on the timestamps of all of the source files that contribute to a particular profile. When a profile is compiled its timestamp is set to that of the most recent source file that contributed to that profile. There are two problems with this system:
- Deleting an entry from a spanning map doesn’t result in a change to the profile’s timestamp. If machine D publishes information into a spanning map that machine A subscribes to, and then machine D leaves the spanning map, there will be no change to the timestamp of A’s profile (in fact, it may go back in time). This is because the server does not maintain state. It never knows that D was in A’s profile, and so doesn’t know that the change in D’s configuration should affect A’s timestamp.
- Timestamp correctness is critical. This interferes with SCM systems, and with using tools such as rsync to copy profile data around. Both SCMs and rsync set the timestamp of a file to its timestamp on the originating system. If other source files already have timestamps newer than these, then the files you have just copied in will not result in a change to the timestamp of the generated profile, even if they have changed, and the client won’t notice the changes.
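The rule above, and both of the ways it fails, can be sketched in a few lines of Python. This is only an illustration: the file names and modification times are invented, and `profile_timestamp` is a hypothetical stand-in rather than the LCFG compiler's own code.

```python
def profile_timestamp(source_mtimes):
    """Return the newest modification time among a profile's source files."""
    return max(source_mtimes.values())

# Hypothetical sources contributing to machine A's profile (invented mtimes).
sources = {"A": 1000, "a.h": 1200, "spanning/D": 1500}
before = profile_timestamp(sources)

# Problem 1: D leaves the spanning map, so its contribution simply vanishes.
# The stateless server has no record that D was ever there.
after_delete = {name: t for name, t in sources.items() if name != "spanning/D"}
went_backwards = profile_timestamp(after_delete) < before

# Problem 2: a changed a.h is copied in with its original mtime preserved,
# and that mtime is older than the newest file already present.
after_copy = dict(sources)
after_copy["a.h"] = 1100
unchanged = profile_timestamp(after_copy) == before

print(before, went_backwards, unchanged)
```

In the first case the profile's timestamp moves backwards, and in the second it does not move at all, even though the profile's content has changed both times.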
CSNs not timestamps
During our discussion it became obvious that thinking of these identifiers as timestamps was counterproductive. We aren’t actually interested in the time that the profile was built (or edited or …) at all, we just need an increasing number by which we can order the profiles which we receive. This change sequence number (CSN) can be any object to which an ordering relation can be assigned.
(I’m sure that heading will get me some odd search engine hits)
The requirement that the timestamp must be increasing is critical due to the current server selection algorithm used by the client. Under that algorithm, the server a client uses changes with every request the client makes, so a situation where timestamps are not in lockstep between all of the servers will cause the client’s state to flap repeatedly as it switches servers.
It would be possible to partially solve this problem by making clients faithful, and having them only switch servers when one goes down. This means that the timestamp problem shifts from being a constant one to one which only occurs occasionally. However, it obviously removes all of the load balancing characteristics of the current system and would have to be carefully analysed before deployment.
This change also isn’t sufficient to permit removal of the ‘timestamp equivalence between servers’ robustness guarantee. When a machine switches from server A to server B it must be able to tell if the profile server B is offering it is newer, older, or the same as that on server A. However, there’s no guarantee that B has built all of the profiles that A built – it may have been running slower than A. So, we still need a way of assigning ordering on these occasions.
Include tree CSNs
We therefore started thinking about ways in which we could create CSNs using purely the data given to us from the source repository. We can’t use timestamps, as there’s no guaranteed correlation between the timestamp and a change. We decided that it was acceptable to require that every file within the repository have an increasing revision number associated with it, that that revision number must be increased every time the file is changed, and that that revision number be available to the LCFG compiler.
We then started thinking about mechanisms for composing these to produce CSNs. The issue here is that we have to be able to deal with deletion – a CSN which works by (for example) adding all of the revision numbers in the tree will fail in the face of a file being deleted – in this example the CSN would actually shoot backwards at that point.
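To see the deletion problem concretely, here is a minimal sketch of the naive "add everything up" CSN. The file names, revision numbers and tree shape (a file `b` that includes `e`) are invented for illustration.

```python
def summed_csn(revisions):
    """Naive CSN: the sum of the revision numbers of every file in the tree."""
    return sum(revisions.values())

# Hypothetical revision numbers for the files contributing to one profile.
revisions = {"A": 1, "a": 2, "b": 3, "c": 1, "d": 2, "e": 3}
csn_before = summed_csn(revisions)

# Delete e. Its parent b has to be edited to drop the include,
# so b's revision goes up by one, but e's revision is lost from the sum.
after = {name: rev for name, rev in revisions.items() if name != "e"}
after["b"] += 1
csn_after = summed_csn(after)

print(csn_before, csn_after)
```

The deleted file takes 3 out of the sum while the edit to its parent only puts 1 back, so the CSN falls from 12 to 10.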
Paul came up with the idea of modelling this as an inclusion relationship, where the revisions closer to the top of the tree are given more weight than those at the bottom. Given that removing a node from the tree requires modifying (and thus incrementing the revision of) the node above it, giving higher nodes more weight ensures that this deletion always results in a changed CSN. Hopefully an example will make this clearer.
The image on the left shows an inclusion tree for the profile for machine A. A includes the headers a and b which in turn include c, d and e. The numbers outside the circles are the revision numbers for each of these files.
If we say that the weight of each level is 10 times that of the level below it, we could define a CSN for this file that looks like: 156 (the top level has a single node with a revision of 1, the second level has 2+3 = 5, and the bottom level 1+2+3 = 6). If we were to remove node b, then we would do so by changing A (so it now has a revision of 2). Our new CSN is then 226 – which is obviously larger, despite a section of the tree being removed.
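The arithmetic above can be written out as a small Python sketch. The per-level revision lists are taken straight from the figures quoted in the text; `weighted_csn` is a hypothetical helper name, not part of LCFG.

```python
def weighted_csn(levels, base=10):
    """Combine per-level revision sums into one number.

    `levels` lists the revisions at each depth of the include tree, top
    level first. Each level weighs `base` times the level below it.
    """
    csn = 0
    for revisions in levels:
        csn = csn * base + sum(revisions)
    return csn

# Top level: A at revision 1. Second level: 2 + 3. Bottom level: 1 + 2 + 3.
original = weighted_csn([[1], [2, 3], [1, 2, 3]])

# After removing b: A is now at revision 2 and the second level sums to 2.
without_b = weighted_csn([[2], [2], [1, 2, 3]])

print(original, without_b)
```

Because the top level is the most significant digit, bumping A's revision outweighs whatever is lost further down the tree.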
This scheme unfortunately falls down when our summed revisions at each level become larger than the weighting factor we’re applying. However, Gihan asked why we need to treat this as a number at all. Instead, why not represent it as 1.5.6 (using the . as the level separator)? It’s trivial to define an ordering, and we have the ability to grow as large as we like at any given level.
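A sketch of both halves of that argument, assuming we hold the dotted CSN as a Python tuple so that the language's built-in lexicographic comparison gives us the ordering. The helper names and the overflowing value 1.12.6 are invented for illustration.

```python
def dotted_csn(levels):
    """Per-level revision sums, top level first, kept as separate fields."""
    return tuple(sum(revisions) for revisions in levels)

def format_csn(csn):
    """Render a CSN tuple in dotted form, e.g. (1, 5, 6) -> '1.5.6'."""
    return ".".join(str(n) for n in csn)

def as_weighted_number(csn, base=10):
    """The earlier single-number encoding, for comparison."""
    total = 0
    for n in csn:
        total = total * base + n
    return total

old = dotted_csn([[1], [2, 3], [1, 2, 3]])
new = dotted_csn([[2], [2], [1, 2, 3]])

# Hypothetical tree whose second level has grown past the weighting factor.
overflowing = (1, 12, 6)
collides = as_weighted_number(overflowing) == as_weighted_number(new)
still_ordered = overflowing < new

print(format_csn(old), format_csn(new), collides, still_ordered)
```

As a single number, 1.12.6 and 2.2.6 both come out as 226 and cannot be told apart. Compared field by field, the first is clearly older, because the top-level revision is examined before anything below it.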
This scheme pretty convincingly solves the first problem – we are no longer reliant on timestamps, and we have a way of producing a unique CSN from the source data. However …
The second part of our dilemma rears its ugly head.
In addition to incorporating data gleaned from files it includes, a profile may also contain data produced from spanning maps. Each of these spanning map contributions comes from a machine, and so may be versioned in the same way as that machine – for example if machine B has a CSN of 2.5.7, its spanning map contribution may be included as shown in the diagram below.
But, note that we no longer have a revisioned entry for the node which contains these contributions. This is the crux of the problem.
Without a revision number on this node, we can’t deal with D disappearing. If we maintain the number locally, then we can’t keep our two servers in lockstep.
The presence of spanning maps breaks what looks like an elegant CSN maintenance scheme. In fact, we came to the conclusion on the train that it is the very nature of the way that spanning maps work that means we can’t easily timestamp them. In the including relationship entries are pulled from the top down (that is, file a includes files c and d). In order to remove file d, we have to modify file a, and that modification will always contribute to our new CSN.
The spanning map case is different, in that it may well be an entry in k (for example) that results in D’s inclusion. If D is no longer included due to a change in k, then k is no longer included in the CSN computation, and so things break. It is the direction of this inclusion order, a fundamental part of the power of spanning maps, which makes composing any kind of CSN (be it from timestamps, or using this scheme) impossible.
The only way to work round this is to have a scheme where the revision of every file from which a profile may potentially be built (including deleted files) is included in the CSN of that profile. One way of achieving this is to make the source code repository create a unique CSN for every change in the repository. You get this for free if everything in the repository comes from an SCM system like SVN, but as soon as even one file comes from an external source, you need an external mechanism. This external mechanism must be applied at a common point in the process (that is, on the source server, rather than on the compilation machines).
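As a rough sketch of what such a repository-wide CSN might look like, here is a toy in-memory repository that bumps a single counter on every change, deletions included. The class, its method names and the file paths are all hypothetical; a real deployment would get the counter from the SCM or from whatever mechanism sits on the source server.

```python
class SourceRepository:
    """Toy source repository with one change sequence number for everything."""

    def __init__(self):
        self.files = {}
        self.csn = 0

    def put(self, path, content):
        """Add or modify a file; any change advances the repository CSN."""
        self.files[path] = content
        self.csn += 1

    def delete(self, path):
        """Remove a file; deletions advance the CSN just like edits."""
        del self.files[path]
        self.csn += 1

    def snapshot(self):
        """What a compile server pulls: the CSN plus the files at that CSN."""
        return self.csn, dict(self.files)


repo = SourceRepository()
repo.put("profiles/A", "includes a, b")
repo.put("profiles/D", "publishes into a spanning map")

# Two stateless servers pulling the same state see the same CSN.
csn_server1, _ = repo.snapshot()
csn_server2, _ = repo.snapshot()

# D is removed. No surviving file records that, but the CSN still advances.
repo.delete("profiles/D")
csn_after_delete, files_after_delete = repo.snapshot()

print(csn_server1, csn_server2, csn_after_delete)
```

The counter lives with the sources, so the compile servers need neither state of their own nor any way of talking to each other. The cost of this coarse version is that every profile's CSN moves on every change anywhere in the repository.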