Comparing LCFG XML profiles

August 31, 2009

Recently I have been having lots of “fun” working out how to compare LCFG XML profiles generated by different LCFG servers to see if they are functionally equivalent. As a first step I removed the contents of all the nodes which are obviously server-dependent: published_at, published_by and server_version. This really is only the beginning of the job though. Nearly every component and package node has a derivation attribute, which holds a list of file paths that depend on the locations of the server input directories. I came up with a cunning scheme to reduce these paths to their shortest form: it takes all the lists of directories involved, sorts them by depth so that the most specific directory is the one removed, and converts them into regular expressions to handle the release and host name format strings.
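To make the scheme a bit more concrete, here is a minimal Python sketch of the idea, not the actual comparison code. The three node names and the derivation attribute come from the description above; the `{release}`-style placeholder syntax, the function names, the assumption that a derivation value is a whitespace-separated list of paths, and the choice to ignore XML namespaces are all mine, purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Node names taken from the post; everything else here is illustrative.
SERVER_DEPENDENT = {"published_at", "published_by", "server_version"}


def local_name(name):
    """Drop any '{namespace}' prefix that ElementTree puts on a name."""
    return name.rsplit("}", 1)[-1]


def template_to_regex(template):
    """Turn a directory template into a regex anchored at the path start.

    Each '{placeholder}' (hypothetical syntax) matches one path segment.
    """
    parts = re.split(r"\{[a-z_]+\}", template.rstrip("/"))
    pattern = "[^/]+".join(re.escape(part) for part in parts)
    return re.compile("^" + pattern + "/")


def build_matchers(templates):
    """Compile the templates, deepest (most specific) directory first."""
    deepest_first = sorted(
        templates, key=lambda t: t.rstrip("/").count("/"), reverse=True
    )
    return [template_to_regex(t) for t in deepest_first]


def shorten(path, matchers):
    """Remove the first (most specific) matching directory prefix."""
    for matcher in matchers:
        shortened, count = matcher.subn("", path, count=1)
        if count:
            return shortened
    return path


def canonicalise(xml_text, templates):
    """Blank server-dependent nodes and shorten derivation paths."""
    matchers = build_matchers(templates)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) in SERVER_DEPENDENT:
            elem.text = None
        for attr, value in list(elem.attrib.items()):
            if local_name(attr) == "derivation":
                paths = [shorten(p, matchers) for p in value.split()]
                elem.set(attr, " ".join(paths))
    return ET.tostring(root, encoding="unicode")
```

Running canonicalise() over the profile from each server, each with that server's own list of directory templates, should give two documents that can be compared directly. Sorting deepest-first matters because a shallow directory would otherwise match first and leave the release or host part of the path behind.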

After this I really thought I had cracked the problem, but the comparison turned up some issues with the LCFG server which led to a code change. It turned out that the LCFG servers generated the lists of nodes for spanning maps in an unpredictable order which varied between hosts. This doesn’t really bother the clients using the data, but it doesn’t fit well alongside normal LCFG taglists, where the order is considered important and is intended to be maintained. The result is that we now sort the taglists generated from spanning maps when they are added to a subscribing profile.
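The effect of that change can be pictured with a tiny Python sketch. This is not the server code; the function name is hypothetical and it simply assumes a taglist is a space-separated string of tags.

```python
def sort_taglist(value):
    """Return a space-separated tag list with its tags in sorted order."""
    return " ".join(sorted(value.split()))
```

With this applied, two servers that collect the same nodes in different orders end up with identical values. It should only be applied to lists generated from spanning maps, since the order of ordinary taglists is meaningful.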

Once again, just when I thought I had the problem solved, more issues turned up today. It turns out that I also need to canonicalise the file path in the last_modified_file node in the same way as the derivation attribute values. A more annoying issue is that when the value for this node could come from one of several files with identical timestamps, it doesn’t seem to be possible to predict which file will be selected. I feel more code changes in the LCFG server are now required…
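One possible way to make that selection predictable, sketched in Python purely as an illustration and not as what the LCFG server does or will do, is to break timestamp ties on the file path. The function name and the (path, mtime) pair representation are assumptions.

```python
def pick_last_modified(candidates):
    """Pick from (path, mtime) pairs: newest mtime, ties broken by path."""
    path, _mtime = max(candidates, key=lambda item: (item[1], item[0]))
    return path
```

Because the path takes part in the comparison, the result no longer depends on the order in which the files happen to be examined. For the result to agree between servers, the paths would need to be canonicalised before they are compared.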