Simon's Musings

May 26, 2009

Converting OpenAFS to git

Filed under: Uncategorized — sxw @ 12:46 pm

For a while now, there have been plans afoot to convert OpenAFS’s CVS repository to git. A number of attempts have been made, which have all stalled due to the complexity of the underlying problem, and issues with the existing tools. Previously, it was felt that the main hurdle to a successful conversion was OpenAFS’s use of ‘deltas’ to provide a changeset style interface on top of CVS. A delta is a collection of related changes, grouped using a comment in the CVS revision log. However, unlike a real changeset, there is no requirement that a delta’s changes be contiguous. A file may be modified by delta A, then by delta B, and then modified by delta A again. This makes it impossible to properly represent all deltas as single changesets. In addition, abuse of deltas within OpenAFS has caused some to span branch or tag points, again making it impossible to represent those deltas as a changeset without destroying the repository history. For many months now, people have been trying to produce conversion tools that achieve as close to a 1 to 1 correspondence between deltas and changesets as is possible, just leaving the troublesome cases as multiple commits.

Frustrated with the lack of progress of this approach, I decided to do a quick and dirty conversion, with a view to getting something completed by the start of coding for this year’s Summer of Code (which I’ve missed) and the yearly Best Practices Conference (which I might just make). I decided not to concern myself with merging deltas at all, but instead to use cvsps and the existing git-cvsimport tool to produce a tree where the branch heads and all tag points matched, and which retained enough information to reconstruct deltas without forcing them to be single changesets. In order to be able to perform simple manipulations, I decided to create a perl script which would post-process the cvsps output before feeding it to git. I also created a tool which would check out every branch and tag from cvs, compare them to the corresponding item in git, and report on any errors. Pretty straightforward, I thought …

Unfortunately, I rapidly discovered that cvsps had significant problems with the OpenAFS repository. Many tags in CVS were simply missing from the cvsps output; other tags (both those marked as FUNKY and INVALID, and those not) were in the wrong place; and branchpoints were being incorrectly determined. Rather than get into cvsps’s internals, I ended up extending my post-processing script to deal with these errors. It now performs a number of tasks:

Reordering inverted patchsets: Some of cvsps’s output gets the patchset ordering wrong, such that a patchset that does fileA:1.2->1.3 comes before fileA:1.1->1.2. The script scans through all of the patchsets for this problem and swaps any that it finds.
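The scan-and-swap pass might look roughly like this in Python (the patchset representation and function names here are illustrative, not those of the actual Perl script):

```python
def is_inverted(earlier, later):
    """True if 'earlier' contains a file change starting at a revision
    that 'later' produces, i.e. 'later' should really come first.
    Each patchset is a list of (filename, old_rev, new_rev) tuples."""
    produced_by_later = {(f, new) for f, _old, new in later}
    return any((f, old) in produced_by_later for f, old, _new in earlier)

def fix_inversions(patchsets):
    """Repeatedly swap adjacent patchsets whose revisions are inverted,
    until the list is consistent."""
    changed = True
    while changed:
        changed = False
        for i in range(len(patchsets) - 1):
            if is_inverted(patchsets[i], patchsets[i + 1]):
                patchsets[i], patchsets[i + 1] = patchsets[i + 1], patchsets[i]
                changed = True
    return patchsets
```

A simple bubble-style pass is enough here, since inversions between adjacent patchsets are the common case in the cvsps output.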

Tag point determination: Using the output from CVS’s rls command, it is possible to get the revision numbers of every file in a given tag. With this information, the set of patchsets from cvsps can be walked in order to identify the first patchset to satisfy every revision contained within the tag. Unfortunately, cvsps patchsets aren’t correctly ordered, so this process also works out how to reorder the patchsets such that no patchsets with file revisions higher than those in the tag occur before the tag point. This reordering is carefully performed so as not to break any tag or branch points which we have already placed! In addition, cvsps sometimes merges commits which occur over a tag point, so we also need to split patchsets which contain both files with revisions before the tag point, and files with revisions after it.
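Stripped of the reordering and splitting logic, the core walk for placing a tag can be sketched as follows (again, the data structures are mine, not the script’s):

```python
def find_tag_point(patchsets, tag_revisions):
    """Walk the patchsets in order, tracking each file's current revision,
    and return the index of the first patchset after which every file in
    the tag (as reported by 'cvs rls') is at exactly its tagged revision.
    Returns None if no such point exists -- the real script must then
    reorder or split patchsets to create one."""
    current = {}
    for i, patchset in enumerate(patchsets):
        for f, _old, new in patchset:
            current[f] = new
        if all(current.get(f) == rev for f, rev in tag_revisions.items()):
            return i
    return None
```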

Branch point determination: The cvsps output incorrectly places many of OpenAFS’s branchpoints. Fortunately, many of these were marked by tags at the time they were created, and a hardcoded list of these is used to place some branch points in the correct position. For branches that don’t have a corresponding tag, a brute force approach is used. By examining all of the patchsets on the branch, it’s possible to determine the starting revision number of every file that’s modified on that branch – combining this with the contents of the branch head tag from cvs rls gives the equivalent of a tag marking the branchpoint. This can then be processed by the tag point algorithm to work out the correct position in which to place the branch point. This gives the patchset that the branch occurs after, rather than cvsps’s “Ancestor branch” field, which gives the first patchset on the branch. Ancestor branch is fundamentally flawed, as it doesn’t allow changes to occur on HEAD after a branchpoint is created and before the first patch on that branch. As part of this process, git-cvsimport was modified to understand a new ‘post-patchset’ branch syntax.
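The brute-force synthesis of a branch point “tag” can be sketched like so (a simplification of what the script actually does):

```python
def synthesise_branchpoint_tag(branch_patchsets, branch_head):
    """Build a {filename: revision} map describing the branch point.
    For each file modified on the branch, the 'old' side of its first
    change there is the revision it was branched from; files never
    touched on the branch still sit at their branch-head revision
    (from 'cvs rls').  The resulting map can then be fed to the
    tag-point algorithm as if it were an ordinary tag."""
    branchpoint = dict(branch_head)
    seen = set()
    for patchset in branch_patchsets:
        for f, old, _new in patchset:
            # Only a file's first modification on the branch tells us
            # which trunk revision it was branched from.
            if f not in seen:
                branchpoint[f] = old
                seen.add(f)
    return branchpoint
```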

Unknown branch determination: cvsps fails to place some patchsets on the correct branch. By comparing the revision numbers of files in these patchsets with those elsewhere in the list, the correct branch for all of these can be determined. (This needs to be done so that we can work out tag points, as well as being necessary for repository consistency.)

We also clean up the output to deal with problems of our own making:

Delta naming: Whilst there is a style of delta name where the branch is given first, a number of deltas don’t conform to this style, and have the same name across multiple branches. All deltas are renamed such that they are something like STABLE14-delta-name-yyyy-mm-dd.
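The renaming itself is mechanical; something along these lines (the exact prefix patterns and the source of the date are my assumptions, not the script’s):

```python
import re

def canonical_delta_name(name, branch, date):
    """Rewrite a delta name into the branch-first form, e.g.
    'STABLE14-delta-name-2009-05-26'.  Any existing branch prefix
    (the patterns here are hypothetical) is stripped first, so that
    already-conforming names are not doubled up."""
    stripped = re.sub(r'^(STABLE\d+|DEVEL|HEAD)-', '', name)
    return f"{branch}-{stripped}-{date}"
```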

Mistagged files: In some places, tags have been incorrectly applied such that files on HEAD are tagged as part of a branch tag. The script contains manual overrides to fix these to tag a revision on the correct branch.

Finally, since all of the above had left me with a pretty good toolset for dealing with patchsets, I implemented support for merging deltas. This merges all bar 100 or so of the 15,000 deltas into single patchsets. The remaining items are deltas which span tag or branch points (and which can never be merged) and deltas which contain conflicting changes to a single file (which it might be possible to merge, but which would require manual intervention). These deltas are tagged in a set of git references at refs/deltas/branches/&lt;branch&gt;/&lt;delta&gt;. We separate them from tags so that git tag doesn’t have to deal with over 10,000 tags, and split them by branch to avoid directory size limitations.
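The mergeability test for a delta reduces to two checks; a sketch (the representation is illustrative, and the conflict test here is deliberately conservative):

```python
def can_merge_delta(delta_patchsets, pinned_points):
    """Decide whether a delta's patchsets can be squashed into one.
    delta_patchsets: list of (index_in_history, members) pairs, where
    members is a list of (filename, old_rev, new_rev) tuples.
    pinned_points: set of history indices at which a tag or branch
    point has already been placed.
    A delta cannot merge if it straddles a pinned point, or if it
    changes the same file more than once (potentially conflicting;
    the real test is whether another delta's change intervenes)."""
    indices = sorted(i for i, _members in delta_patchsets)
    # A pinned tag/branch point strictly inside the delta's range
    # means squashing would move history across that point.
    if any(indices[0] <= p < indices[-1] for p in pinned_points):
        return False
    files = [f for _i, members in delta_patchsets for f, _old, _new in members]
    return len(files) == len(set(files))
```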

The resulting git tree isn’t a perfect replica of the CVS repository. It has a number of issues which are going to be really difficult to fix, but which probably aren’t earth-shattering:

It contains additional files: There are a number of places where a user has added additional directories to cvs. When another user has subsequently tagged their working directory for a release, they haven’t done a cvs update -d, and so these additional directories (and, in a small number of cases, files) aren’t contained in the tag for that release. It’s impossible to create a patchset ordering which allows a git tag to not include these directories, so we end up with additional files in the git tag. I don’t think that this is a particular problem.

It’s missing a tag: There is a tag in the CVS repository (BP-openafs-rxkad-krb5-lha) which is so broken that it’s impossible to construct a patchset ordering that matches it. It is simply omitted from the resulting git repository.

One branch is bad: The openafs-rxkad-krb5-lha branch was created by branching only certain files in the tree. This means that it’s impossible to create a git branch which mimics this one without creating a huge number of additional ‘pull-up’ patchsets. Whilst we include all of the changes that were made on this branch, the final branch state is very different from the one in CVS.

Some deltas are disjoint: As discussed, some deltas cannot be merged into single patchsets. This is going to require a new wdelta-style tool which understands how to merge these deltas.

May 7, 2009

remctl everywhere

Filed under: Uncategorized — sxw @ 9:26 pm

We’ve been using Russ Allbery’s excellent remctl tool for some time now within our monitoring system, to provide a mechanism for communicating results between multiple servers. Stephen has been working on expanding that support out into something more generic, and I’ve just thrown the switch to make the remctl client available on all of our systems. This will eventually let us use remctl to replace most of our ad hoc remote command execution technologies.

April 26, 2009

Mocking Fedora 11

Filed under: Uncategorized — sxw @ 6:52 pm

All of the OpenAFS Fedora/RHEL package builds are done from a machine running Scientific Linux 5. This is Informatics’s managed Linux platform, and using it for the build machine means that it is administered and updated along with the rest of our services, leaving me with more time to do other things. We use the ‘mock’ command to perform builds for all of our x86_64 and i386 architectures – mock uses yum and rpm to construct a chroot for each build architecture, and then runs the build within that chroot. Unfortunately, this is where the fly in the ointment occurs.

RPM has been pretty stable for years, allowing this cross platform building to occur. With Fedora 11, however, a new version of rpm has been shipped. This contains support for longer file digests, and packages from Fedora 11 cannot be installed by older versions of rpm. Unfortunately, this means that we can’t mock Fedora 11 from a normal SL5/EL5 build host. Fortunately, solutions are available. 

A version of RPM 4.6 with support for the extended hashes, and which builds on EL5, is available from http://people.redhat.com/mitr/sha256-rpm/. This does change the RPM soname, and will require that packages with dependencies on rpm be rebuilt, and (in some cases) modified to support the newer API.

In addition to the change in RPM hashes, yum itself also needs to be modified to support Fedora 11. The pyhashlib package is needed to give yum support for other hash formats, and a newer version of yum is required to use them.

March 27, 2009

AFS & Kerberos Best Practices Workshop

Filed under: Uncategorized — sxw @ 1:01 pm

Once again, I’m presenting at the AFS & Kerberos Best Practices Workshop. This year’s event is at Stanford University from June 1st-5th.

I’m giving two talks, the first on prometheus, our new Identity Management System. The second is about how to contribute to OpenAFS. The abstracts are:

Prometheus is an LDAP based provisioning system, which is designed to manage a wide variety of user databases, including AFS’s PTS and a Kerberos KDC. It is highly flexible in the databases it supports, and permits very fine grained delegation of control. It has a role-based access control model, and allows the creation and management of roles by any authorized user. It is instance aware, allowing users to create many instances of a primary account, request keytabs of those instances, and delegate particular permission sets to individual instances. Prometheus is designed to be as distributed as possible, permitting provisioning of systems maintained by disparate groups without requiring that those groups be trusted by the system itself. This talk will discuss the design goals behind Prometheus, provide an update on implementation progress, and demonstrate a running system.

and …

OpenAFS has a huge, daunting codebase, with a relatively opaque system of patch submission, review and application. It takes mountains of skill, and years of persistence to get your first patch into a state where it can be submitted, let alone accepted into the hallowed halls of the code tree…

Nonsense!

This talk will attempt to blow away some of the misconceptions with regard to contributing to OpenAFS. It will provide a first-timer’s view of the steps, both technical and political, to crafting a patch for submission into OpenAFS. We’ll take a whistle-stop tour of the tools now involved in the process, from the code repository, to the patch review system and the bug tracker. We’ll talk about code review, bug triage and testing, with a view to inspiring participation in these areas.

Finally, we’ll talk about some low hanging fruit that anyone could get started on, and write their first bit of OpenAFS code …

In addition to keynotes from Morgan Stanley and Carnegie Mellon, the conference features a number of talks about research computing storage (including one from the nanoCmos project), and looks like it will have a great mixture of academic and commercial topics.

The hotel block (at the very reasonable Stanford Guest House) expires April 1st, with the early bird deadline being April 21st.

UKUUG Spring Conference

Filed under: Uncategorized — sxw @ 12:16 pm

I’ve just returned from spending 3 days in London at the UKUUG Spring Conference. I presented a Kerberos tutorial on the first day, and spent the following 2 as a conference delegate. The tutorial was well attended, with over 50 people there on the day, and seemed to go really well with a lot of good feedback from the attendees.

The second and third days were taken up with the conference proper. There seemed to be more delegates than in previous years, although the number of talks was smaller, with only one conference track. Whilst holding the conference in London obviously served to increase its appeal to those living locally, the venue wasn’t entirely ideal. The space for the talks was fine, but the lack of break-out and foyer space made lunch and coffee breaks a scramble for space, and made in-depth conversations outside the conference hall harder.

The talks themselves covered a good mixture of topics, with security, LDAP and monitoring being particularly prevalent. The conference started with a presentation from Barry Scott of Centrify about integrating Unix boxes with Active Directory.  This gave a good overview of the situation (and said some nice things about the Kerberos tutorial), but talked more about their commercial product than what was possible with the available open source tools. From my perspective, this was a slightly missed opportunity, although the overview would have been of use to anyone contemplating that integration.

Later in the day, Andrew Findlay gave a very strong and well-presented talk on LDAP access control policies (there is also a PDF paper). Whilst this continued the logical progression from what Andrew has said about LDAP ACLs at previous conferences, it wrapped all of his current thinking up into a single, easily digestible block. It reconfirmed some of my design choices with prometheus, and challenged others.

After lunch, there was a “Systems Monitoring Shootout”, comparing the features of various different systems monitoring packages. There were some really interesting ideas in here, including the use of NagiosGraph to produce rrd files which can then be used for trend and capacity analysis. Following this, Jane Curry presented on ZenOss, a Zope-based network monitoring tool. This appeared to be more network focussed than the service focus of Nagios, with lots of features like automatic device discovery and a very pretty looking interface. However, nothing convinced me that we should drop Nagios and use it instead. Finally in this session we had a very well presented skip through the … interesting … things you could do to the SCSI bus with sysfs, and the power of lvm in terms of disk management.

In the final session of this day, Darren Moffat from Sun ran through some of the security features in Open Solaris. As well as a name check for my OpenSSH work, Darren talked about the new concept of role users, the move towards privileges in the kernel, and the additional RBAC work that’s in OpenSolaris. He also trailed the encryption features which will shortly be appearing in ZFS. All in all, a fascinating talk.

After Gavin Henry had talked about the replication strategies currently available in OpenLDAP, Howard Chu gave a great talk about its new MySQL NDB backend. Primarily developed with telco grade customers in mind, this allows you to share your database between MySQL and OpenLDAP, and take advantage of NDB’s clustering properties to linearly scale your load by simply adding more servers. The downside is that there are fixed constraints on attribute set size and tree depth. So, not a new general purpose backend, but a real insight into the large scale deployments that Symas is doing with OpenLDAP. I took the opportunity to quiz Howard about API stability for overlays – his answer unfortunately confirmed my view that the API isn’t stable enough to let us use them for prometheus.

Continuing the telco theme, Craig Gellen spoke about OpenNMS, a network management system which was designed from the ground up for large scale enterprise and telecommunications customers. Again, this system seems more network than systems monitoring focussed, and probably far too complex for our needs, but it was really interesting to see a piece of Open Source software which is specifically targeted at this market.

The final session started with a couple of virtualisation talks. Kris Buytaert talked about the current, and ever shifting, state of the Open Source virtualisation world, including a discussion of the current allegiances of the major vendors. Following this openQRM, an open source, virtual datacentre management tool, was presented. Matthias Rechenburg’s talk focussed in particular on cloud computing. OpenQRM has an automated provisioning model, where a user can use a web interface to request (and pay for!) a certain amount of time on a certain number of auto-built virtual machines. The talk concluded with a demo that both worked, and held the audience’s attention – no mean feat!

Alex Howells from Gradwell gave the final talk of the day – a tour of the major external security threats he’s become aware of during his time managing systems for Bytemark and Gradwell. This was a detailed look at the common security issues on today’s internet, as well as giving helpful advice on how to counter them. Whilst some things (for example using fail2ban on external facing services) would be easy to put into practice here, others (requiring code review for everything that runs on a web server) wouldn’t be appropriate to our environment. All in all though, this was a good talk, containing a lot of things to ponder, and a great way to end the conference.

Despite having a smaller set of talks than in the past, the technical content of the conference seemed stronger than it has been in the last couple of years. Having a single track did help to improve its focus, although the reduction in moving around, coupled with the lack of break-out space, did reduce the opportunities to interact with other delegates. The UKUUG are changing the focus of their Summer Conference (which has typically been Linux based) to encompass a very wide scope, some of which overlaps with the LISA focus of this event. Its long-term future remains to be seen.

All in all, though, I think the UKUUG Spring Conference is a very useful event to attend.

February 3, 2009

Opting out of Nagios Notifications

Filed under: Uncategorized — sxw @ 11:34 am

If you are going to be away for a long period of time, you can opt out of all Nagios notifications by changing some entries in your LDAP record. Unfortunately the UI for this is currently pretty non-existent, so here’s some low-level LDAP hackery that should achieve the desired results…

First things first, you need to have the nagiosUser objectClass. You can get that by running the following ldapmodify command (in the transcripts below, the block from dn: onwards is what you type; the other lines are examples of output from the command):

[boogaloo]sxw: ldapmodify -h ldap.inf.ed.ac.uk
SASL/GSSAPI authentication started
SASL username: sxw@INF.ED.AC.UK
SASL SSF: 56
SASL installing layers

dn: uid=sxw, ou=People,dc=inf,dc=ed,dc=ac,dc=uk
changetype: modify
add: objectClass
objectClass: nagiosUser
modifying entry “uid=sxw, ou=People,dc=inf,dc=ed,dc=ac,dc=uk”

Type CTRL-D to exit the ldapmodify command.

Now that you’ve got the relevant objectClass, you need to configure your Nagios settings so that you aren’t bothered. There are a number of ways of doing this, but the easiest is to set the notification period (the times of the day which Nagios will tell you of problems) to be none, which is a predefined period meaning ‘never tell me’.

[boogaloo]sxw: ldapmodify -h ldap.inf.ed.ac.uk
SASL/GSSAPI authentication started
SASL username: sxw@INF.ED.AC.UK
SASL SSF: 56
SASL installing layers

dn: uid=sxw,ou=People,dc=inf,dc=ed,dc=ac,dc=uk
changetype: modify
add: nagiosHostNotificationPeriod
nagiosHostNotificationPeriod: none

add: nagiosServiceNotificationPeriod
nagiosServiceNotificationPeriod: none

modifying entry “uid=sxw,ou=People,dc=inf,dc=ed,dc=ac,dc=uk”

As before, type CTRL-D to exit the ldapmodify command

After the usual propagation dance has occurred, you will find you’ll stop getting Nagios notifications. Just remember to turn them back on (by deleting these two attributes) when you get back!

Update: Graham just asked in the chatroom what the required incantation to undo this is. Just so you don’t have to wait until I get back, here it is…

[boogaloo]sxw: ldapmodify -h ldap.inf.ed.ac.uk
SASL/GSSAPI authentication started
SASL username: sxw@INF.ED.AC.UK
SASL SSF: 56
SASL installing layers

dn: uid=sxw,ou=People,dc=inf,dc=ed,dc=ac,dc=uk
changetype: modify
delete: nagiosHostNotificationPeriod

delete: nagiosServiceNotificationPeriod

modifying entry “uid=sxw,ou=People,dc=inf,dc=ed,dc=ac,dc=uk”

January 31, 2009

UKUUG Kerberos tutorial

Filed under: Uncategorized — sxw @ 11:56 pm

I’m presenting a Kerberos tutorial on 24 March as a precursor to this year’s UKUUG Spring Conference in London. This will be a revised and extended version of the tutorial I presented 2 years ago. This year there will be an additional afternoon session which will give an opportunity to go into some more advanced topics in greater detail, and to allow a wider opportunity for discussion.

The abstract is:

This year’s Kerberos training tutorial will be presented in two parts. The morning’s session will be aimed at users without any particular Kerberos knowledge, but with an interest in deploying a site-wide authentication solution. We’ll cover the basics of the Kerberos protocol, examine common deployment considerations and discuss realm administration strategies. Suggestions will be made for methods of Kerberising many popular applications, and we’ll touch upon the issues involved in controlling delegation and key type management.

The afternoon’s session will look at a series of more advanced topics. It will be targeted at those who have either attended the morning session, or who already have a good working knowledge and deployment of Kerberos. We’ll look at the issues involved in running Kerberos across many different platforms, the challenges presented by mobile devices, and at extending Kerberos single sign on to the web. We’ll discuss the issues involved in rekeying existing Kerberos realms, and look at mechanisms for adding Kerberos support to existing local code and protocols. A number of methods of making interactions with Kerberos appear seamless from the user’s perspective will be presented, and ways of leveraging Kerberos into the world of public key certificates will be discussed. Finally, there will be an opportunity for attendees to get advice and feedback both from the tutor and other attendees on particular issues facing their site.

Further details are available from the UKUUG site.

New WordPress version for blog.inf.ed.ac.uk

Filed under: Uncategorized — sxw @ 11:24 pm

As a parting gift, I’ve just updated the WordPress version we’re using on blog.inf.ed.ac.uk to 2.7, as noted over here. The user interface has undergone a pretty radical overhaul, and there is a good set of new features.

Enjoy!

January 26, 2009

DICE Labs goes live

Filed under: Uncategorized — sxw @ 5:25 pm

DICE Labs, which I originally suggested back in early 2008, is now a reality. Whilst I’ve been running a number of ‘not a service’ systems for a while, this puts them on an official footing, and lets a much wider set of folk play around with them than just the CO team. Lots of things that I built in the past that now run in production went through this ‘Not a Service’ phase, including bugzilla and the wiki. Jabber and iFile were also built as part of this process, but aren’t part of the DICE Labs launch offering as they’re in the process of graduating into fully fledged services of their own.

January 24, 2009

Using fstrace to debug the AFS Cache Manager

Filed under: Uncategorized — sxw @ 9:25 pm

Fstrace is an AFS utility which logs a huge amount of information about the internal operations of the AFS cache manager. It can provide details of the process flow through the AFS cache manager, without requiring any recompilation or specific debug options.

To use fstrace, you need the /usr/vice/etc/C/afszcm.cat file installed. Unfortunately, this file is not currently installed as part of RPM builds, and without it the output from fstrace is pretty much useless. However, you should be able to take the afszcm.cat file from any build of the same AFS version and use it with your kernel module.

To start logging, initialize the logging system with

  • fstrace clear cm
  • fstrace setlog cmfx -buffers 100

and then, to start the logging session, and dump the log output to a file

  • fstrace sets cm -active
  • fstrace dump -follow cmfx -file /tmp/log -sleep 10 &

Then, perform the operations which you want to obtain a log for. You may find that the dump needs a short period of time (or another command to be invoked) in order to flush all of the data from the command that you’re logging onto disk.

To stop logging, run

  • fstrace sets cm -inactive

Update 27/03/09: As of OpenAFS 1.4.9, the afszcm.cat file will ship as part of the standard RPM installation.

