Simon's Musings

July 26, 2009

Making the OpenAFS client faster

Filed under: Uncategorized — sxw @ 1:41 pm

During the course of a project here, it became apparent that the Linux OpenAFS cache manager is slow when performing reads from the local disk. In this case, all of the data is already on the local disk, and the cache manager knows that the data is up to date. Naively, you would imagine that reading this data would take roughly the same time as if you were reading directly from the cache filesystem. However, that is not the case – in fact, reads appear to be more than twice as slow when fetched through the AFS cache manager, as compared to fetching the equivalent files from the local disk.

I’ve implemented modifications to the cache manager which attempt to reduce this speed deficit. These modifications can be broadly split into five sections:

Remove crref() calls

Pretty much every call into the OpenAFS VFS does a crref(), to get a reference to the user’s current credentials, despite the fact that this information isn’t always required. crref() is relatively expensive – it acquires a number of locks in order to perform its copies, and can be a cause of unnecessary serialisation. By only calling crref() when required, we can gain a small, but measurable, performance increase.
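As a rough illustration of the idea, here is a user-space Python sketch rather than anything from the OpenAFS tree. `CredentialSource`, `Request`, `read_cached_page` and `fetch_from_server` are hypothetical names invented for the example; the `crref` method only stands in for an expensive lookup that takes a lock. The point it tries to show is that a request which can be satisfied from the cache never pays for the credential lookup.

```python
import threading


class Credentials:
    """Stand-in for a credential structure (hypothetical, not the kernel's)."""

    def __init__(self, uid):
        self.uid = uid


class CredentialSource:
    """Models an expensive, lock-taking credential lookup such as crref()."""

    def __init__(self, uid):
        self._uid = uid
        self._lock = threading.Lock()
        self.lookups = 0  # how many times the expensive path ran

    def crref(self):
        with self._lock:  # the serialisation point we would like to avoid
            self.lookups += 1
            return Credentials(self._uid)


class Request:
    """One call into the filesystem; fetches credentials only on first use."""

    def __init__(self, source):
        self._source = source
        self._cred = None

    @property
    def cred(self):
        if self._cred is None:
            self._cred = self._source.crref()
        return self._cred


def fetch_from_server(cred, index):
    """Placeholder for the slow path, which genuinely needs credentials."""
    return f"page {index} fetched for uid {cred.uid}"


def read_cached_page(request, page_cache, index):
    """Cache hits never touch request.cred; only misses trigger crref()."""
    if index in page_cache:
        return page_cache[index]
    return fetch_from_server(request.cred, index)
```

Counting `lookups` before and after a cache hit is a simple way to check that the expensive call really has been moved off the common path.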

Reduce the code path leading to a cache hit

In readpages, we perform a lot of setup operations before we discover whether the data we’re interested in is cached or not. By making the cached case the fast path, we can gain a performance increase for cache hits, without causing a noticeable degradation for cache misses.

Remove abstraction layers, and use native interfaces

The code currently uses operating system independent abstraction layers to perform the reads from the disk cache. These don’t know anything about the way in which Linux organises its virtual memory, and do a significant amount of extra, unnecessary work. For example, we use the ‘read’ system call to read in the data, rather than the significantly faster readpages(). As we’re being invoked through the AFS module’s readpages() entry point, we can guarantee that we’re going to be fetching a page off disk. Read() also gets called from a user, rather than kernel, memory context, adding to the overhead.

Implement readahead

The Linux Cache Manager currently has no support for readpages(), instead requiring the VFS layer to request each page independently with readpage(). This not only means that we can’t take advantage of cache locality, it also means that we have no support for readahead. Doing readahead is important, because it means that we can get data from the disk into the page cache whilst the application is performing other tasks. It can dramatically increase our throughput, particularly where we are serving data out to other clients, or copying it to other locations. Implementing readpages() on its own gives a small speed improvement, although blocking the client until the readpages completes kind of defeats the point, and leads to sluggish interactive performance!

Make readahead copies occur in the background

The next trick, then, is to make the readahead occur in the background. By having a background kernel thread which waits until each page of data is read from the cache, and then handles copying it over into the corresponding AFS page, the business of reading and copying data from the cache can be hidden from the user.
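The shape of background readahead can be modelled in user space. The Python sketch below is an analogy only, not kernel code: `ReadaheadReader`, `PAGE_SIZE` and the `window` parameter are assumptions made for the example, and a single worker thread plays the part of the background kernel thread. When a page is requested, the following pages are queued for loading, so that later requests mostly find their data already waiting.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for the sketch


class ReadaheadReader:
    """Serves pages from `backing`, prefetching the next few in the background."""

    def __init__(self, backing, window=4):
        self._backing = backing      # bytes-like object standing in for the disk cache
        self._window = window        # how many pages to read ahead
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}           # page index -> Future holding that page's bytes
        self.foreground_reads = 0    # reads the caller had to perform itself

    def _load(self, index):
        start = index * PAGE_SIZE
        return bytes(self._backing[start:start + PAGE_SIZE])

    def _page_count(self):
        return (len(self._backing) + PAGE_SIZE - 1) // PAGE_SIZE

    def read_page(self, index):
        future = self._pending.pop(index, None)
        if future is None:
            # Nothing was prefetched for this page: read it in the foreground.
            self.foreground_reads += 1
            data = self._load(index)
        else:
            # Prefetched page: wait only if the background load is unfinished.
            data = future.result()
        # Queue the following pages so they are ready before they are asked for.
        last = min(index + self._window, self._page_count() - 1)
        for ahead in range(index + 1, last + 1):
            if ahead not in self._pending:
                self._pending[ahead] = self._pool.submit(self._load, ahead)
        return data

    def close(self):
        self._pool.shutdown(wait=True)
```

For a sequential read, only the very first page should count as a foreground read; everything after it is picked up from work the background thread has already started.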

Conclusions

This set of changes actually makes a significant improvement to cache read speed. In simple tests where the contents of the cache are copied to /dev/null, the new cache manager is around 55% faster than the old one. Tests using Apache to serve data from AFS show significant (but slightly less dramatic, due to other overheads) performance improvements.

Sadly, the Linux Memory Management architecture means that we’re never going to obtain speeds equivalent to using the native filesystem directly. The architecture requires that a page of memory must be associated with a single filesystem. So, we end up reading a page from the disk cache, copying that page into the AFS page, and returning the AFS page to the user. Ideally, we’d be able to dispense with this copy and read directly into the AFS page by switching the page mappings once the read was complete. However, this isn’t currently an option, and the performance benefits obtained through the current approach are still significant.

March 27, 2009

AFS & Kerberos Best Practices Workshop

Filed under: Uncategorized — sxw @ 1:01 pm

Once again, I’m presenting at the AFS & Kerberos Best Practices Workshop. This year’s event is at Stanford University from June 1st-5th.

I’m giving two talks, the first on prometheus, our new Identity Management System. The second is about how to contribute to OpenAFS. The abstracts are:

Prometheus is an LDAP based provisioning system, which is designed to manage a wide variety of user databases, including AFS’s PTS and a Kerberos KDC. It is highly flexible in the databases it supports, and permits very fine grained delegation of control. It has a role-based access control model, and allows the creation and management of roles by any authorized user. It is instance aware, allowing users to create many instances of a primary account, request keytabs of those instances, and delegate particular permission sets to individual instances. Prometheus is designed to be as distributed as possible, permitting provisioning of systems maintained by disparate groups without requiring those groups be trusted by the system itself. This talk will discuss the design goals behind Prometheus, provide an update on implementation progress, and demonstrate a running system.

and …

OpenAFS has a huge, daunting codebase, with a relatively opaque system of patch submission, review and application. It takes mountains of skill, and years of persistence to get your first patch into a state where it can be submitted, let alone accepted into the hallowed halls of the code tree…

Nonsense!

This talk will attempt to blow away some of the misconceptions with regard to contributing to OpenAFS. It will provide a first-timer’s view of the steps, both technical and political, to crafting a patch for submission into OpenAFS. We’ll take a whistle stop tour of the tools now involved in the process, from the code repository, to the patch review system and the bug tracker. We’ll talk about code review, bug triage and testing, with a view to inspiring participation in these areas.

Finally, we’ll talk about some low hanging fruit that anyone could get started on, and write their first bit of OpenAFS code …

In addition to keynotes from Morgan Stanley and Carnegie Mellon, the conference features a number of talks about research computing storage (including one from the nanoCmos project), and looks like it will have a great mixture of academic and commercial topics.

The hotel block (at the very reasonable Stanford Guest House) expires April 1st, with the early bird deadline being April 21st.

January 24, 2009

Using fstrace to debug the AFS Cache Manager

Filed under: Uncategorized — sxw @ 9:25 pm

Fstrace is an AFS utility which logs a huge amount of information about the internal operations of the AFS cache manager. It can provide details of the process flow through the AFS cache manager, without requiring any recompilation or specific debug options.

To use fstrace, you need a /usr/vice/etc/C/afszcm.cat file to be installed. Unfortunately, this file is not currently installed as part of RPM builds, and without it, the output from fstrace is pretty much useless. However, you should be able to take the afszcm.cat file from any build of the same AFS version and use it with your kernel module.

To start logging, initialize the logging system with

  • fstrace clear cm
  • fstrace setlog cmfx -buffers 100

and then, to start the logging session, and dump the log output to a file

  • fstrace sets cm -active
  • fstrace dump -follow cmfx -file /tmp/log -sleep 10 &

Then, perform the operations which you want to obtain a log for. You may find that the dump needs a short period of time (or another command to be invoked) in order to flush all of the data from the command that you’re logging onto disk.

To stop logging, run

  • fstrace sets cm -inactive
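The steps above can be wrapped so that tracing starts and stops around a single operation. This is only a sketch: the `fstrace` commands are copied from the steps above, but `fstrace_session` and `dump_command` are hypothetical helper names, and waiting one dump interval before stopping the follower is an assumption about how long the final flush takes.

```python
import subprocess
import time
from contextlib import contextmanager

# Commands copied from the steps above, in the same order.
SETUP = [
    ["fstrace", "clear", "cm"],
    ["fstrace", "setlog", "cmfx", "-buffers", "100"],
    ["fstrace", "sets", "cm", "-active"],
]
STOP = ["fstrace", "sets", "cm", "-inactive"]


def dump_command(log_path, sleep=10):
    """Build the 'fstrace dump' command that follows the cmfx log."""
    return ["fstrace", "dump", "-follow", "cmfx",
            "-file", log_path, "-sleep", str(sleep)]


@contextmanager
def fstrace_session(log_path, sleep=10):
    """Trace the cache manager for the duration of a 'with' block."""
    for cmd in SETUP:
        subprocess.run(cmd, check=True)
    # Equivalent of running the dump command with a trailing '&'.
    dumper = subprocess.Popen(dump_command(log_path, sleep))
    try:
        yield log_path
    finally:
        subprocess.run(STOP, check=False)
        # Assumption: one dump interval is long enough for the follower
        # to flush the remaining entries to disk.
        time.sleep(sleep)
        dumper.terminate()
        dumper.wait()
```

Putting the operation you want to trace inside `with fstrace_session("/tmp/log"):` should leave the log in /tmp/log once the block exits. It needs to run with enough privilege to call fstrace.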

Update 27/03/09: As of OpenAFS 1.4.9, the afszcm.cat file will ship as part of the standard RPM installation.

November 3, 2008

AFS Hackathon and Google Summer of Code

Filed under: Uncategorized — sxw @ 7:35 pm

I’m now back in Scotland, having spent the last week in California, courtesy of the very nice people at Google’s Open Source Program’s office and OpenAFS. As previously mentioned, I spent the summer mentoring a student (Dragos Tatulea) who was adding support for read-write disconnection to OpenAFS. The mentoring process was hugely rewarding – from a standing start Dragos learned a huge amount about a very complex codebase, and produced a workable implementation of disconnected operation which is now part of the OpenAFS tree. Whilst mentoring was both challenging and time consuming, it also encouraged me to rapidly learn about bits of the OpenAFS codebase I’d never delved into before, and lots about Linux kernel development that I’d been trying to avoid ever knowing!

So, Google invited OpenAFS to nominate people from their Summer of Code mentoring team to attend a summit at their Mountain View headquarters, and I was kindly included. Derrick, Jeff and Matt from OpenAFS also came along. My bags also eventually joined me!

The mentors summit itself was an eye-opening experience. Organised as an un-conference, where people were encouraged to arrange sessions on topics and technologies that interested them, there was a huge amount of fascinating information, and many useful relationships created and renewed. In particular, a chance demonstration at the session talking about Android introduced me to Gerrit, a web based code review tool. I firmly hope that Gerrit will be part of the OpenAFS development process, just as soon as we get moved over to git.

Immediately following the Summer of Code mentor’s conference, Google hosted an AFS hackathon – a chance for a collection of OpenAFS developers to get together, discuss the current state of our world, and make targeted progress on specific items. Much of the discussion here centered upon moving forwards on a few specific areas – the move from CVS to git, the integration of rxk5 and Hartmut’s OSD work, and the ongoing work on forming a foundation, and creating a standardisation process.

I also spent half a day looking at improving the AFS user experience on the Nokia n810. Unfortunately the Hildon file manager widget which both the n810 file browser and all native applications use has some features that make it particularly unfriendly for network file systems. Firstly, it does all of its processing in a single thread, so file system operations which block for a long time also hang the user interface of the application. Secondly, it’s not particularly aware of ‘expensive’ operations – for example, when you open a directory it will also open all of the sub directories, and work out how many files are in them by stating every file, in every sub directory. Needless to say the performance of this is very poor when the directory you are opening is /afs.

I also spent time on bringing up a test instance of gerrit, and working up some proposals of how this could be integrated into the OpenAFS patch workflow. Whilst this is still blocked on the work on the git migration (which Max and Mike made significant progress on over the 2 days), hopefully we’ll be in a position to start using it in anger soon.

Despite the best efforts of the fog at LAX, and American Airlines, I also made it back to Scotland!

February 26, 2008

FOSDEM 2008

Filed under: Uncategorized — sxw @ 2:45 pm

Over the last weekend, I attended FOSDEM, an absolutely mind blowing conference bringing together Free and Open Source developers from all over Europe. The scale of the conference, attracting as it does thousands of developers, and accommodating hundreds of different talks over 2 manic days, really can’t be described. You have to be there to experience it.

I made the journey to Brussels by train, a most civilised way to travel – especially given that Eurostar are quite happy to replace lost return tickets for a small fee! The weekend started with the infamous beer event on the Friday night (hence the lost ticket), before getting down to business on the Saturday. It’s hard to pick particular highlights from such a packed program, but the perl6 talk managed to be both fascinating and scary at the same time and the cmake talk was very useful given the way Stephen is going with build tools. In the dev rooms, Dan Mosedale unfortunately didn’t make it for the Thunderbird talk, but a productive discussion was had none-the-less, and Jens Kuehnel’s introduction to SELinux in the Fedora devroom helped overcome a lot of my fears (and, in fact, has succeeded in its goal, as I no longer just switch it off). The sight of 100+ folk all participating in a PGP keysigning had to be seen to be believed (eventually, we just had to go outside, as the lecture theatre just wasn’t big enough).

I signed up a few months ago to present a Lightning Talk on OpenAFS, in an attempt to grow awareness, and attract new developers. That talk certainly helped me with talking to other people at the conference, as well as being pretty well received. Both slides, and video, of the talk are available from the FOSDEM site.

February 21, 2008

UKUUG Files and Backups Seminar

Filed under: Uncategorized — sxw @ 12:00 am

As previously trailed, I presented as part of this year’s UKUUG Files and Backups Seminar on the 19 Feb. My talk, on OpenAFS, was a revised and extended version of the paper Craig and I wrote, and I presented at UKUUG’s Spring Conference the year before. Whilst that paper concentrated on Informatics’ experience in deploying OpenAFS, the seminar talk was far more outwards facing, discussing the pitfalls and benefits of any OpenAFS deployment across many different types of organisation. A copy of the slides is available from the UKUUG web site.

Both days of the seminar were a very interesting opportunity to take part in a number of focussed discussions about storage issues, as they affect a wide variety of different businesses. Charles Curran’s discussion of CERN’s data management issues (with LHC producing around 15PB of data every year) was a hilarious tour through the issues involved in managing vast amounts of experimental data, and Kern Sibbald’s talk on Bacula was a fascinating discussion of what must be the industry’s leading Open Source backup technology. Kern and I had a chat afterwards about the issues involved in making Bacula AFS aware, such that it could easily handle both backup, and restoration of files from AFS volume dumps.

February 15, 2008

Talking

Filed under: Uncategorized — sxw @ 6:14 pm

I’m giving a few talks over the next couple of months:

  • UKUUG Files and Backup Seminar: I’m giving a general overview of AFS from a user’s and administrator’s perspective, particularly focusing on features that will be of interest to new deployments
  • FOSDEM: I’m giving a developer’s overview of OpenAFS as a lightning talk
  • UKUUG Spring Conference: I’m currently scheduled to give two talks. The first is an overview of our monitoring system, talking in particular about the benefits (and challenges) of integrating it with LCFG. The second is about our in-development account management system, prometheus, and some of its unique features.
  • AFS & Kerberos Best Practices Workshop

I’m going to FOSDEM, the Free and Open Source Software Developers’ European Meeting

January 28, 2008

OpenAFS upgrades

Filed under: work — sxw @ 12:05 pm

This week, I’m going to be upgrading our OpenAFS database servers to 1.4.6, with my patch to disable the checks for principal names with dots in them (this patch will ship with OpenAFS 1.4.7). At the moment, iFriend users can’t register with the AFS pts database, because the email address naming scheme requires that their name contain dots. This means that there’s no way of using iFriend as an AFS authentication scheme, which was one of the original goals. 

Once all of the AFS database servers are suitably upgraded, it’ll be possible to register iFriend users, either through a CGI script, or with an extension to mod_waklog. Allowing them access to specific directories will require the fileserver hosting that volume to have also been upgraded, and correctly configured. 
