During a course of a project here it became apparent that the Linux OpenAFS cache manager is slow when performing reads from the local disk. In this case, all of the data is already on the local disk, and the cache manager knows that the data is up to date. Naively, you would imagine that reading this data would take roughly the same time as if you were reading directly from the cache filesystem. However, that is not the case – in fact, reads appear to be more than twice as slow when fetched through the AFS cache manager, as compared to fetching the equivalent files from the local disk.
I’ve implemented modifications to the cache manager which attempt to reduce this speed deficit. These modifications can be broadly split into 5 sections
Remove crref() calls
Pretty much every call into the OpenAFS VFS does a crref(), to get a reference to the users current credentials, despite the fact that this information isn’t always required. crref is relatively expensive – it acquires a number of locks in order to perform its copies, and can be a cause of unnecessary serialisation. By only calling crref when required we can gain a small, but measurable, performance increase
Reduce the code path leading to a cache hit
In readpages, we perform a lot of setup operations before we discover whether the data we’re interested in is cached or not. By making the cached case the fast path, we can gain a performance increase for cache hits, without causing a noticable degradation for cache misses.
Remove abstraction layers, and use native interfaces
The code currently uses operating system independent abstraction layers to perform the reads from the disk cache. These don’t know anything about the way in which Linux organises its virtual memory, and do a significant amount of extra, unnecessary work. For example, we use the ‘read’ system call to read in the data, rather than the significantly faster readpages(). As we’re being invoked through the AFS module’s readpages() entry point, we can guarantee that we’re going to be fetching a page off disk. Read() also gets called from a user, rather than kernel, memory context, adding to the overhead.
The Linux Cache Manager currently has no support for readpages(), instead requiring the VFS layer request each page independently with readpage(). This not only means that we can’t take advantage of cache locality, it also means that we have no support for readahead. Doing readahead is important, because it means that we can get data from the disk into the page cache whilst the application is performing other tasks. It can dramatically increase our throughput, particularly where we are serving data out to other clients, or copying it to other locations. Implementing readpages() on its own gives a small speed improvement, although blocking the client until the readpages completes kind of defeats the point, and leads to sluggish interactive performance!
Make readahead copies occur in the background
The next trick, then, is to make the readahead occur in the background. By having a background kernel thread which waits until each page of data is read from the cache, and then handles copying it over into corresponding AFS page, the business of reading and copying data from the cache can be hidden from the user.
This set of changes actually makes a signifciant improvement to cache read speed. In simple tests where the contents of the cache are copied to /dev/null, the new cache manager is around 55% faster than the old one. Tests using Apache to serve data from AFS show significant (but slightly less dramatic, due to other overheads) performance improvements.
Sadly, the Linux Memory Management architecture means that we’re never going to obtain speeds equivalent to using the native filesystem directly. The architecture requires that a page of memory must be associated with a single filesystem. So, we end up reading a page from the disk cache, copying that page into the AFS page, and returning the AFS page to the user. Ideally, we’d be able to dispense with this copy and read directly into the AFS page by switching the page mappings once the read was complete. However, this isn’t currently an option, and the performance benefits obtained through the current approach are still significant.