From the better-late-than-never department
I spent the first weekend of February at FOSDEM 2010, a pretty much unique conference of free software developers held yearly in Brussels. Whilst I emailed these notes around internally upon my return, it’s taken a while to get around to putting them up on my blog. In that time, though, FOSDEM have published the videos they made of most of these talks. Links to those videos are included below.
FOSDEM is a completely free event, whose scale has to be seen to be believed – they have over 300 talks spread across the 2 days, and initial estimates suggested that there were approximately 10,000 delegates on Saturday (the wireless network had over 4000 active leases at one point). I split my time between attending main track sessions, listening to development room talks, and volunteering with conference support.
Bullet points for the pointy haired …
- Mozilla don’t have any real interest in the enterprise space.
- Facebook are doing some amazing things, and actually talking about it.
- Usable open source smartcard stacks are finally in today’s distributions.
- Hadoop, and the NoSQL movement in general, are gaining a large amount of mindshare. If we’re not getting requests for Hadoop clusters now, it’s probably just a matter of time.
- The pound is taking a serious beating.
In the corridors
I spoke to Chris Blizzard and Gervase Markham from Mozilla about our experiences with the move from Firefox 2->3. They think that the latest development versions will offer some improvement to the number of small disk writes, but admitted that the level of disk activity had never been a concern in benchmarking the application. Gervase had some helpful suggestions about things we could do to assist with this in the future.
There’s a growing interest in Europe around smartcards. This is primarily being driven by the fact that a number of European governments are adding smartcards to their existing national ID cards, and people want to exploit the authentication properties of these. One group were demonstrating an out-of-the-box Ubuntu system using a smartcard for authentication and document signing. Fedora apparently offers the same user experience. There are also powerful Open Source CA products becoming available, which would be very useful if we decide to go down the smartcard route at any point.
Brooks Davis gave an interesting talk about moving a large company (Aerospace, in the US) towards open source development models. He had some interesting observations on the uptake of new tools in big organisations, and on the reluctance to share code between development teams. Some of his horror stories were staggering – teams whose idea of revision control is hosting all of their source on a shared filesystem, and using a physical whiteboard to indicate who was modifying which files. Needless to say, code loss was common. Aerospace are rolling out an internal system based around Trac to encourage the use of revision control and other good software development practices company wide.
Next, Richard Clayton gave a fascinating tour of ‘Evil on the Internet’. This was an eye-opening overview of the way in which criminals steal money over the internet, exploring the whole ecosystem: those who phish for credentials, the mules they recruit to launder the money, and the many, many fake websites that they use to con you into participating. What was alarming was the realism of many of these sites – I’d become used to identifying phishing scams through poorly formed emails, and much of what he demonstrated was far more plausible than that.
I spent much of the afternoon in the XMPP devroom. They’re doing some very interesting work both on integrating XMPP with web applications, and in terms of moving voice and video forwards with Jingle. I popped out of that devroom to listen to a talk by Mitchell Baker about Mozilla’s mission, and their view of how they can contribute to ensuring an open internet. Inspiring stuff, but sadly she did confirm that Mozilla as a whole don’t have an interest in the enterprise space.
Due to lack of space (a common theme during the weekend, sadly) I couldn’t get into the Spacewalk talk. Spacewalk (http://www.redhat.com/spacewalk/) is the open source version of Red Hat Network, and is designed to manage software updates across an enterprise-wide deployment of Linux. There was a lot of buzz about this at FOSDEM, and it looks like something we should definitely investigate.
On Sunday, I spent some time in the Mozilla devroom, in particular speaking to Ludo about current, and forthcoming, changes to Thunderbird. We discussed the current state of both GSSAPI and LDAP support (my patch enabling GSSAPI for LDAP in Thunderbird was part of the 3.0 release). I was hoping to listen to some of the NoSQL talks too, but sadly that room was overflowing every time I tried. I spoke with Peter Saint-Andre (XMPP Standards Foundation, now with Cisco) about improving XMPP’s Kerberos support – in particular how we can push support for domain-based names into the SASL software stack. I will be following that up with him and Alexey Melnikov after the conference. I also met with some other OpenAFS developers, for a brief chat about the state of the tree, and the move towards the 1.6 release.
On Sunday afternoon, I helped moderate the talks in the Janson auditorium, in the scalability conference track. Isabel Drost started with a fascinating talk about Apache Hadoop, which offers distributed petabyte-scale data processing, using tools modelled on Google’s MapReduce paradigm. HDFS looks pretty much like an implementation of what’s publicly known about Google’s GFS, and Hadoop layers MapReduce on top of that. The talk provided a great overview, firstly of MapReduce and its power, secondly of the flexibility of the Hadoop implementation, and finally of the huge degree to which it can scale.
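For anyone who hasn’t met the MapReduce model before, here is a minimal single-process sketch of the idea in Python. It is not Hadoop code and wasn’t shown in the talk; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. A real cluster runs the map and reduce steps on many machines and does the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (the framework does this between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["free software", "Free as in freedom", "software freedom"])
print(counts)
```

The point of the split is that `map_phase` and `reduce_phase` never share state, which is what lets a framework spread them across as many nodes as it has.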
The next talk on this track was from Facebook. They started with some fairly blinding statistics – 8 billion minutes every day are spent on Facebook and 2.5 billion photos are uploaded each month. They provided an overview of their whole infrastructure, highlighting the various projects that make it tick. Their development language is PHP but due to issues with its speed, and memory use, they’ve developed HipHop, a static analyzer and translator which converts PHP into optimised C++. For logging, they used to use syslog, but their log volumes (~25 terabytes per day) melted down all of the syslog servers that they tried, so they moved towards ‘Scribe’ – which is now used by both themselves and Twitter, and offers massively scalable log storage. To process these logs, and for other data analysis tasks, Facebook are big Hadoop users. They’ve built Hive, which puts an SQL-like layer on top of Hadoop’s syntax, with the aim of encouraging ease of use, and internal adoption. Their Hadoop cluster currently uses more than 80,000 compute hours per day.
Facebook’s data store totals 160 billion photos and serves 1.2 million of them every second. Nobody’s NFS scales to this kind of data load – I/O bandwidth, rather than storage density, ends up being the limiting factor. They’ve built a new storage system called Haystack, which removes a lot of metadata information, and cuts the cost of serving a file from 10 disk seeks to a single operation. memcache is crucial to Facebook’s interactive performance, but they have a bit of a love/hate relationship with it. They’ve done a fair bit of work on extending it, adding features such as 64-bit and multithreading support. Their permanent data storage is MySQL, which they like very much – simple, fast and very reliable. They keep their usage of it simple – they don’t do joins at the database layer, but combine datasets together in PHP.
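To make the “no joins in the database” point concrete, here is a rough sketch of what combining datasets in the application can look like. It is my own illustration, not Facebook’s code: it is in Python rather than PHP, uses the standard library’s `sqlite3` as a stand-in for MySQL, and the `users` and `photos` tables and the `photos_with_owner_names` helper are invented for the example.

```python
import sqlite3

# In-memory database standing in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, owner_id INTEGER, caption TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO photos VALUES (10, 1, 'Brussels'), (11, 2, 'Janson'), (12, 1, 'Atomium');
""")


def photos_with_owner_names(conn):
    """Join photos to users in application code using two simple queries."""
    photos = conn.execute(
        "SELECT id, owner_id, caption FROM photos ORDER BY id"
    ).fetchall()
    if not photos:
        return []
    # Second simple lookup: fetch only the users we actually need.
    owner_ids = sorted({owner_id for _, owner_id, _ in photos})
    placeholders = ",".join("?" * len(owner_ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", owner_ids
    ).fetchall())
    # The "join" itself is just a dictionary lookup per row.
    return [(pid, caption, names.get(owner_id)) for pid, owner_id, caption in photos]


result = photos_with_owner_names(conn)
print(result)
```

Each query stays a plain primary-key style lookup, which is easy to cache and to spread across database servers; the cost is a little more work in the application.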
One very interesting observation from Facebook was that most of their interesting projects happen as “hack projects” where a very small group of engineers work intensively on an exciting idea – Haystack, for example, was built by 3 people.
Finally, in the scalability track, there was a talk about Status.Net and identi.ca. Choice quote: “When web people talk about scalability – what they really mean is will it keep working.” Status.Net is a Twitter-like service which is available both as a hosted system (identi.ca) and as a locally installable server, designed for organisations who want to host their own. The talk dealt more with the overall architecture of Status.Net (formerly Laconica) than with its scalability, but provided an interesting example of how various open source components could be stitched together to produce a compelling product, and a rallying cry for the importance of both libre web services, and the ownership of your own web content.
To round off a very long couple of days, Greg K-H gave a whistle-stop tour of writing your first patch for the Linux kernel. This was a very well delivered overview of a lot of complex topics, and was aimed at inspiring members of the audience to contribute, rather than being an in-depth description of the kernel. It was a very well chosen end to an inspiring weekend.