SQL, ipython and pandas

March 30, 2013

I recently came across a really handy module which makes it easy to access data stored in an SQL DB from the ipython shell. Turns out that then going the next step and moving the data into pandas is very easy. All very cool, I love how easy it is to hack out code for quick data inspection and manipulation using ipython.


Refactoring the LCFG client

March 29, 2013

The time has come to start work on refactoring the code base of the LCFG client. This has been overdue for a while now as the current state of the code is preventing us from doing much in the way of new developments for fear of breaking something important. The aim is to tidy and clean the code to bring it up to modern Perl standards and generally make it much more readable and maintainable. The aim is to avoid altering the functionality if at all possible although a number of small bug fixes will be tackled if time allows. The full project plan is available for reading on the devproj site. This project incorporates many of the lessons we learnt when we refactored the LCFG server code last year, again see the devproj site for details.

I made an initial start on the project today. As with all refactoring the best first move is to ensure you have some tests in place. In this case I just added simple compilation tests for each of the 5 Perl modules involved. Interestingly this immediately flagged up a genuine bug which existed in the code, this was related to the use of subroutine prototypes. Now anyone who has a reasonable amount of experience with Perl programming will tell you that subroutine protoypes are evil, full of gotchas and rarely do what you expect. One of the tasks in the plan is to rid the code of them entirely but that’s not for today. Thankfully this was a simple error where the prototype stated that 4 scalars were required when, in actual fact, only 3 were needed (and only 3 were provided when the subroutine was called). I’m surprised the code actually worked at all with that bug, this shows how useful even simple testing can be for improving code quality.

The whole code base is basically 5 Perl modules and a script which uses them all. An interesting strategy was taken with the module loading, all subroutines from the modules were imported into the “main” namespace of the script (which is effectively global) and then all calls to them anywhere in the code base were referred to the version in that namespace. So, all subroutine calls were done with unqualified, short names, I guess this makes it quick to hack out but coming at the code without a huge amount of prior knowledge it is almost impossible to quickly reckon the source location for each subroutine. So, my second step was to work through all the code and replace the calls with fully-qualified names. To make it doubly clear that the old way wasn’t readable or maintainable I also ripped out (the now unnecessary) support for exporting subroutines into another namespace and ensured that when these modules are loaded there is no attempt to import anything.

This sort of change should be zero impact, right? Turns out, not entirely, nothing is ever simple… I had to shuffle a few subroutines out of the script into the helper modules, in turn that meant fixing a few references to global variables. This in turn required passing another parameter to a couple of subroutines which meant hacking out a few evil subroutine prototypes. I think that shows up a few code smells which will have to be tackled very soon.

Before I can really get stuck in though a few more tests are going to be necessary. At the very least there is going to have to be a test of the client’s ability to download an XML profile and convert it into the locally stored file format. At this stage I don’t know enough about the code to create tests for each subroutine so a large-scale test of functionality is the only option. Without that test it won’t be safe to make any bigger code changes.


SSH honeypots

March 27, 2013

I’ve never been brave enough to run an SSH honeypot myself. Anything which is even pretending to be “open” to the world to attract bad guys is probably just too much of a risk for a network. Having said that, it’s clear there is a lot of interesting data which could be gathered and a lot we could learn about the standard approaches to system compromise attempts. I recently came across a fascinating blog article which reviews the data captured using a honeypot. It gives some insight into how these attacks are carried out and clearly shows that most of them are “script kiddies” without much clue. As my recent talk (Do bad guys work weekends?) presented at the FLOSS UK Spring Conference in Newcastle showed, there are some very simple strategies for completely blocking most of these attacks.


Using Python and Pandas to process data

March 16, 2013

I’ve recently been doing some data analysis for a presentation I will be giving at the FLOSS UK Spring Conference in Newcastle next week. This involved processing a lot of data gathered from our syslogs related to SSH authentications. As part of my ongoing effort to learn Python properly I decided to do all the work in that language. Whilst hunting around for useful modules for processing data and calculating various statistics I came across the very clever Pandas library which provides some impressive tools for processing tabulated data (such as that in CSV style files). It’s a bit of a steep learning curve but I’ve just come across a neat blog article which summarises the main functionality quite well. I’ve only used a few of the features so far, I particularly found the groupby functionality very handy, I shall definitely be exploring this library further in the future.