Using Python and Pandas to process data

March 16, 2013

I’ve recently been doing some data analysis for a presentation I will be giving at the FLOSS UK Spring Conference in Newcastle next week. This involved processing a lot of data gathered from our syslogs related to SSH authentications. As part of my ongoing effort to learn Python properly, I decided to do all the work in that language. Whilst hunting around for useful modules for processing data and calculating various statistics, I came across the very clever Pandas library, which provides some impressive tools for processing tabular data (such as that in CSV-style files). It has a bit of a steep learning curve, but I’ve just come across a neat blog article which summarises the main functionality quite well. I’ve only used a few of the features so far, but I found the groupby functionality particularly handy for splitting rows into groups and summarising each group. I shall definitely be exploring this library further in the future.
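
To give a flavour of what groupby does, here is a minimal sketch rather than the actual analysis code. The sample rows and the column names (timestamp, user, source_ip, result) are invented for illustration; real syslog data would first need parsing into a table of this kind.

```python
import io

import pandas as pd

# Hypothetical sample data: the columns and rows below are made up for
# illustration and do not come from the real syslog analysis.
CSV_DATA = """timestamp,user,source_ip,result
2013-03-01 09:15:02,alice,192.0.2.10,accepted
2013-03-01 09:17:44,bob,192.0.2.11,failed
2013-03-01 09:18:01,bob,192.0.2.11,accepted
2013-03-02 22:41:30,root,198.51.100.7,failed
2013-03-02 22:41:35,root,198.51.100.7,failed
2013-03-03 08:02:12,alice,192.0.2.10,accepted
"""

# Load the CSV text into a DataFrame, parsing the timestamp column as dates.
auth = pd.read_csv(io.StringIO(CSV_DATA), parse_dates=["timestamp"])

# Group the rows by user and count how many attempts each user made.
attempts_per_user = auth.groupby("user").size()

# Group by user and result, then pivot the result values into columns
# so each row shows the accepted/failed counts for one user.
outcomes = auth.groupby(["user", "result"]).size().unstack(fill_value=0)

print(attempts_per_user)
print(outcomes)
```

The same pattern works for any column: grouping by source_ip instead of user, for example, would summarise the attempts per originating address.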