Python Profiling

Profiling isn’t often important for my Python programs, but when a performance black hole appears out of nowhere it’s very useful for narrowing down the problem, even on smaller scripts.

(Recently I had cause to profile a script manipulating large-ish (<50 MB) files, whose performance had taken a turn for the worse. I had a suspicion that poor list manipulation was to blame, and the results spoke for themselves: 99% of CPU time was spent in list.pop(0)! Based on this I discovered PEP 290, which describes the collections.deque structure. Popping from the front of a list has to shift every remaining element along, whereas a deque is built for cheap removal at either end, so the replacement call, deque.popleft(), barely registers as a percentage of overall execution time!)
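For anyone who wants to see the difference for themselves, here is a minimal sketch of that comparison. It is not the script from the story: the function names and the input size are made up for illustration, and it assumes Python 3.

```python
import timeit
from collections import deque


def drain_list(items):
    """Consume a list from the front; every pop(0) shifts the remaining elements."""
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))
    return out


def drain_deque(items):
    """Consume a deque from the front; popleft() does not shift anything."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


if __name__ == "__main__":
    data = range(50000)  # illustrative size, not taken from the original script
    for fn in (drain_list, drain_deque):
        seconds = timeit.timeit(lambda: fn(data), number=3)
        print("%-12s %.4f s" % (fn.__name__, seconds))
```

Both functions return the same items in the same order; only the time taken differs, and the gap should widen as the input grows.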

I could have discovered this using basic profiling, but whilst playing with the profiler I discovered a very straightforward, easy-to-read Python profile visualiser, and thought it worth sharing. It doesn’t do anything very beautiful or complicated, but it’s very quick, provides a neat “squaremap” view of your code performance, and importantly Just Works. RunSnakeRun is its name.

You can obtain RunSnakeRun and its dependencies using the awkwardly-named easy_install:

easy_install SquareMap
easy_install RunSnakeRun

Well. That was easy.

A note about running on the Mac: RunSnakeRun’s GUI is built on wxPython, which was 32-bit only at the time, so on a standard Snow Leopard installation it can’t be launched, because the 64-bit Python binary is invoked by default. You can get around this by running in 32-bit mode, a useful general tip for any situation where you need to enforce an architecture:

$ arch -i386 python runsnake ./my.profile

So far so good, but you’ve no profiling data. A brief profiling-in-Python primer follows, though really it’s not complicated and this section is little more than a condensed version of the Python docs; you should look there in preference! In summary, you can profile any Python program using the cProfile module: any script can be run under cProfile.py (you’ll find it in your Python library directory), most conveniently as python -m cProfile. Running python -m cProfile <your-script-name> <your-script-args> without an output file will print some mostly-helpful statistics, but really nothing beats a graphical view so we’ll press on.
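To give a sense of what those text statistics look like before moving on to the graphical view, here is a small self-contained sketch. The slow_pop and profile_report functions are hypothetical stand-ins written for this example, and the code assumes Python 3; the cProfile and pstats calls themselves are from the standard library.

```python
import cProfile
import io
import pstats


def slow_pop(n):
    """Deliberately wasteful workload: drain a list from the front."""
    items = list(range(n))
    while items:
        items.pop(0)


def profile_report(func, *args, limit=5):
    """Profile func(*args) and return the text report as a string."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show only the first few entries.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(slow_pop, 20000))
```

The report lists call counts and per-function times, with the list pop method showing up alongside slow_pop. It is perfectly usable, just harder to take in at a glance than a squaremap.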

You can create some profiling data by running the cProfile module as a script:

python -m cProfile -o myscript1.profile <your-script-name> <your-script-args>

or programmatically:

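import cProfile
import mymodule  # placeholder: your own module

args = []  # placeholder: whatever your main() expects
cmd = "mymodule.main(args)"  # executed in the __main__ namespace
outfile = "myscript1.profile"
cProfile.run(cmd, outfile)  # writes the stats to outfile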

Having run either of these, you simply need to launch your visualiser:

$ runsnake myscript1.profile

Beautiful.
