Fuzzy Testing

October 13, 2008

I’ve been thinking about various aspects of the upcoming project to create a testing framework for LCFG components. One thing we really need is a replacement for what I am going to term “fuzzy matching”. The current framework supports embedding the tags [% and %] to mark sections which are going to change with every run (timestamps, version numbers, etc.).

This is a really nice idea and is very useful but there are a couple of downsides to the current implementation. In particular, the tags need to appear in the expected output and also in the generated output (and consequently in the code which is being tested). However, we do not want these tags to appear in the generated output on a “live” system, so we end up having to rebuild the packages in a special way to get the tags inserted into the code producing the output. This leads to a situation where the code we are testing is not identical to that on the live system. At best the difference is a few strings, but in some cases completely different code paths are followed.

Having thought about this for a while I reckoned it should be possible to do away completely with the need to add tags to the generated output and just mark up the expected output. Effectively the expected output becomes a template. The inspiration for this approach came from Template::Extract, and initially I thought I could just build directly on top of that module, but I didn’t have much success.

I’ve now come up with Test::FuzzyMatch. This is most definitely alpha-quality software but it is already quite useful. I think the following example demonstrates what I want to be able to do.

Here is part of a logfile from the boot component:

06/10/08 03:02:01: >> run
07/10/08 03:02:02: >> run
08/10/08 03:02:01: >> run
09/10/08 03:02:01: >> run
10/10/08 03:02:01: >> run
11/10/08 03:02:01: >> run
12/10/08 03:02:02: >> run

Here is the template it needs to match:

[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run

Any regular expression can be embedded inside the tags; anything which is not inside the tags is treated as a simple text-string. For each line a regular expression is assembled from the parts (with the text-strings first being passed through the quotemeta function). Each line in the input file is then compared with the regular expression generated from the same line in the template.
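As a rough illustration of that assembly step, here is a minimal sketch of how one template line could be turned into a regular expression. This is not the actual Test::FuzzyMatch code: the function name template_line_to_regex is made up for this example, and anchoring the pattern to the whole line and wrapping each tag in a non-capturing group are my own assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Test::FuzzyMatch): turn one template
# line into a regular expression which must match a whole input line.
sub template_line_to_regex {
    my ($line) = @_;

    my $pattern = '';
    my $pos     = 0;

    # Walk over each [% ... %] tag in turn.
    while ( $line =~ m/\[%\s*(.*?)\s*%\]/g ) {
        my $regex_part = $1;
        my $tag_start  = $-[0];    # offset where the tag begins
        my $tag_end    = $+[0];    # offset just past the tag

        # Literal text before the tag is escaped with quotemeta.
        $pattern .= quotemeta( substr( $line, $pos, $tag_start - $pos ) );

        # The tag body is used as a regular expression, grouped so that
        # any alternation inside it stays local to the tag.
        $pattern .= "(?:$regex_part)";

        $pos = $tag_end;
    }

    # Any literal text after the last tag.
    $pattern .= quotemeta( substr( $line, $pos ) );

    # Assumption: the whole line must match, so anchor both ends.
    return qr/\A$pattern\z/;
}

my $template = '[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run';
my $regex    = template_line_to_regex($template);

# Lines are assumed to have been chomped before comparison.
for my $line ( '06/10/08 03:02:01: >> run', '06/10/2008 03:02:01: >> run' ) {
    printf "%-30s %s\n", $line, ( $line =~ $regex ? 'matches' : 'does not match' );
}
```

The first line is in the logfile format shown above and should match. The second has a four-digit year, which the \d{2} tag followed by a literal space will not accept.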

It can be used something like:

use Test::More tests => 1;
use Test::FuzzyMatch;

is_fuzzy_equal_files( 't/boot.tmpl', 't/boot.log',
                      'checking that the log file is correctly formatted' );

The same result could be achieved by just writing every line in the expected output as a regular expression, but I think the template approach is clearer in terms of both reading and writing. It also means that we could optimise for lines where no fuzzy-matching is required, since a line without any tags only needs a plain string comparison.
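To show what that optimisation might look like, here is a small sketch. It is only an assumption about how it could be done, not something Test::FuzzyMatch does today, and the function name line_matches is invented for the example. Lines with no opening tag are compared with eq, and only lines containing tags go through the regular expression engine.

```perl
use strict;
use warnings;

# Hypothetical helper: compare one template line with one input line,
# using a plain string comparison when the template has no tags.
sub line_matches {
    my ( $template_line, $input_line ) = @_;

    # Fast path: index() returns -1 when '[%' does not occur at all.
    if ( index( $template_line, '[%' ) == -1 ) {
        return $template_line eq $input_line ? 1 : 0;
    }

    # Slow path: split on the tags. Because the pattern has a capture
    # group, the list alternates literal text (even indices) and tag
    # contents (odd indices).
    my @pieces = split m/\[%\s*(.*?)\s*%\]/, $template_line, -1;

    my $pattern = '';
    for my $i ( 0 .. $#pieces ) {
        $pattern .= ( $i % 2 )
            ? "(?:$pieces[$i])"          # tag body: used as a regex
            : quotemeta( $pieces[$i] );  # literal text: escaped
    }

    return $input_line =~ m/\A$pattern\z/ ? 1 : 0;
}

print line_matches( 'Starting component', 'Starting component' ), "\n";
print line_matches( 'pid [% \d+ %] started', 'pid 4242 started' ), "\n";
```

Whether this saves anything worthwhile would depend on how many lines in a typical expected-output file have no tags at all.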

I’d like to take this idea a bit further and add support for simple loops to handle repetition. The boot logfile example above shows how it would be nice to say something like “the next 7 lines must match this”.

Testing, testing…

February 19, 2008

I’ve been adding tests to one of the Perl modules I have written for the buildtools project. This is for two reasons. Firstly, it’s good to have tests for the code as it helps spot bugs and makes it easier to add further functionality, safe in the knowledge that none of the earlier work has been broken. Secondly, I wanted to try out a few testing strategies which might be useful when it comes to the LCFG core refactoring project. I have been playing with Devel::Cover, which measures how much of the code is covered by the test suite (here are some results). My initial trials suggest this is quite a good strategy, as it encouraged me to think about all the different pathways through the code and it did find one rather nasty bug.