Fuzzy Testing

October 13, 2008

I’ve been thinking about various aspects of the upcoming project to create a testing framework for LCFG components. One thing we really need is a replacement for what I am going to term “fuzzy matching”. The current framework supports embedding tags ([% and %]) to mark sections which will change with every run (timestamps, version numbers, etc.).

This is a really nice idea and is very useful, but there are a couple of downsides to the current implementation. In particular, the tags need to appear in the expected output and also in the generated output (and consequently in the code which is being tested). However, we do not want these tags to appear in the generated output on a “live” system, so we end up having to rebuild the packages in a special way to get the tags inserted into the code producing the output. This leads to a situation where the code we are testing is not identical to that on the live system: at best the difference is a few strings, but in some cases completely different code paths are followed.

Having thought about this for a while, I reckoned it should be possible to do away completely with the need to add tags to the generated output and just mark up the expected output. Effectively the expected output becomes like a template. The inspiration for this approach came from Template::Extract; initially I thought I could just build directly on top of that module, but I didn’t have much success.

I’ve now come up with Test::FuzzyMatch. This is most definitely alpha-quality software, but it is already quite useful. I think this example demonstrates what I want to be able to do:

Here is part of a logfile from the boot component:

06/10/08 03:02:01: >> run
07/10/08 03:02:02: >> run
08/10/08 03:02:01: >> run
09/10/08 03:02:01: >> run
10/10/08 03:02:01: >> run
11/10/08 03:02:01: >> run
12/10/08 03:02:02: >> run

Here is the template it needs to match:

[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run

Any regular expression can be embedded inside the tags; anything which is not inside the tags is treated as a simple text string. For each line a regular expression is assembled from the parts (with the text strings first being passed through the quotemeta function). Each line in the input file is then compared with the regular expression generated from the same line in the template.
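That per-line assembly is easy to sketch in Perl. Here is a minimal illustration of the idea (my own sketch of the approach, not the actual Test::FuzzyMatch internals):

use strict;
use warnings;

# Convert one template line into a regular expression: text outside
# the [% ... %] tags is escaped with quotemeta, text inside the tags
# is kept as a raw regular expression fragment.
sub line_to_regexp {
    my ($tmpl) = @_;

    # split with a capturing group returns the literal text and the
    # captured tag contents alternately.
    my @parts = split /\[%\s*(.*?)\s*%\]/, $tmpl;

    my $regexp = q{};
    my $is_tag = 0;
    for my $part (@parts) {
        $regexp .= $is_tag ? $part : quotemeta $part;
        $is_tag  = !$is_tag;
    }

    return qr/^$regexp$/;
}

# For example, this matches the first line of the boot logfile above:
my $tmpl = '[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run';
print "matched\n" if '06/10/08 03:02:01: >> run' =~ line_to_regexp($tmpl);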

It can be used something like this:

use Test::More tests => 1;
use Test::FuzzyMatch;

is_fuzzy_equal_files( 't/boot.tmpl', 't/boot.log',
                      'checking that the log file is correctly formatted' );

The same effect could be achieved by just writing every line in the expected output as a full regular expression, but I think the template approach is clearer in terms of both reading and writing. It also means that we could optimise for lines where no fuzzy matching is required.

I’d like to take this idea a bit further and add support for simple loops to handle repetition. The boot logfile example above shows why it would be nice to be able to say something like “the next 7 lines must match this”.
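For example, with some purely hypothetical syntax (nothing like this exists in the module yet), the seven-line template above could collapse to:

[% REPEAT 7 %]
[% \d{2} %]/[% \d{2} %]/[% \d{2} %] [% \d{2} %]:[% \d{2} %]:[% \d{2} %]: >> run
[% END %]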


Moving to the new build tools just got easier

October 8, 2008

I’ve been working on porting the MPU components to the new LCFG build tools. This is giving me a good idea of how well the tools handle a wide variety of situations. So far everything seems to be going pretty well; I’ve certainly not hit any stumbling blocks which would suggest the need for major changes.

As I’ve gone through, I’ve taken the chance to replace any deprecated LCFG macro names with their modern equivalents. Essentially this is the list of aliases in the table in the “Package Information” section of the “Substitution Variables” page of the build tools documentation. Mainly they are deprecated because they might clash with standard CMake variable names. I started off doing this search-and-replace process by hand but rapidly got rather bored with that approach. This is the sort of job at which Perl excels, so I’ve now come up with a tool to do the job automatically.
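At its core such a tool is just a table-driven substitution. A minimal sketch of the idea might look like this (the alias entries below are placeholders rather than the real deprecated names, and I am assuming the usual @MACRO@ substitution style):

use strict;
use warnings;

# Deprecated macro names and their modern equivalents. These two
# entries are placeholders; the real list is the alias table in the
# build tools documentation.
my %aliases = (
    'NAME'    => 'LCFG_NAME',
    'VERSION' => 'LCFG_VERSION',
);

my $deprecated = join '|', map { quotemeta } keys %aliases;

for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };
    close $in;

    # Rewrite every deprecated macro to its modern name.
    my $count = ( $text =~ s/\@($deprecated)\@/\@$aliases{$1}\@/g );

    if ($count) {
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $text;
        close $out;
        print "$file: replaced $count deprecated macro(s)\n";
    }
}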

The checkmacros command in the LCFG release tools now has a --fix_deprecated option to carry out this automatic replacement. It scans through all the files as usual, finding all the instances of macro usage, and produces the report. After that point it can use the results to modify any files containing deprecated macros.

If you are feeling really brave you could also try out a new version of updaterpms which uses the lcfgutils shared library; you will need:

!profile.packages  mEXTRA(+updaterpms-3.1.5-1)

Learning Perl

October 7, 2008

I was going to post this as a comment to Chris’s recent post but it started to get a bit long and involved so I reckoned it would be better as a separate entry on my blog.

The Programming Perl book is basically considered to be the only complete specification of what Perl is and how it works, so there is a lot of value in it going into detail on every clever and odd edge case. As Chris said, though, it’s definitely not a useful book for learning how to program in Perl.

In my opinion, the best route is Learning Perl, Intermediate Perl and Mastering Perl; the three books are designed to be a thorough introduction to Perl programming. The Perl Cookbook is excellent and invaluable for solving those small problems which others have previously encountered. It is sadly getting rather old now; the most recent edition was published in 2003, which is a long time in terms of Perl and currently accepted best practices. The main issue I have with it is that I find it a bit lacking in general advice on how to proceed with a problem which doesn’t fit directly into any of its categories. A more minor issue is that Perl people love the “There’s More Than One Way To Do It” attitude, so the way one person solves something may not appeal to someone else.

I’ve recently found Perl Best Practices (and the related perlcritic tool) to be very useful, and I recommend it to anyone who regularly writes a lot of complex Perl code. I believe it has helped improve the quality, readability and maintainability (some would say those three things are all the same really) of my code enormously over the last year or so.