We Could Write Nearly Perfect Software but We Choose Not to

The Perfect Software

This blog post was inspired by the article They Write the Right Stuff by Charles Fishman, published in the Dec 1996/Jan 1997 issue of Fast Company magazine. The article describes the software development process used by the on-board Shuttle group, which writes the software for NASA that runs on the control computers inside the Space Shuttle.

The publication quotes some impressive numbers.

Consider these stats: the last three versions of the program — each 420,000 lines long — had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.

The software is said to have been delivered on time and within budget. If this cannot be called nearly perfect software, I do not know what can.
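To put the quoted numbers on a common scale, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: it assumes that all 11 versions were about 420,000 lines long (the article states this only for the last three) and it takes the 5,000-error figure for commercial software at face value.

```python
# Back-of-the-envelope defect densities from the figures quoted above.
LINES_PER_VERSION = 420_000  # assumed to hold for all 11 versions


def errors_per_kloc(errors: int, lines: int) -> float:
    """Return the number of errors per thousand lines of code."""
    return errors / (lines / 1000)


shuttle_latest = errors_per_kloc(1, LINES_PER_VERSION)            # roughly 0.0024
shuttle_last_11 = errors_per_kloc(17, 11 * LINES_PER_VERSION)     # roughly 0.0037
commercial_estimate = errors_per_kloc(5_000, LINES_PER_VERSION)   # roughly 11.9

print(f"Shuttle, latest version:   {shuttle_latest:.4f} errors/KLOC")
print(f"Shuttle, last 11 versions: {shuttle_last_11:.4f} errors/KLOC")
print(f"Commercial estimate:       {commercial_estimate:.1f} errors/KLOC")
print(f"Ratio, commercial to latest Shuttle: {commercial_estimate / shuttle_latest:.0f}x")
```

On these assumptions the gap between the two is more than three orders of magnitude.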

In this post I will argue that these statistics are not something out of the ordinary. Every software engineering project could boast similar numbers. It is just that achieving them comes at a price (both literally and figuratively) that we find too high to pay.

They Write the Right Stuff breaks down the on-board Shuttle group’s process into four principles. I will follow the same structure and explain why, in my opinion, we choose not to uphold them.

1. Big Design Up Front

At the on-board shuttle group, about one-third of the process of writing software happens before anyone writes a line of code. […] Take the upgrade of the software to permit the shuttle to navigate with Global Positioning Satellites, a change that involves just 1.5% of the program, or 6,366 lines of code. The specs for that one change run 2,500 pages […]

Here are the reasons that, I believe, dissuade some software companies from specifying their software in detail before any programming begins.

First, designing up front is hard. This process requires experience and a deep understanding of the problem that the software is going to solve. It is hard not only from a purely technical point of view, but from a psychological one as well. I believe that it is fair to liken requirements gathering to mind reading, as clients often find it difficult to articulate their needs clearly. Often it seems easier to start programming and solve issues as they become apparent. I do not find it surprising that such a patchwork of a project is more likely to contain errors and bugs than one that was carefully designed beforehand.

Second, the client is always right. In this day and age it might be difficult to convince your clients that they absolutely must attend numerous lengthy meetings and fund a requirements gathering and documentation process when, in their mind, it will have no tangible benefits. While the civil engineering analogy is growing old, I still believe that it holds true. One would not expect clients to change their mind about how many rooms they want after the foundations have been laid. Why should the same not be true for software development?

Extreme Programming proponents argue that requirements always change. Of course, part of it is that times change and requirements change with them. Obviously, in those situations designing up front is not the right thing to do. However, I believe that more often than not it is a misunderstanding between the client and the developers that causes the changes. In that case, it was a choice not to invest more time in requirements gathering, and thus, I argue, a choice not to implement the best software possible.

Big design up front is obviously much easier for the Shuttle group because they, as the article points out, have a single client with a single project and they have a deep understanding of the problems at hand.

2. Code Review Performed by a Separate Team

The central group breaks down into two key teams: the coders — the people who sit and write code — and the verifiers — the people who try to find flaws in the code. The two outfits report to separate bosses and function under opposing marching orders.

With pair programming on the rise, similar techniques might be applied more widely in the future. However, I still think that pair programming is not quite on par with having a separate team dedicated purely to code review. I say this based on my personal experience: one develops a kind of bug-blindness when one is familiar with the code. This is exactly why having a separate team is so powerful. When you are not familiar with the code, it is easier to spot mistakes.

3. Every Change Is Documented

One is the history of the code itself — with every line annotated, showing every time it was changed, why it was changed, when it was changed, what the purpose of the change was, what specifications documents detail the change.

Times seem to have changed the scene for the better. What was described as something to be marvelled at in 1996 is now commonplace thanks to advances in version control software. With tools such as Git freely available, it is purely a question of discipline to match 1990s NASA in record keeping.

This is the only point laid down by Fishman that is a complete no-brainer: there are no drawbacks. Keeping good records does not require any extra time, money or human resources.
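As a rough illustration of how close modern tooling comes to the per-line history described in the quote, here is a small Python sketch that reads the output of `git blame --line-porcelain` and recovers, for every line of a file, which commit last touched it, who made the change, when, and the first line of the commit message. It assumes Git is installed and that the file lives in a repository; the function names `parse_line_porcelain` and `blame` are my own. The "why" is only as good as the commit messages the team writes, which is exactly where the discipline comes in.

```python
import re
import subprocess

# A blame header looks like "<commit hash> <original line> <final line> [<group size>]".
HEADER = re.compile(r"^([0-9a-f]{40,64}) \d+ (\d+)")


def parse_line_porcelain(text: str) -> list[dict]:
    """Turn `git blame --line-porcelain` output into one record per source line."""
    records = []
    current = {}
    for raw in text.split("\n"):
        if raw.startswith("\t"):
            # A tab-prefixed line holds the source line and closes the record.
            current["content"] = raw[1:]
            records.append(current)
            current = {}
            continue
        header = HEADER.match(raw)
        if header:
            current = {"commit": header.group(1), "line": int(header.group(2))}
        elif raw.startswith("author "):
            current["author"] = raw[len("author "):]
        elif raw.startswith("author-time "):
            current["timestamp"] = int(raw.split(" ", 1)[1])
        elif raw.startswith("summary "):
            current["summary"] = raw[len("summary "):]
    return records


def blame(path: str) -> list[dict]:
    """Run git blame on `path` (must be inside a Git repository)."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_line_porcelain(result.stdout)
```

Calling `blame("some_file.py")` inside a repository returns one dictionary per line, which is enough to answer "when and why did this line change?" as long as the commit summaries are meaningful.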

4. Learning From Past Mistakes

Like computer models predicting the weather, the coding models predict how many errors the group should make in writing each new version of the software. True to form, if the coders and testers find too few errors, everyone works the process until reality and the predictions match.

The process is so pervasive, it gets the blame for any error: if there is a flaw in the software, there must be something wrong with the way it's being written, something that can be corrected. Any error not found at the planning stage has slipped through at least some checks. Why? Is there something wrong with the inspection process? Does a question need to be added to a checklist?

In a fast-paced environment it might be difficult to find time to reflect on every single bug and ask yourself whether anything could have been done to prevent it. Are there any other places in the codebase that could suffer from the same bug?
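The article does not say how the group's prediction models work, so the following Python sketch is purely my own illustration of the idea of comparing found errors against an expectation. It assumes a constant historical error rate per thousand changed lines and treats the number of errors as Poisson distributed; the rate of 2.0 errors per KLOC is a made-up number, while the 6,366 changed lines come from the GPS example quoted earlier.

```python
import math


def expected_errors(changed_lines: int, errors_per_kloc: float) -> float:
    """Expected number of errors, assuming a constant historical rate."""
    return changed_lines / 1000 * errors_per_kloc


def prob_at_most(found: int, expected: float) -> float:
    """P(X <= found) for a Poisson-distributed X with the given mean."""
    return sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(found + 1)
    )


def too_few_found(found: int, expected: float, threshold: float = 0.05) -> bool:
    """Flag a review that found implausibly few errors for the change size."""
    return prob_at_most(found, expected) < threshold


# Hypothetical rate of 2.0 errors per KLOC applied to the 6,366-line GPS change.
forecast = expected_errors(6_366, 2.0)
print(f"Expected errors: {forecast:.1f}")
print("4 found, keep looking?", too_few_found(4, forecast))
print("12 found, keep looking?", too_few_found(12, forecast))
```

The point is not the particular model but the habit: a suspiciously clean review is treated as a signal that the review, not the code, may be at fault.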

Conclusions

All of these techniques seem to boil down to good documentation and not sparing time or resources when it comes to making sure that the software is top notch. The simple truth is that bug-free, on-time software is just more expensive than we (or our clients) are prepared to pay for.

I hope that I have managed to persuade you that it is not that we cannot write nearly perfect software. It is just that, as with everything in life, we make trade-offs.

A/B Testing: More than Just Sandpaper

In interface design, A/B testing is a simple experiment in which randomized groups of users are presented with variations of the same interface and their behaviour is observed to better inform design decisions. This kind of testing is usually used to improve website conversions. That is because the World Wide Web is a medium that is uniquely suited to A/B testing: it is comparatively inexpensive to present different users with modified versions of the same website and track their actions. This would not be feasible in the context of traditional media.
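To show just how cheap the mechanics are, here is a minimal Python sketch of one common way to split users into groups. It is an assumption on my part rather than how any particular tool does it: each user is assigned a variant by hashing a user ID together with an experiment name, which gives a roughly even, pseudo-random split while guaranteeing that a returning user always sees the same version. The names `assign_variant`, `user_id` and `experiment` are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one of the variants of an experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a large integer and pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same group for a given experiment.
print(assign_variant("user-42", "signup-button-text"))
print(assign_variant("user-42", "signup-button-text"))
```

Including the experiment name in the hash means that a user who lands in group A for one experiment is not automatically in group A for every other experiment.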

With success stories that report 50% increases in clicks from altering the phrasing of a link, or even credit testing with helping to win presidential elections, a great number of A/B testing services and tools (Google Website Optimizer, Amazon’s A/B Testing and Vanity, to name just a few) have emerged, not to mention the countless web posts on the subject.

In 2011 the company [Google] ran more than 7,000 A/B tests on its search algorithm. Amazon.com, Netflix, and eBay are also A/B addicts, constantly testing potential site changes on live (and unsuspecting) users.

— Brian Christian at Wired.com

However, it is not the apparent ubiquity of A/B testing or its success stories but a particular criticism of split testing that inspired me to write this article. In his blog post entitled Groundhog Day, or, the Problem with A/B Testing, Jeff Atwood argues that A/B testing has no feeling and no empathy, and only produces websites that are goal driven and can never win hearts and minds. Mr. Atwood quotes his friend’s tweet:

A/B testing is like sandpaper. You can use it to smooth out details, but you can’t actually create anything with it.

Nathan Bowers

I believe this to be the wrong way of looking at it. Obviously, A/B testing in itself cannot produce anything, but it can guide the design process and quantify how good the final result is. It is difficult enough to avoid developer’s blindness and to keep in mind that people do not know what they want. But in this day and age one also has to navigate the perils of multiculturalism. When developing a website that will potentially be accessed from all around the world, a developer or designer cannot possibly be expected to simply conjure the perfect solution out of thin air.
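Quantifying a result usually means asking whether the difference between two variants is larger than chance alone would explain. One standard way to do that, shown here as a hedged Python sketch rather than as what any of the tools above actually implement, is a two-proportion z-test. The visitor and conversion counts in the example are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("conversion rates of exactly 0% or 100% cannot be tested")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented example: 100 of 1,000 visitors convert on A, 150 of 1,000 on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the difference is unlikely to be a fluke, but it says nothing about why users preferred one version. That part still needs a designer.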

While Mr. Atwood seems to think of A/B testing purely as a way of monetising, I tend to side with some of the people who have commented on his blog post and think that testing democratises the process of software development and brings better outcomes for both the developers and the users.

I believe these same people cannot read the minds of every single person who visits their web site, or uses their app. Therefore, I think it’s great that these people can test both their ideas, rather than having to make some evidence-free guess and rationalize it after the fact. An A/B test is only as good as your best idea, after all. Ideas still matter!

Lukestevens

This is not to say that good designers are unimportant, but they cannot always predict what will entice users to interact with their designs. A/B testing has shown time and time again that in some cases the solutions that violate the rules of visual composition, or could even be perceived as vulgar, are the most appealing.

To try and boost donations the digital team attempted to improve the design by making them look “prettier”.

That failed, so in response an “ugly” design was tested to see if that made any difference. This involved using yellow highlighting to draw attention to certain text within the email.

To the team’s surprise the ugly design actually proved to be quite effective, though the yellow highlighting had to be used sparingly as the novelty wore off after time.

David Moth

I can see where Jeff Atwood is coming from. It might seem that such scientifically rigorous tests subtract from the artistic, inspired or simply human qualities of design, or that corporate values might suffer in the face of corporate greed. However, I am a firm believer that the benefits of A/B testing far outweigh the risks. The latter are nonexistent when testing is treated as the irreplaceable source of insight that it is rather than as the deciding voice.

I might be going a step too far here but — honestly — the existence of A/B testing makes me hopeful that elegant solutions for other difficult software development problems (e.g. project timeline prediction) might be within our reach as well.