A/B Testing – Use with care

In the blog post A/B Testing: More than Just Sandpaper, the author suggests that A/B testing is one of the key ways to guide the design of products. The post sets out to dispute Jeff Atwood’s claim that A/B testing is like sandpaper: you can use it to smooth out details, but you can’t actually create anything with it. While A/B testing is a useful tool, I think the author overstates its usefulness.

Local minima

The author says that:

[A/B testing can] quantify how good the final result is

This isn’t the case. A/B testing can only quantify how good a design is relative to another version of the product, as measured by some chosen metric.

This makes it possible to get stuck in local minima. Iteratively improving a design through A/B testing may make it impossible to escape a local minimum, because escaping may require a larger, non-iterative change. Of course, there is no reason why distinctly different sites can’t be tested against each other using A/B testing. However, developing such a different design would require a UX designer, user testing and so on, not just A/B testing.
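
The effect can be sketched in a few lines of Python. This is only an illustration: the conversion rates are invented, the function name `iterate_with_ab_tests` is hypothetical, and each "test" is assumed to be noise-free. Because the metric here is one to be increased, the point where iteration stalls is strictly a local maximum, which is the same trap described above.

```python
# Invented conversion rates for a row of design variants. Each index is
# assumed to be one small change away from its neighbours.
CONVERSION = [0.020, 0.024, 0.031, 0.027, 0.022, 0.025, 0.038, 0.052, 0.047]


def iterate_with_ab_tests(start: int, rates: list[float]) -> int:
    """Repeatedly compare the current design with its neighbours and keep
    the winner, stopping when no neighbouring design does better."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(rates)]
        best = max(neighbours, key=lambda i: rates[i])
        if rates[best] <= rates[current]:
            return current  # no small change wins, so iteration stalls here
        current = best


stuck_at = iterate_with_ab_tests(start=0, rates=CONVERSION)
global_best = max(range(len(CONVERSION)), key=lambda i: CONVERSION[i])
print(f"iteration stops at variant {stuck_at} ({CONVERSION[stuck_at]:.1%})")
print(f"best variant overall is {global_best} ({CONVERSION[global_best]:.1%})")
```

Starting from the first variant, the small-step process never reaches the best design, because every route to it passes through variants that lose their test.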

Because it only tests one design against another, A/B testing often can’t address fundamental problems with the developers’ own perceptions. A key example is eBay. According to the Wired article that the blog post links to, eBay is a company heavily into A/B testing. One of its notable failures was its expansion into China. Why? Its entire site was designed from a western perspective and so failed to take into account the extent of the Chinese culture of haggling. eBay was bound by its preconceptions about how the site should work. By not having a deeper understanding of the market, it allowed TaoBao to capitalize on its mistake.

No amount of A/B testing allowed eBay to discover its error in time. A/B testing couldn’t quantify how bad the design was, because eBay never escaped the local minimum to come up with a better design to test against.
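
To make concrete the point that an A/B test only scores one version against another, here is a rough sketch of a common way to analyse such a test, a pooled two-proportion z-test. The function name `compare_variants` and the visitor counts are made up for illustration, and nothing here is taken from the blog post or the Wired article.

```python
import math


def compare_variants(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion counts for variants A and B.

    Returns (difference in rates, z statistic, two-sided p-value).
    Assumes the pooled rate is strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return rate_b - rate_a, z, p_value


# Invented numbers: 200 of 10,000 visitors convert on A, 260 of 10,000 on B.
diff, z, p_value = compare_variants(200, 10_000, 260, 10_000)
print(f"B - A = {diff:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

Every number this produces is relative: it says B did better than A on the chosen metric, not that either design is good in any absolute sense.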

Users aren’t the focus

The article proposes that:

[A/B] testing democratises the process of software development and brings better outcomes for both the developers and the users

I feel that this isn’t adequately explored, and isn’t true, so I offer some counterpoints.

Skyscanner is a site that has a vested interest in getting users to click through to airlines’ sites. Through A/B testing they found that green buttons helped, along with many other minor improvements. As they continued to iterate on their design, they realised that they were degrading the site experience: their primary target metric was increasing, but fewer people were returning to the site. There was a worry that if they kept following the path A/B testing suggested, it would hurt their bottom line in the long term.

On a more personal level, this can be seen with the Wikipedia ads. The heavily A/B tested Wikipedia fundraising banner was highly noticeable and hence highly distracting; it was one of the few times I have manually added rules to AdBlock to remove an ad. The highly noticeable banner was clearly useful to Wikipedia, but for me it was mostly just annoying to have my eye drawn to it every time I used Wikipedia.

These examples are admittedly just anecdotes and lack any firm basis in studies, but they hopefully convey the idea that, without due care, A/B testing can be taken too far. Fixating on maximising a few simple variables can come at the expense of other, unmeasured areas of the design, which can be detrimental both to the users and to the overall goals of the company.
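
One way to guard against this is to check a second, "guardrail" metric before accepting a winning variant. The sketch below is hypothetical throughout: the names `VariantResult` and `should_ship`, the tolerance and the figures are all invented, and it is not a description of how Skyscanner or Wikipedia actually run their tests.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    click_through: float  # primary metric the test is trying to increase
    return_rate: float    # guardrail metric: share of users who come back


def should_ship(control: VariantResult, variant: VariantResult,
                max_guardrail_drop: float = 0.01) -> bool:
    """Accept the variant only if the primary metric improves and the
    guardrail metric falls by no more than the allowed tolerance."""
    primary_improved = variant.click_through > control.click_through
    guardrail_drop = control.return_rate - variant.return_rate
    return primary_improved and guardrail_drop <= max_guardrail_drop


control = VariantResult(click_through=0.10, return_rate=0.40)
pushy = VariantResult(click_through=0.12, return_rate=0.33)   # more clicks, fewer returns
gentle = VariantResult(click_through=0.12, return_rate=0.40)  # more clicks, same returns

print(should_ship(control, pushy))
print(should_ship(control, gentle))
```

A guardrail only protects what someone thought to measure, so it narrows the problem without removing it.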

Better designs?

The blog post says that:

A/B testing has shown time and time again that in some cases the solutions that violate the rules of visual composition or could even be perceived as vulgar are the most appealing.

I am unsure that this conclusion can adequately be drawn from the linked articles. While it is true to say that these designs increase some metric, I see it as a large step to say that improving the metric means the designs are visually appealing. Putting yellow highlighting on the text in an email looks horrific from a visual standpoint; it does, however, draw the user’s attention to what the designer wants them to see.

Just another tool…

I agree with the premise of the blog post that A/B testing is a good tool, but it should be viewed as just that: another tool. It isn’t a panacea for deciding what the design should look like, but an aid in guiding UI design decisions. It shouldn’t replace other design measures such as user testing, which provide more feedback as to *why* part of a design works.

I understand where Jeff Atwood is coming from. A/B testing can be used to create and to guide, but it is one tool among many. Without other tools it is, as he said, just sandpaper.

4 thoughts on “A/B Testing – Use with care”

  1. I would argue that A/B testing could have told Skyscanner that fewer customers were coming back. It is just a matter of looking at the right statistics.

    When done properly, A/B testing tracks users over time, and direct conversions are just one of the metrics used to decide which of the designs is better.

  2. My point about A/B testing democratizing the web was supposed to cover cases like eBay. I was trying to convey that split testing is likely the only thing that could have told eBay’s designers what worked, because their preconceived notions did not hold true in China.

    Also, after reading Why eBay Failed in China, I believe that the issues ran deeper than the design of the website. Chinese buyers want to make personal connections with the seller, which constitutes a change in business model rather than in design.

  3. By “appealing” I did not mean visually pleasing design. I think we can both agree that text with a bright yellow background is vulgar. My point is that not all people look at visual design the same way. Most of them do not care about beautiful fonts or balanced designs. Conveying the important information quickly is sometimes more important.
