
How page load times are affecting your A/B tests

Back in 2014, a team of researchers from Microsoft, LinkedIn and Southwest Jiaotong University wrote a paper presenting seven rules of thumb they had extracted from their experience conducting thousands of online experiments. These rules of thumb are solid gold for anybody involved with A/B testing or experimenting with a website more generally, and I strongly recommend you read Adrian Colyer’s write-up of the paper published earlier this week.

I’m not going to talk about all seven rules here - my intention is just to dig into one specific rule of thumb: speed matters A LOT.

The paper pulls no punches when describing how influential page load speed is on your key metrics like conversion rate, bounce rate and revenue per user:

How important is performance? Critical. At Amazon, 100msec slowdown decreased sales by 1%

For Amazon, a 1% decrease in sales is millions of dollars in lost revenue. It’s chilling to think about how much revenue we’ve lost on our sites thanks to seemingly inconsequential slowdowns.

But what does page load speed or, to use a more generic term, ‘performance’, have to do with online experiments? Two important things: experiments can identify the impact of performance on your site, and misunderstanding or mismanaging performance could well be skewing the results of your experiments.

Measuring the impact of performance

We know that performance has a big impact on key metrics, but it’s still critical that we all quantify that impact as accurately as possible. For Amazon, it was a 1% reduction in sales, but for other, slower sites, the impact is likely to be even more significant - and damaging.

Thankfully, it’s reasonably simple to quantify what performance means for your site: you can A/B test it. The bing.com team at Microsoft performed an A/B test in which they artificially slowed page load times for 10% of their audience by 100ms, and for a different 10% segment by 250ms. They ran this test for just two weeks. The results are compelling: each 100ms speedup resulted in a 0.6% increase in revenue. After translating this into actual profit, the team at Microsoft remarked that “an engineer that improves server performance by 10ms (that’s 1/30 of the speed that our eyes blink) more than pays for his fully-loaded annual costs”.
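The mechanics of such a slowdown experiment are straightforward: assign each user deterministically to a bucket, and inject an artificial delay for the slowed buckets. Here is a minimal sketch in Python; the group names, split sizes and delay values are illustrative choices modelled on the Bing test described above, not details taken from the paper.

```python
import hashlib
import time

# Illustrative split: 80% see the site as-is, two 10% segments get an
# artificial delay added to every response (as in the Bing-style test).
DELAY_GROUPS = [
    ("control", 0.80, 0.000),     # no added delay
    ("slow_100ms", 0.10, 0.100),  # +100 ms
    ("slow_250ms", 0.10, 0.250),  # +250 ms
]

def assign_group(user_id: str) -> tuple[str, float]:
    """Deterministically map a user ID to (group name, delay in seconds)."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = (int(digest, 16) % 10000) / 10000  # uniform in [0, 1)
    cumulative = 0.0
    for name, share, delay in DELAY_GROUPS:
        cumulative += share
        if bucket < cumulative:
            return name, delay
    return DELAY_GROUPS[0][0], DELAY_GROUPS[0][2]

def handle_request(user_id: str) -> str:
    """Serve a request, sleeping first if the user is in a slowed group."""
    group, delay = assign_group(user_id)
    if delay:
        time.sleep(delay)  # the artificial slowdown under test
    return group
```

Hashing the user ID (rather than assigning groups randomly per request) keeps each user in the same group for the whole experiment, which is essential for measuring per-user metrics like revenue.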

Here at SKIPJAQ, we routinely optimise customer page load times by hundreds of milliseconds, sometimes even seconds. With such massive gains, the return on investment is clear for anybody who makes money online.

The dangers of skewed experiments

One of the key insights from the ‘Seven rules of thumb…’ paper is how much performance affects your ability to run experiments even when they are in no way directly related to page load speed. To quote the paper:

Web site developers that evaluate features using controlled experiments quickly realize web site performance, or speed, is critical. Even a slight delay to the page performance may impact key metrics in the treatment.

In essence, if we can’t ensure that page load speeds for both the control and treatment groups in our A/B tests are the same, then the results of that test are called into question.

Performance has such a large impact on key metrics that it can easily overshadow the difference in effect between your treatment and your control. How does this manifest in the real world? Well, you might see one of two things happen. If page load times in the treatment are worse than in the control (perhaps you pushed out an inefficient version of the treatment for testing), then you might fail to detect a real improvement that the treatment would otherwise have delivered.

Conversely, if the treatment is much faster than the control (maybe you’ve circumvented some tracking for the sake of expediency), then that treatment might appear better than it really is. Of course, if the treatment is implemented exactly the way you want it, then you’ve likely stumbled across a happy performance optimisation!

In both cases, if you’ve quantified the impact of performance for your own site, you’ll be able to control for that in your experiments. It’s critical that you’re able to measure page load times for both the treatment and the control. That way, you’ll be able to factor in the impact of performance when analysing the results of your experiment.
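A simple guard is to compare mean load times between the two groups before trusting the headline metric. This sketch is illustrative (the 50ms threshold and the sample numbers are assumptions, not figures from the paper): if the variants’ load times diverge beyond your threshold, treat the experiment result as potentially confounded by performance.

```python
from statistics import mean

def load_time_skew_ms(control_ms: list[float], treatment_ms: list[float]) -> float:
    """Difference in mean page load time, treatment minus control."""
    return mean(treatment_ms) - mean(control_ms)

def performance_confounded(control_ms: list[float],
                           treatment_ms: list[float],
                           threshold_ms: float = 50.0) -> bool:
    """Flag the experiment when load times diverge beyond the threshold."""
    return abs(load_time_skew_ms(control_ms, treatment_ms)) > threshold_ms

# Hypothetical samples: the treatment variant loads noticeably slower,
# so any metric difference may reflect performance, not the feature.
control = [820.0, 790.0, 805.0, 850.0, 810.0]
treatment = [940.0, 910.0, 965.0, 930.0, 905.0]
```

In practice you would run a proper significance test on the load-time samples rather than a fixed threshold, but even this crude check catches the gross skews that most often invalidate an experiment.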

tl;dr

Milliseconds matter. Every millisecond added to your page load times costs you revenue. A hundred milliseconds of difference in page load time between the treatment and the control in an experiment will likely skew the results in a way that is hard to detect - unless you fully understand the impact performance has on your key metrics.

If you only do one thing after reading this post, I urge you to test your own site to see how big a deal performance is for you. Once you know that, please do get in contact if you want to see how SKIPJAQ can shave a few hundred milliseconds off your page load times.