6 Accidental Ways to Ruin Your Split Tests and Limit Your Profit


You’re one of those people, aren’t you?

The kind who stay up late at night, thinking of ways to improve their businesses.

It’s great to have this much passion.

By now, you’ve probably at least read the basics about split testing. And while the general concept is very simple, it’s fairly difficult to run a conversion rate optimization campaign properly.

There’s a reason why conversion experts get paid a ton.

Too often, marketers and business owners run a split test, make a change based on the results, and find out that it either made no difference or produced a negative result.

That being said, you can definitely run split tests on your own and get great results.

However, you need to make sure that you set up your test correctly and that you know how to interpret the results.

I know they sound like simple things, but most split testers make several mistakes that lead to subpar results.

In this post, I’m going to show you 6 common ways to do split tests wrong. You may not make all these mistakes, but I bet you’ve made at least one at some point.

After you understand these, you’ll be able to run split tests with more confidence and steadily improve your conversion rates. 

1. Not segmenting your traffic

When it comes to split tests, the more control you have, the better.

Whenever you do any sort of analysis on your website, whether for split tests or not, you need to understand and consider segments.

A segment refers to any section of your traffic.

The most common segments are:

  • Returning visitors
  • New visitors
  • Visitors divided by countries
  • Visitors divided by traffic sources
  • Different days
  • Different devices
  • Different browsers

Here is data broken down by the days of the week:

[Image: conversion data broken down by day of the week]

The important thing to know about segments is that you cannot compare visitors from two different segments.

For example, if you have 100 visitors who use Chrome and 100 visitors who use Internet Explorer, there’s a good chance that they will act differently.

So, imagine if you sent the majority of Chrome users to one landing page and the majority of the IE users to a different landing page.

Can you fairly compare the results? No way. You might arrive at the opposite conclusion if you reverse the test.

The practicality of segmenting: Ideally, the visitors sent to each version of your test would be identical. In reality, that’s impossible.

Having a large enough random sample size (more on that later) will mitigate a lot of your issues, but you will likely end up judging tests that aren’t perfectly segmented.

The reason is that you can’t perfectly segment a test in the vast majority of situations.

If you did, you’d be left with barely any traffic. For example, how many users would tick all these boxes?

  • Use your website on Monday
  • Use Chrome
  • Use a tablet
  • Live in the U.S.
  • Come from organic search traffic

Even if your site has a lot of traffic, you won’t have many users matching all the parameters—certainly not a big enough sample to run a test on.

The solution is compromise.

Right now, I want you to identify the segments that have the biggest influence on your results. For most sites, that will come down to the traffic source and, possibly, the location.

When you’re analyzing your split test results, make sure you’re analyzing the results only for users from a specific traffic source (e.g., organic search, paid advertising, referral, etc.).
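If you can export raw, visitor-level data from your testing or analytics tool, filtering by segment takes only a few lines. Here’s a minimal sketch in Python with pandas, assuming a hypothetical CSV export with variant, converted, and source columns (your tool’s real export format will differ):

```python
import pandas as pd

# Hypothetical export: one row per visitor, recording which page variant
# they saw, whether they converted, and their traffic source.
visits = pd.read_csv("split_test_visits.csv")

# Compare variants only within a single segment, e.g., organic search.
organic = visits[visits["source"] == "organic"]

# Conversion rate and sample size per variant, for organic traffic only.
print(organic.groupby("variant")["converted"].agg(["mean", "count"]))
```

Run the same comparison for your paid and referral segments separately rather than lumping everything together.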

2. Ending your tests too soon

Ideally, anyone conducting split tests should have at least a basic understanding of statistics.

If you don’t understand concepts such as variance, you’re likely making many mistakes.

If you need an introduction, take the free statistics classes at Khan Academy.

At this point, I’ll assume you have the basics down.

Although many marketers have a good grasp of the basics, they still frequently make one mistake: ending the test without a sufficient sample size.

There are three reasons for this:

  1. Lack of knowledge – they don’t truly understand how to calculate the right sample size;
  2. Negligence – they’re in a rush to do something else, so they rush the test;
  3. Over-reliance on tools – most split testing tools offer some sort of confidence metric, but they can be misleading.

The first and third can be fixed.

If you’ve ever used a tool such as Unbounce to conduct split tests, you know the analytics show you something like this:

[Image: split test results table showing conversion rates and confidence]

The table gives you the current conversion rate of each page you’re testing and a confidence level that the winning one is indeed the best.

The standard advice is to end the test once you hit 90-95% significance, which is sound advice in principle.

The problem is that a lot (not all) of these tools will give you these significance values before they even mean anything.

You’ll think you have a winner, but if you let the test run on, you might find that the opposite is true.

Keep in mind that conversion experts like Peep Laja aim for at least 350 conversions per page in most cases (unless there’s a huge difference in conversion rates).
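Most testing tools base that confidence number on something like a two-proportion z-test, though the exact method varies by vendor. Here’s a sketch in plain Python showing why an early peek misleads: the same 25% relative lift reads as noise at a small sample size and as a real effect at a larger one.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))  # p-value

# The same 25% relative lift (5% vs. 6.25%) at two sample sizes:
print(two_proportion_test(20, 400, 25, 400))      # ~0.44: not significant
print(two_proportion_test(200, 4000, 250, 4000))  # ~0.015: significant
```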

You must understand sample size: The fix for this problem, and most sample size problems, is to understand how to calculate a valid sample size on your own.

It’s not very hard if you use the right tools. Let me show you a few you can use.

The first is a test significance calculator. It’s very simple to use: just input the base conversion rate of your original page as well as your desired confidence level (90-95%).

[Image: the test significance calculator]

The tool comes up with a chart that has a ton of scenarios.

You can see that the bigger the gap between the “A” page and the “B” page, the smaller the sample size needs to be. That’s why it’s best to test changes that could potentially make a big difference; they also let your tests finish faster.

Here’s another sample size calculator you can use. Again, you put in your baseline conversion rate, but this time, you decide on the minimum detectable effect.

[Image: the sample size calculator]

The minimum detectable effect here is relative to your baseline. With a 20% baseline conversion rate and a 5% minimum detectable effect, you multiply the two together to get an absolute effect of 1%.

What that means is that, with 90% confidence, you’ll be able to detect whether your second page’s conversion rate falls below 19% or rises above 21% (plus or minus 1% from your 20% baseline conversion rate).

That also means that if your split test results show a 20.5% conversion rate for your second page, you cannot confidently say that it’s better.

Use either of these calculators to get an idea of what sample size you need for your tests. More is always better.
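If you’d rather not rely on a web calculator, the standard two-proportion approximation behind tools like these is easy to compute yourself. A sketch in plain Python, using the same 20% baseline and 5% relative (1% absolute) minimum detectable effect as the example above:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_page(baseline, relative_mde, alpha=0.10, power=0.8):
    """Approximate visitors needed per page for a two-proportion test.

    baseline     -- conversion rate of the original page (e.g., 0.20)
    relative_mde -- smallest relative lift worth detecting (e.g., 0.05)
    alpha        -- significance level (0.10 corresponds to 90% confidence)
    power        -- probability of detecting a real effect of that size
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # 20% * 1.05 = 21%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# Roughly 20,000 visitors per page at 90% confidence and 80% power:
print(sample_size_per_page(0.20, 0.05))
```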

3. Running a split test during the holidays

This is related to segmenting, but it’s an often-overlooked special case.

Your traffic during holidays can be very different from your typical traffic. Including even one day of that abnormal traffic could result in optimizing your site for the people who use your site only a few times a year.

[Image: holiday traffic compared to typical traffic]

You also have to consider other special days that influence the type of traffic driven to your site:

  • Sales
  • Features in press
  • Big events in your industry

On top of that, these special days aren’t usually one-day things. For example, when it comes to Christmas, those abnormal types of visitors may visit your website leading up to the big day and a week or two after.

The solution? Go longer: The best solution is to exclude these days from your test because they will contain skewed data.

If it’s not possible, the next best solution is to extend your split tests. Go over your minimum sample size so that you have enough data to drown out any skewed data.
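If your tool lets you export raw daily results, excluding the skewed days is a simple filter before you analyze. A pandas sketch, reusing the hypothetical CSV from earlier with an added date column (the exact window is made up for illustration):

```python
import pandas as pd

visits = pd.read_csv("split_test_visits.csv", parse_dates=["date"])

# Hypothetical skewed window: the Christmas run-up through early January.
holiday = (visits["date"] >= "2023-12-18") & (visits["date"] <= "2024-01-07")

clean = visits[~holiday]  # keep only typical-traffic days
print(clean.groupby("variant")["converted"].agg(["mean", "count"]))
```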

4. Measuring the wrong thing

Although it’s right there in the name, it’s still easy to overlook.

Conversion rate optimization is all about…conversions.

Whenever possible, you need to be measuring conversions—not bounce rates, email opt-in rates, or email open rates.

Those other numbers do not necessarily indicate an improvement.

Here is a simple example to illustrate this:

  • Page A – 5% email opt-in rate, but only 1% (overall) become customers
  • Page B – 3% email opt-in rate, but 2% become customers

If you were measuring only your email opt-in rate on the page, you’d say that page A is decisively better (about 67% better).

In reality, page B converts traffic twice as well as page A. I’ll take twice the profit over 67% more emails on my list any day.

This is another simple thing, but you need to keep it in mind when setting up any split test.

5. Running before-and-after tests

This mistake is most commonly made by those new to split testing.

In an attempt to avoid coding or using any tools, the split tester runs one version of the page first and then, after collecting enough data, switches to the other version.

Then, the tester compares the conversion rate of the two pages.

Hopefully, you already see the problems this causes, which include many of the issues we’ve already gone over.

While you can still segment your traffic in some ways, this approach makes it impossible to segment by date.

You’ll be comparing the behavior of visitors from different times of the week, month, or year, which is not a valid comparison.

Moral of the story: Always run your split tests simultaneously, or you will mess up your test right from the start.

This brings us to the next mistake…

6. Not testing long enough

No, this is not the same as testing until you reach statistical significance.

Instead, it’s about the absolute length of time that you run your split test for.

Say you used one of the calculators I showed you and found that you need a minimum sample size of 10,000 views for each page.

If you run a high traffic site, you might be able to get that much traffic in a day or two.

Split test finished, right? That’s what most split testers do, and it’s wrong.

All businesses have business cycles.

It’s why your website’s traffic varies from day to day and even from month to month.

[Image: website traffic varying over the business cycle]

For some businesses, buyers are ready to go at the start of the week. For others, they largely wait until the end of the week so that they can get started on the weekend.

It’s not valid to say that buyers who buy at one part of the cycle are the same as buyers at another part. Instead, you need an overall representation of your customers, through all parts of the cycle.

Your first step here is to determine what your business cycle is. The most common lengths are 1 week and 1 month.

To determine it, look at where your sales typically peak. The distance between your peaks is one cycle.

Next, run your split tests until you (1) reach the minimum sample size and (2) complete a whole number of business cycles, e.g., one, two, or three full cycles, but never one and a half.

That’s the best way to ensure that you have a representative sample.
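Putting the two rules together is simple arithmetic. Here’s a small sketch, with made-up traffic numbers and a weekly cycle assumed, that rounds the test duration up to full business cycles:

```python
from math import ceil

def test_duration_days(sample_per_page, pages, daily_visitors, cycle_days=7):
    """Days to run a test: enough total traffic AND whole business cycles."""
    days_for_sample = ceil(sample_per_page * pages / daily_visitors)
    full_cycles = ceil(days_for_sample / cycle_days)
    return full_cycles * cycle_days

# 10,000 views per page, 2 pages, 4,000 visitors/day, weekly cycle:
print(test_duration_days(10_000, 2, 4_000))  # sample needs 5 days -> run 7
```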

Conclusion

If you’ve started split testing pages on your website or plan to in the future, that’s great. You can get big improvements, leading to incredible growth in your profit.

But if you’re going to do split testing without the help of an expert, you need to be extra careful not to make mistakes.

I’ve shown you 6 common ways that people mess up their split tests, but there are many more.

Any single one of these can invalidate your results, which may lead you to mistakenly declare the wrong page the winner.

You’ll end up wasting your time and sometimes even hurting your business. Even if you get a good return from split testing, it might not be as much as it could be.

For now, keep learning about split testing, and make sure you completely understand the 6 mistakes I showed you here. If you have any questions about them, leave them below in a comment.


Source: quicksprout
