This article will help you generate and interpret the results of your experiment.
You’ve designed your experiment, rolled it out to your users, and given them enough time to interact with your new variants. Now it’s time to see if your hypothesis was correct.
In the Analysis panel, you’ll be able to tell at a glance whether your experiment has yielded statistically significant results, as well as what those results actually are. Amplitude Experiment takes the information you gave it during the design and rollout phases and plugs it in for you automatically, so there’s no repetition of effort. It breaks the results out by variant, and provides you with a convenient, detailed tabular breakdown.
NOTE: This article continues directly from the article in our Help Center on rolling out your experiment. If you haven’t read that and followed the process it describes, do so before continuing here.
Amplitude will not generate statistical calculations for experiments using binary metrics (unique conversions) until each variant has 100 visitors and 25 conversions. Experiments using non-binary metrics need only to reach 100 visitors per variant.
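The thresholds above can be expressed as a simple check. The following is a minimal sketch for illustration only, not Amplitude’s implementation; the function name `has_enough_data` and the shape of the `variants` dictionary are assumptions.

```python
MIN_VISITORS = 100      # per-variant visitor threshold stated above
MIN_CONVERSIONS = 25    # per-variant conversion threshold (binary metrics only)


def has_enough_data(variants, binary_metric):
    """Return True when every variant meets the thresholds described above.

    `variants` maps a variant name to a dict with "visitors" and
    "conversions" counts (a hypothetical structure, for illustration only).
    """
    for counts in variants.values():
        if counts["visitors"] < MIN_VISITORS:
            return False
        if binary_metric and counts["conversions"] < MIN_CONVERSIONS:
            return False
    return True


# Made-up counts: the treatment variant has only 20 conversions.
example = {
    "control": {"visitors": 150, "conversions": 30},
    "treatment": {"visitors": 140, "conversions": 20},
}
print(has_enough_data(example, binary_metric=True))   # conversion threshold not met
print(has_enough_data(example, binary_metric=False))  # only the visitor threshold applies
```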
To generate and view experimental results, follow these steps:
- Scroll to the Analysis panel and choose the analysis type you’ll use.
- Choose your experiment’s exposure event. This is the event users will have to fire before being included in an experiment.
NOTE: The exposure event is not the same thing as the assignment event. If, for example, you’re running an experiment on your pricing page, a user might be evaluated on the home page for the experiment—but if they don’t visit the pricing page, they'll never actually be exposed to it. For that reason, this user should not be considered to be part of the experiment.
To learn more about exposure events, see this article in the Amplitude Developer Center.
- The rest of the experiment information—your success metric, your variants, and the way your success metric is measured—should all be filled in for you already. Still, it’s always a good idea to double-check.
- In the Experiment Settings drop-down panel, set the experiment’s confidence level. The default is 95%.
NOTE: Lowering your experiment’s confidence level will make it more likely that your experiment achieves statistical significance, but the trade-off is that doing so increases the likelihood of a false positive.
- Set the time frame for your experiment analysis, either from the selection of pre-set durations, or by opening the date picker and choosing a custom date range.
At this point, the chart will automatically calculate your experiment results. What you’re looking for is a statistically significant result, in the direction you predicted.
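To make the difference between assignment and exposure concrete, here is a small sketch of how the analyzed population could be derived. The data structures and the function `analysis_population` are hypothetical and only illustrate the idea; Amplitude performs this filtering for you based on the exposure event you select.

```python
# Hypothetical records: the variant each user was assigned, and the users
# who actually fired the exposure event (for example, viewed the pricing page).
assignments = {"u1": "control", "u2": "treatment", "u3": "treatment", "u4": "control"}
exposed_users = {"u1", "u2", "u4"}  # u3 was assigned but never exposed


def analysis_population(assignments, exposed_users):
    """Group exposed users by variant; users who were only assigned are left out."""
    population = {}
    for user, variant in assignments.items():
        if user in exposed_users:
            population.setdefault(variant, set()).add(user)
    return population


population = analysis_population(assignments, exposed_users)
print({variant: sorted(users) for variant, users in population.items()})
```

In this made-up data, user `u3` was assigned to the treatment but never fired the exposure event, so they do not count toward either variant.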
Analyze your results
If you’re new to experimentation, you might find yourself a bit overwhelmed by some of the statistical terminology in this article. You’ll need to understand these terms in order to run a successful experimentation program, but fortunately, they’re not as complicated as they might seem at first. Let’s break them down, starting with the most important: statistical significance.
An experiment is said to be statistically significant when we can confidently say that the results are highly unlikely to have occurred due to random chance. (More technically, it’s when we reject the null hypothesis.) That might sound pretty subjective (what does “highly unlikely” even look like, anyway?), but it’s grounded solidly in statistics. Statistical significance relies on a variant’s p-value, which is the probability of observing a difference at least as large as the one in our data, assuming there is no real difference between the variant and the control. If this probability drops below a certain threshold (statisticians refer to this threshold as the alpha), then we consider our experiment to have achieved statistical significance.
The value of alpha is in turn determined by the confidence level you set in the Experiment Settings panel. If, for example, we set a confidence level of 95%, the value of alpha will be 1 - [ confidence_level / 100 ], or in this case, 0.05. If the p-value is lower than alpha, we can conclude that there is indeed a difference between your variant and the control. A higher confidence level gives you a smaller alpha, which lowers the risk of a false positive. But it also requires a larger sample size before significance can be achieved.
A related concept is the confidence interval. You can understand this as a range of values that is likely to include the parameter you’re trying to measure, which in this case is the difference in the means between the variant and the control. The confidence level is not the probability that any one interval contains the true value. Instead, you can interpret it this way: if we conducted this experiment 100 times with our confidence level set at 95%, we’d expect about 95 of the 100 resulting intervals to contain the true value of the parameter.
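The relationship between confidence level, alpha, and the p-value can be sketched in a few lines. The example below uses a textbook pooled two-proportion z-test purely as an illustration; that choice is an assumption, and the test Amplitude actually applies may differ. The function names and the conversion counts are made up.

```python
import math


def alpha_from_confidence(confidence_level):
    """alpha = 1 - confidence_level / 100, e.g. 95 -> 0.05."""
    return 1 - confidence_level / 100


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (illustrative)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up counts: 120 of 1000 users convert in control, 155 of 1000 in the variant.
p_value = two_proportion_p_value(120, 1000, 155, 1000)
for confidence_level in (95, 99):
    alpha = alpha_from_confidence(confidence_level)
    print(confidence_level, round(alpha, 2), p_value < alpha)
```

With these counts the p-value falls between 0.01 and 0.05, so the same data is significant at a 95% confidence level but not at 99%. This is the trade-off described above: a higher confidence level demands stronger evidence.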
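The repeated-experiment interpretation can be checked with a small simulation. This sketch assumes a binary metric and a normal-approximation (Wald) interval for the difference in conversion rates, which is a common textbook choice and not necessarily the interval Amplitude computes. The true conversion rates, sample size, and function names are all made up for illustration.

```python
import math
import random
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence_level=95):
    """Normal-approximation (Wald) interval for the difference p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    std_err = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level / 100) / 2)
    diff = p_b - p_a
    return diff - z * std_err, diff + z * std_err


def simulate_coverage(true_a, true_b, n, runs, seed=0):
    """Fraction of simulated experiments whose interval contains the true difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        conv_a = sum(rng.random() < true_a for _ in range(n))
        conv_b = sum(rng.random() < true_b for _ in range(n))
        low, high = diff_confidence_interval(conv_a, n, conv_b, n)
        if low <= true_b - true_a <= high:
            hits += 1
    return hits / runs


coverage = simulate_coverage(true_a=0.10, true_b=0.12, n=500, runs=400)
print(f"{coverage:.1%} of intervals contained the true difference")
```

Across many simulated experiments, the share of intervals that contain the true difference should land close to the chosen confidence level, even though any single interval either contains the true value or does not.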
When you scroll down to your results, at the top you’ll see the significance indicator, which lets you know if the experiment has reached statistical significance. If it has, the top-performing variant will be highlighted. If your experiment has not yet achieved significance, you’ll see a message telling you that your test needs more data to be conclusive.
Next, you’ll find your summary statistics for each of your variants. The bigger, top-line number represents the percentage of users who saw that variant and fired the success metric event. The smaller number is the change in that metric, relative to your control variant.
The same information is also shown in the accompanying chart.
Below that is the results table for the experiment. It contains the following information for each variant in your experiment:
- The number of users exposed to the variant.
- The performance of the primary metric, relative to the baseline. For example, if your success metric’s value is 2 in your control, but 4 in your variant, this column will read “4 (+2)”.
- The % lift, representing the proportional change relative to the baseline. In the previous example, the metric rose from 2 to 4, so this column would read “100%”.
- The confidence interval of the variant.
- The significance level the variant reached.
- The number of additional users who would have to be exposed to the variant in order for it to reach statistical significance, if it has not yet done so.
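The relative-performance and % lift columns come down to simple arithmetic. The sketch below assumes lift is defined as the change divided by the control value, which is the usual meaning of proportional change; the function name and the sample values are made up, and the exact formatting in the results table may differ.

```python
def describe_change(control_value, variant_value):
    """Return the absolute change and the percent lift relative to control."""
    absolute_change = variant_value - control_value
    percent_lift = absolute_change * 100 / control_value
    return absolute_change, percent_lift


# Made-up values: the metric is 5 in the control and 6 in the variant.
change, lift = describe_change(control_value=5, variant_value=6)
print(f"6 ({change:+g})")  # variant value with its absolute change vs. control
print(f"{lift:.0f}%")      # proportional change relative to control
```

Here the variant is 1 unit above the control, which is a 20% lift because 1 / 5 = 0.2.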
Finally, there are two more charts toward the bottom of this module. On the left is a chart displaying the confidence interval around the movement of your success metric over the duration of your experiment. This chart only displays results for a single variant at a time; select the variant you wish to view from the drop-down menu just above it.
On the right is a chart depicting the daily exposure rates for each variant over the lifetime of your experiment. This is a useful tool for QA, to ensure your experiment’s variants are being distributed the way you expected.
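If you export daily exposure counts, a quick check like the one below can flag days where the split drifts from your planned allocation. This is only a sketch: the function `allocation_drift`, the 2-percentage-point tolerance, and the counts are all assumptions chosen for illustration, and it is not a formal statistical test.

```python
def allocation_drift(exposures, expected_share, tolerance=0.02):
    """Return variants whose observed share deviates from the plan by more than `tolerance`."""
    total = sum(exposures.values())
    drifted = {}
    for variant, count in exposures.items():
        observed = count / total
        if abs(observed - expected_share[variant]) > tolerance:
            drifted[variant] = round(observed, 3)
    return drifted


# Made-up daily exposure counts for a planned 50/50 split.
plan = {"control": 0.5, "treatment": 0.5}
day_1 = {"control": 510, "treatment": 490}
day_2 = {"control": 600, "treatment": 400}

print(allocation_drift(day_1, plan))  # shares are within the tolerance
print(allocation_drift(day_2, plan))  # both variants are 10 points off the plan
```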
Congratulations! You’ve successfully designed, rolled out, and analyzed your experiment.
First, it’s important to remember that no experiment is a failure. Even if you didn’t get the results you were hoping for, or your test didn’t reach statistical significance, you can still learn something from the process. Use your results as a springboard for asking hard questions about the changes you made, the outcomes you saw, what your customers expect from your product, and how you can deliver that.
In general, the next step should be deciding whether to conduct another experiment to gather more evidence for your hypothesis, or to go ahead and implement the variant that delivered the best results. You can also export your experiment to the Experiment Analysis in Amplitude Analytics and conduct a deeper dive there, where you can segment your users and hopefully generate more useful insights.