Glossary of key experimentation terms
Term |
Definition |
Problem statement |
An explanation of the internal business or user problem you are trying to solve. |
Hypothesis |
A proposed change you expect will solve or alleviate the problem, along with the reasoning for why it should work. |
Audience |
The group of users targeted for the experiment. This audience is typically split evenly into “control” and “variant” groups.
Primary success metric |
The main metric you hope to move by running this experiment. It should ideally drive both customer and business success.
Secondary success metric |
An additional metric you hope or expect to move with this experiment.
Target lift / minimum detectable effect (MDE) |
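The percentage change you expect to drive on your primary success metric as a result of this experiment. As the minimum detectable effect, it is also the smallest change the experiment is sized to detect reliably.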
Counter metric |
A metric you want to ensure does not suffer at the expense of increasing your success metrics. For example, if you drive users to a free trial of your business product, trials of your consumer product could be a counter metric. If business trials go up, consumer trials will likely go down. You want to make sure there's a net positive effect. |
Baseline conversion rate |
The current rate of your primary success metric prior to this experiment. |
Sample size |
The number of users (or amount of traffic) you need in each experimental variant to reliably detect your target lift at the chosen significance level and statistical power. |
Run time |
How long your experiment needs to run, based on the sample size needed per variant and your traffic levels. |
Confidence / significance level |
The significance level is the likelihood you will get a false positive; the confidence level is its complement. For example, if you have a 95% confidence level (equivalently, a 5% significance level), there is a 5% chance of detecting a change to your success metric when there was really no change. |
Confidence interval |
A range of plausible values for the parameter of interest. In our case, the true parameter we’re trying to estimate is the difference in means between the treatment and control/baseline. For example: if the confidence level is set to 95% and we ran the same experiment 100 times, computing a new confidence interval in each run, we would expect about 95 of those 100 intervals to contain the true parameter. |
p-value |
The probability of observing a difference at least as extreme as the one in your data, assuming that there is no true difference between treatment and control. |
Statistical power |
The likelihood that you will detect a change to your success metric when there is a change to be detected. |
Payload |
Variables attached to a variant that can be used to remotely change the behavior of flags and experiments without a code change.
Sequential testing |
A statistical analysis in which the sample size is not fixed in advance, so you can run an A/B test, peek at the results as they accumulate, and conclude the test without inflating your false positive rate. |
Allocation |
The percentage or number of targeted users you want to receive this variant.
Type 1 error |
Incorrectly concluding that there is a statistically significant difference between treatment and control when there is not (a false positive). |
Type 2 error |
Incorrectly concluding that there is no difference between treatment and control when there is one (a false negative). |
Enrollment Event |
An event that signifies a user was enrolled into a Flag, which includes the user’s variant assignment in the event properties. |
Assignment Event |
Another name for Enrollment Event. |
Exposure Event |
The event that indicates when a user has actually seen a change based on an experiment.
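
Several of the terms above fit together numerically: baseline conversion rate, MDE, significance level, and power determine sample size; sample size and traffic determine run time; and the collected data yields a p-value and a confidence interval. The following is a minimal Python sketch of those relationships. It assumes a conversion-rate metric, a two-sided two-proportion z-test with a fixed sample size (not sequential testing), and the normal approximation. The function names and example numbers are hypothetical and are not part of any product API.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant (assumed: two-proportion z-test).

    baseline     -- baseline conversion rate, e.g. 0.10
    relative_mde -- target lift as a relative change, e.g. 0.10 for +10%
    alpha        -- significance level (Type 1 error rate)
    power        -- statistical power (1 - Type 2 error rate)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def run_time_days(n_per_variant, num_variants, daily_users):
    """Days needed to reach the sample size, given eligible users per day."""
    return math.ceil(n_per_variant * num_variants / daily_users)


def analyze(control_conv, control_n, variant_conv, variant_n, alpha=0.05):
    """Return the observed difference, two-sided p-value, and confidence interval."""
    p_c = control_conv / control_n
    p_v = variant_conv / variant_n
    diff = p_v - p_c

    # p-value: uses the pooled rate, i.e. assumes no true difference.
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    z = diff / se_pooled
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))

    # Confidence interval for the difference: uses the unpooled standard error.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    margin = STD_NORMAL.inv_cdf(1 - alpha / 2) * se
    return {"diff": diff, "p_value": p_value, "ci": (diff - margin, diff + margin)}


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.10, relative_mde=0.10)
    print("sample size per variant:", n)
    print("run time (days):", run_time_days(n, num_variants=2, daily_users=2000))
    print(analyze(control_conv=1000, control_n=10000,
                  variant_conv=1100, variant_n=10000))
```

In this sketch, a result is called statistically significant when `p_value` falls below `alpha`, which is equivalent to the confidence interval for the difference excluding zero in most cases. A smaller MDE or a higher power both increase the required sample size, and therefore the run time.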