Amplitude Experiment: key terms

Glossary of key experimentation terms

Allocation

The percentage or number of targeted users you want to receive this variant.

Assignment Event

Another name for Enrollment event. 

Audience

A group of users that will be targeted for the experiment. This audience will typically be split evenly into “control” and “variant” groups.

Baseline conversion rate

The current rate of your primary success metric prior to this experiment.

Bonferroni correction

A statistical technique used to counteract the multiple comparisons problem (also known as multiplicity or the look-elsewhere effect).
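
As a rough illustration (not Amplitude's implementation), the sketch below applies the correction by dividing the significance level by the number of comparisons; the p-values and metric count are made up.

```python
# Minimal sketch of a Bonferroni correction (illustrative values only).
# With m comparisons, each individual test is run at alpha / m so that the
# overall (family-wise) false-positive rate stays at or below alpha.

alpha = 0.05                     # desired family-wise significance level
p_values = [0.004, 0.03, 0.20]   # hypothetical p-values for three metrics
m = len(p_values)

adjusted_alpha = alpha / m       # 0.05 / 3 ~= 0.0167
for p in p_values:
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"p = {p:.3f} -> {verdict} at alpha/m = {adjusted_alpha:.4f}")
```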

Confidence interval

A range of plausible values for the parameter of interest. In our case, the parameter we're trying to estimate is the true difference in means between the treatment and the control/baseline.

For example, with the confidence level set to 95%, if you ran the same experiment 100 times, you would expect the confidence interval to contain the true parameter in roughly 95 of those runs.
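
As a hedged illustration of the idea, the sketch below computes a 95% confidence interval for the difference in means between two simulated groups using a normal approximation; the conversion rates and sample sizes are assumptions, not Amplitude defaults.

```python
# Hedged sketch: a 95% confidence interval for the difference in means
# between treatment and control, using a normal approximation.
# The data below are simulated conversion indicators (1 = converted).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.binomial(1, 0.10, size=5000)    # hypothetical 10% baseline
treatment = rng.binomial(1, 0.11, size=5000)  # hypothetical 11% variant

diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))

z = stats.norm.ppf(0.975)                     # ~1.96 for a 95% interval
ci_low, ci_high = diff - z * se, diff + z * se
print(f"difference in means: {diff:.4f}, 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```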

Confidence / significance level

The significance level is the probability of a false positive; the confidence level is one minus the significance level. For example, with a 95% confidence level (a 5% significance level), there is a 5% chance of detecting a change to your success metric when there was really no change.

Counter metric

A metric you want to make sure does not degrade as a side effect of improving your success metric. For example, if you drive users to a free trial of your business product, trials of your consumer product could be a counter metric: if business trials go up, consumer trials may go down, and you want to make sure the net effect is positive.

CUPED

Controlled-experiment Using Pre-Existing Data (CUPED) is an optional statistical technique used to reduce variance in experiment metrics, which helps experiments reach statistical significance faster.
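
For intuition only, the sketch below applies the core CUPED adjustment, using a pre-experiment metric to shrink the variance of an in-experiment metric; the simulated data and variable names are illustrative, and Amplitude's implementation may differ.

```python
# Hedged sketch of the core CUPED adjustment: use a pre-experiment metric X
# (e.g. each user's activity before the experiment started) to reduce the
# variance of the in-experiment metric Y. Data below are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(10, 3, n)                 # pre-experiment metric per user
y = 0.8 * x + rng.normal(0, 2, n)        # in-experiment metric, correlated with x

theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())     # adjusted metric: same mean, lower variance

print(f"variance before: {y.var(ddof=1):.2f}, after CUPED: {y_cuped.var(ddof=1):.2f}")
```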

Exposure Event

The event that indicates when a user has actually seen a change based on an experiment.

Hypothesis

A testable statement of what change you expect to solve or alleviate the problem statement, and why.

p-value

The probability of observing data at least as extreme as what you saw, assuming there is no difference between treatment and control.
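
One hedged way to build intuition for this definition is a permutation test: simulate "no difference" by shuffling the pooled data and count how often a difference at least as extreme as the observed one appears by chance. This is an illustration, not how Amplitude computes p-values.

```python
# Hedged sketch: estimating a p-value by simulating "no difference".
# Shuffle the pooled data many times and ask how often a difference in
# means at least as extreme as the observed one appears by chance alone.
import numpy as np

rng = np.random.default_rng(4)
control = rng.normal(10.0, 2.0, 500)
treatment = rng.normal(10.3, 2.0, 500)   # simulated lift, for illustration

observed = abs(treatment.mean() - control.mean())
pooled = np.concatenate([treatment, control])

count = 0
n_permutations = 10_000
for _ in range(n_permutations):
    rng.shuffle(pooled)
    perm_diff = abs(pooled[:500].mean() - pooled[500:].mean())
    if perm_diff >= observed:
        count += 1

print(f"permutation p-value: {count / n_permutations:.4f}")
```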

Payload

Variables attached to a variant that let you remotely change flags and experiments without a code change.

Primary success metric

The main metric you hope to move by running this experiment. Ideally, it should drive both customer and business success.

Problem statement

An explanation of the internal business or user problem you are trying to solve.

Run time

How long your experiment will take to run, based on the sample size needed per variant and your traffic levels.
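
As a back-of-the-envelope illustration, run time can be estimated by dividing the total sample size by the eligible traffic you expect per day; the numbers below are assumptions.

```python
# Hedged sketch: rough run-time estimate from sample size and traffic.
# All numbers are illustrative assumptions.
import math

n_per_variant = 15_000        # from your sample size calculation
n_variants = 2                # control + one treatment
daily_eligible_users = 4_000  # users per day who qualify for the experiment

run_days = math.ceil(n_per_variant * n_variants / daily_eligible_users)
print(f"estimated run time: {run_days} days")
```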

Sample size

The number of users (or amount of traffic) you need in each experiment variant in order to reliably detect a statistically significant effect.
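
For a rough sense of the calculation, the sketch below uses the standard normal-approximation formula for comparing two conversion rates; the baseline rate, target lift, significance level, and power are assumptions, and Amplitude's planner may use a different method.

```python
# Hedged sketch: approximate sample size per variant for comparing two
# conversion rates, using the standard normal-approximation formula.
from scipy import stats

baseline = 0.10            # current conversion rate (assumption)
mde = 0.10                 # 10% relative lift -> target rate of 0.11
target = baseline * (1 + mde)
alpha, power = 0.05, 0.80  # significance level and power (assumptions)

z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided test
z_beta = stats.norm.ppf(power)

variance = baseline * (1 - baseline) + target * (1 - target)
n_per_variant = ((z_alpha + z_beta) ** 2 * variance) / (target - baseline) ** 2
print(f"approximate users needed per variant: {int(round(n_per_variant))}")
```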

Secondary success metric

An additional metric you hope/expect to move with this experiment.

Sequential testing

A statistical analysis in which the sample size is not fixed in advance, allowing you to run an A/B test, peek at the results as they come in, and conclude the test without inflating your false positive rate.
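
To see why this matters, the hedged simulation below shows what sequential testing is designed to prevent: repeatedly peeking with a fixed-horizon t-test and stopping at the first "significant" result inflates the false-positive rate well above the nominal 5%. The simulation parameters are arbitrary.

```python
# Hedged simulation of the peeking problem: there is no true difference,
# yet stopping at the first p < 0.05 across repeated looks produces a
# false-positive rate well above 5%. Sequential tests are built to allow
# this kind of peeking without the inflation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, n_peeks, batch = 2000, 10, 200
false_positives = 0

for _ in range(n_experiments):
    control = np.empty(0)
    treatment = np.empty(0)
    for _ in range(n_peeks):
        control = np.concatenate([control, rng.normal(0, 1, batch)])
        treatment = np.concatenate([treatment, rng.normal(0, 1, batch)])  # no true effect
        if stats.ttest_ind(treatment, control).pvalue < 0.05:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_experiments:.2%}")
```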

Statistical power

The probability that you will detect a change to your success metric when there is a change to be detected.
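
As an illustration, power can be estimated by simulation: repeatedly generate an experiment in which a real lift exists and count how often a test detects it. The effect size, sample size, and significance level below are assumptions.

```python
# Hedged sketch: estimating statistical power by simulation. A real lift
# exists in every simulated experiment; power is the fraction of runs in
# which a t-test at alpha = 0.05 detects it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_simulations, n_per_variant = 2000, 1000
detected = 0

for _ in range(n_simulations):
    control = rng.normal(10.0, 2.0, n_per_variant)
    treatment = rng.normal(10.2, 2.0, n_per_variant)   # true lift exists
    if stats.ttest_ind(treatment, control).pvalue < 0.05:
        detected += 1

print(f"estimated power: {detected / n_simulations:.2%}")
```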

T-test

A statistical test that compares the means of two populations to determine whether the difference between them is statistically significant.
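
The hedged sketch below runs a two-sample (Welch's) t-test on simulated data with SciPy; the data and lift are made up, and this illustrates the general technique rather than Amplitude's exact analysis.

```python
# Hedged sketch: a two-sample t-test comparing treatment and control means.
# The data are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(loc=10.0, scale=2.0, size=1000)    # e.g. revenue per user
treatment = rng.normal(loc=10.2, scale=2.0, size=1000)  # small simulated lift

result = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
# A p-value below the significance level (e.g. 0.05) suggests the difference
# in means is statistically significant.
```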

Target lift / minimum detectable effect (MDE)

The percentage change you expect to drive on your primary success metric as a result of this experiment.

Type 1 error

Incorrectly concluding that there is a statistically significant difference between treatment and control when there is not.

Type 2 error

Incorrectly concluding that there is no difference between treatment and control when there is one.