Amplitude Experiment: key terms

Glossary of key experimentation terms

 

Term

Definition

Problem statement

An explanation of the internal business or user problem you are trying to solve.

Hypothesis

A proposed way to solve or alleviate the problem in the problem statement, along with the reasoning for why it should work.

Audience

The group of users targeted by the experiment. This audience is typically split evenly into “control” and “variant” groups.

Primary success metric

The main metric you hope to move by running this experiment. Ideally, it drives both customer and business success.

Secondary success metric

An additional metric you hope or expect to move with this experiment.

Target lift / minimum detectable effect (MDE)

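The percentage change you expect to drive on your primary success metric as a result of this experiment. As the minimum detectable effect, it is also the smallest change the experiment is sized to detect reliably.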

Counter metric

A metric you want to ensure does not suffer at the expense of increasing your success metrics. For example, if you drive users to a free trial of your business product, trials of your consumer product could be a counter metric. If business trials go up, consumer trials will likely go down. You want to make sure there's a net positive effect.

Baseline conversion rate

The current rate of your primary success metric prior to this experiment.

Sample size

The number of users, or amount of traffic, you need in each experimental variant in order to reliably detect a statistically significant effect.
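As a rough illustration of how sample size follows from the baseline conversion rate, the MDE, the significance level, and the statistical power, here is a minimal sketch using the standard normal approximation for comparing two conversion rates. The function name and the input values are hypothetical, and this fixed-sample formula is not necessarily what Amplitude Experiment computes for you.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per variant for a two-sided test of two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 10% baseline conversion rate, 10% relative lift.
print(sample_size_per_variant(0.10, 0.10))
```

Halving the MDE roughly quadruples the required sample size, which is why small target lifts lead to long experiments.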

Run time

How long your experiment needs to run, based on the sample size needed per variant and your traffic levels.
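The run time arithmetic can be sketched as follows. This assumes traffic is split evenly across variants and that every eligible user is allocated to the experiment; the function name and the numbers are hypothetical.

```python
from math import ceil


def run_time_days(sample_size_per_variant, num_variants, daily_eligible_users):
    """Days needed for every variant to reach its required sample size."""
    total_users_needed = sample_size_per_variant * num_variants
    return ceil(total_users_needed / daily_eligible_users)


# Hypothetical inputs: 15,000 users per variant, control plus one variant,
# and 2,000 eligible users entering the experiment per day.
print(run_time_days(15_000, 2, 2_000))  # 15
```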

Confidence / significance level

The significance level is the likelihood that you will get a false positive; the confidence level is its complement. For example, if you have a 95% confidence level (equivalently, a 5% significance level), there is a 5% chance of detecting a change to your success metric when there was really no change.

Confidence interval

A range of plausible values for the parameter of interest. In our case, the true parameter we’re trying to estimate is the difference in means between the treatment and control/baseline.

For example: if the confidence level is set to 95% and we ran the same experiment 100 times, we would expect the confidence intervals from about 95 of those runs to contain the true parameter.
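To make the definition concrete, here is a minimal sketch of a fixed-sample confidence interval for the difference in conversion rates between treatment and control, using the normal (Wald) approximation. The function name and the counts are hypothetical, and an interval produced under sequential testing would be calculated differently.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Wald interval for (treatment rate - control rate)."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Standard error of the difference between two independent proportions.
    std_err = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_t - p_c
    return diff - z * std_err, diff + z * std_err


# Hypothetical counts: 1,000 of 10,000 control users and 1,100 of 10,000
# treatment users converted.
low, high = diff_confidence_interval(1_000, 10_000, 1_100, 10_000)
print(f"95% CI for the difference: [{low:.4f}, {high:.4f}]")
```

If the interval excludes zero, the observed difference is statistically significant at the corresponding significance level.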

p-value

The probability of observing data at least as extreme as the data actually observed, assuming that there is no difference between treatment and control.
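As an illustration, the sketch below computes a two-sided p-value for a difference in conversion rates with a pooled two-proportion z-test. This is one common fixed-sample approach chosen for the example; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(conv_c, n_c, conv_t, n_t):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Under the "no difference" assumption both groups share one conversion rate.
    pooled = (conv_c + conv_t) / (n_c + n_t)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / std_err
    # Probability of a test statistic at least this far from zero, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 1,000 of 10,000 control users and 1,100 of 10,000
# treatment users converted.
print(round(two_sided_p_value(1_000, 10_000, 1_100, 10_000), 4))
```

A p-value below the significance level (for example, 0.05) is what leads to calling the result statistically significant.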

Statistical power

The likelihood that you will detect a change to your success metric when there is a change to be detected.
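Power can be approximated from the baseline rate, the true lift, and the sample size per variant. The sketch below uses the normal approximation for a two-sided comparison of two conversion rates; the function name and the inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(baseline_rate, relative_lift, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    std_err = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the test statistic clears the critical value when the lift is real.
    # The negligible chance of a significant result in the wrong direction is ignored.
    return NormalDist().cdf(abs(p2 - p1) / std_err - z_alpha)


# Hypothetical inputs: 10% baseline, 10% relative lift, 5,000 vs. 15,000 users per variant.
print(round(approximate_power(0.10, 0.10, 5_000), 2))
print(round(approximate_power(0.10, 0.10, 15_000), 2))
```

Power rises with sample size, which is why an underpowered experiment can miss a real change (a Type 2 error).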

Payload

Variables attached to a variant that can be used to remotely change flags and experiments without a code change.

Sequential testing

A statistical analysis where the sample size is not fixed in advance, allowing you to conduct an A/B test, peek at your results, and conclude the test without inflating your false positive rate.

Allocation

The percentage or number of targeted users who should receive this variant.
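One generic way to turn allocation percentages into stable variant assignments is to hash each user into a bucket. The sketch below illustrates that idea only; it is not a description of how Amplitude Experiment assigns users, and the function name, experiment key, and user IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment_key, allocations):
    """Map a user to a variant, or None if the user falls outside the allocated traffic.

    `allocations` maps variant name -> percentage of targeted users (0-100).
    """
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000 / 100  # stable value in [0, 100)
    upper = 0.0
    for variant, percent in allocations.items():
        upper += percent
        if bucket < upper:
            return variant
    return None


# Hypothetical 50/50 allocation between control and treatment.
print(assign_variant("user-123", "checkout-test", {"control": 50, "treatment": 50}))
```

Because the bucket depends only on the experiment key and the user ID, the same user receives the same variant every time the function is called.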

Type 1 error

Incorrectly concluding that there is a statistically significant difference between treatment and control when there is none (a false positive).
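A small simulation can show what a Type 1 error rate looks like in practice. The sketch below runs many A/A tests, where both groups share the same true conversion rate, and counts how often a pooled z-test at a 5% significance level flags a difference anyway. All names and parameters are hypothetical, and the fixed random seed only makes the run repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist


def aa_false_positive_rate(n_per_group=500, rate=0.10, alpha=0.05, runs=400, seed=7):
    """Share of A/A tests (no real difference) that are flagged as significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(runs):
        # Both groups are drawn from the same conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(n_per_group))
        conv_b = sum(rng.random() < rate for _ in range(n_per_group))
        pooled = (conv_a + conv_b) / (2 * n_per_group)
        std_err = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
        if std_err == 0:
            continue  # no variation in the data: nothing to test
        z = (conv_b - conv_a) / n_per_group / std_err
        if abs(z) > z_crit:
            false_positives += 1  # a "significant" result with no real difference
    return false_positives / runs


print(aa_false_positive_rate())
```

The printed share should land in the neighborhood of the significance level, since every significant result in an A/A test is a false positive.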

Type 2 error

Incorrectly concluding that there is no difference between treatment and control when there is one (a false negative).

Enrollment Event

An event signifying that a user was enrolled into a Flag. The event properties include the user’s variant assignment.

Assignment Event

Another name for Enrollment event. 

Exposure Event

The event that indicates when a user has actually seen a change based on an experiment.