The sample size calculator helps you determine the sample size and experiment run time needed to reach statistical significance in your Amplitude experiment, and decide whether an experiment would be worthwhile.
NOTE: While Amplitude Experiment supports sequential testing, the sample size calculator only supports determining the sample size for a t-test. Click here to read more about the difference between sequential tests and t-tests.
Using the sample size calculator
The sample size calculator allows you to enter varying metrics and components based on your unique business needs and relevant historical data. To use the sample size calculator to help plan your experiment, follow these steps:
- From the experiment's Plan tab, click Get An Estimate to open the calculator.
- Choose a primary metric from the Add Metric dropdown. This is the metric that you want to move by running the experiment, and is what will determine the experiment's success.
- Choose a proxy exposure event by clicking Select event... . A proxy event fires at the same time a user is exposed to the experiment and closely resembles the primary metric's exposure event.
- If desired, add properties to the proxy exposure event by clicking where, then Select property... . For example, you may want to add a where clause to incorporate your rule-based targeting.
- Next, the following nominal components of the calculator can be kept at their default values or adjusted manually. Modifying these values may affect the sample size and run time needed to reach statistical significance; expect a larger sample size to require a longer run time.
| Component name — default | Definition and data validation | Relation to sample size needed for stat. sig. |
| --- | --- | --- |
| Minimum Effect (MDE) — 2% | The MDE, also known as the minimum goal or effect size, is relative to the control mean of the primary metric; it is not absolute or standardized. For example, if the conversion rate for control is 10%, an MDE of 2% means a change would be detected if the rate moved outside of 9.8% to 10.2%. The right MDE depends on the context of the experiment: use the smallest change that would still make the experiment a success. Any positive number as a percentage; it cannot be 0%. | The smaller the MDE, the larger the sample size |
| Confidence Level — 95% | The confidence level measures how confident you are that you would receive the same results if you rolled out the experiment again and again. For example, a confidence level of 95% means that 5% of the time you might interpret the results as statistically significant when they're not (a false positive). Amplitude recommends a minimum of 80%; below that, the experiment's results may not be reliable. It cannot be 0% or 100%. | The larger the confidence level, the larger the sample size |
| Control Mean — automatically computed for you when you select the primary metric | The control mean is the average value of the selected primary metric over the last 7 days (not including today) for users who completed the proxy exposure event. Consider adjusting the mean if a recent special event or holiday may have skewed the average over the last 7 days. It cannot be 0 regardless of metric type, and for conversion metrics it cannot be 1. Note that for conversion metrics, .5 means 50%, not .5%. | The smaller the control mean, the larger the sample size |
| Standard Deviation — automatically computed for you when you select the primary metric | The standard deviation measures the spread in the data (roughly, the average distance between each data point and the mean). It only appears for numerical metrics, not for binary (0-1) conversion rates. The automatic calculation is based on the standard deviation of the primary metric over the last 7 days (not including today) for users who completed the proxy exposure event. Any positive number. | The larger the standard deviation, the larger the sample size |
- If desired, you may expand Advanced Parameters and modify the default values.
| Component name — default | Definition | Relation to sample size needed for stat. sig. |
| --- | --- | --- |
| Power — 80% | Power is the probability of detecting an effect when one truly exists (the true positive rate), so 1 − power is the false negative rate. Think of power as how sensitive the experiment needs to be, or how much risk of missing a real effect you're willing to accept. It cannot be 0% or 100%. | The larger the power, the larger the sample size |
| Test Type — 2-sided | A 1-sided t-test looks for either an increase or a decrease relative to the control mean, whereas a 2-sided t-test looks for both. | A 2-sided test requires a larger sample size than a 1-sided test |
As you fill in the calculator's components, the Estimated sample size per variant is calculated. If a calculation is not possible because there hasn't been traffic for the chosen proxy exposure event or metric, the calculator will say so.
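The calculator produces this estimate for you, but it can be useful to see how the components combine. The sketch below is a standard normal-approximation power calculation for a two-sample t-test, written in Python with scipy; the function name sample_size_per_variant and the exact formula are illustrative assumptions, not necessarily what Amplitude computes internally.

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_variant(control_mean, std_dev, mde_relative,
                            confidence=0.95, power=0.80, sides=2):
    """Approximate sample size per variant for a two-sample t-test.

    A standard normal-approximation power calculation; not necessarily
    the exact formula Amplitude's calculator uses.
    """
    alpha = 1 - confidence
    z_alpha = norm.ppf(1 - alpha / sides)  # 1-sided uses alpha, 2-sided uses alpha/2
    z_beta = norm.ppf(power)
    delta = mde_relative * control_mean    # the MDE is relative to the control mean
    n = 2 * (z_alpha + z_beta) ** 2 * std_dev ** 2 / delta ** 2
    return ceil(n)

# Example: a conversion metric with a 10% control mean and the default
# 2% relative MDE, 95% confidence, 80% power, 2-sided test.
# For a binary metric, the standard deviation is sqrt(p * (1 - p)).
p = 0.10
print(sample_size_per_variant(p, sqrt(p * (1 - p)), mde_relative=0.02))
```

Run with the defaults this way, the relationships in the tables above become concrete: a smaller MDE or control mean, or a larger confidence level, power, or standard deviation, all push the required sample size up.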
Interpreting calculator results
Once all components have been entered, the sample size calculator displays a numerical result: the sample size recommended to reach statistical significance when conducting your experiment. To estimate the run time needed to complete the experiment, you also need to know your average number of daily users over a relevant business cycle. Once you have the average daily users, divide the calculator's result by that number to determine the run time needed to perform the experiment. For example, if the sample size calculator's result is 100,000 and your product's exposure event sees 10,000 average users per day, you'd anticipate needing about 10 days to conduct the experiment.
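As a minimal sketch of that arithmetic (the variable names are illustrative; note that if traffic is split across several variants, each variant only receives a share of the daily users, which lengthens the estimate accordingly):

```python
from math import ceil

sample_size = 100_000      # result from the sample size calculator
avg_daily_users = 10_000   # average users per day seen by the exposure event

# Run time is roughly the required sample size divided by daily traffic.
run_time_days = ceil(sample_size / avg_daily_users)
print(run_time_days)       # 10 days, matching the example above
```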
Reducing experiment run time
Sometimes the results of the sample size calculator could indicate a run time that is longer than desired. Consider the following to decrease your experiment's run time:
- Modify the error rates (confidence level and power) to reduce the sample size needed, as shown in the sketch after this list.
- Change the primary metric and exposure event.
- Target more users.
- Modify the standard deviation so that outliers don't carry as much weight.
- Lastly, decide if the experiment is worth the run time or if it should be scrapped.
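For a rough sense of how much these levers matter, you can reuse the hypothetical sample_size_per_variant() helper from the earlier sketch and compare a few parameter combinations:

```python
# Reusing the hypothetical sample_size_per_variant() helper from the
# earlier sketch: relaxing confidence or power, or accepting a larger
# MDE, shrinks the required sample size and therefore the run time.
p, sd = 0.10, (0.10 * 0.90) ** 0.5
for confidence, power, mde in [
    (0.95, 0.80, 0.02),   # defaults
    (0.90, 0.80, 0.02),   # lower confidence level
    (0.95, 0.80, 0.05),   # larger MDE
]:
    n = sample_size_per_variant(p, sd, mde, confidence=confidence, power=power)
    print(f"confidence={confidence}, power={power}, MDE={mde}: n={n}")
```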
Ultimately, the value of using the sample size calculator to plan your experiment depends on your unique business goals and the risks you're willing to take to pursue them. Click here to read more about the experiment design phase.
Debugging
If you try to run the calculator but get the error message "We are not able to calculate an estimate. There hasn't been traffic for the proxy exposure event or metric you have chosen," try these steps to debug:
- Check that you have data for the proxy exposure event.
- Check that you have data for the metric, including users who performed the metric's event after the proxy exposure event.
- Has a proxy exposure event been selected?
- Is the control mean 0 or 1?
- Is the standard deviation 0?
- Is the MDE 0%?
- Is the confidence level 0% or 100%?
- Is the power 0% or 100%?