With the Scale add-on, Amplitude enables dynamic behavioral sampling for ultra-high-volume customers who have unique cost challenges. Sampling lets you keep your data costs manageable without compromising the accuracy of your analyses.
Amplitude's algorithmic sampling framework samples events at the user level, based on user identity. This preserves the full event streams of tracked users, and therefore their behaviors. It also protects the integrity of data in Amplitude, unlike random event-level sampling, which could leave you with incomplete data.
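As a rough illustration of why sampling by user identity keeps event streams intact, the sketch below hashes each user ID into a bucket and keeps a user only when the bucket falls under the sampling rate. The hashing scheme and the names `is_user_sampled` and `sample_events` are assumptions made for this example; this article does not describe Amplitude's actual algorithm.

```python
import hashlib


def is_user_sampled(user_id: str, sampling_rate: float) -> bool:
    """Deterministically decide whether a user falls inside the sample.

    Hypothetical scheme: hash the user ID into one of 10,000 buckets and
    keep the user when the bucket index is below sampling_rate * 10,000.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < sampling_rate * 10_000


def sample_events(events: list[dict], sampling_rate: float) -> list[dict]:
    """Keep every event of a sampled user and no events of other users."""
    return [e for e in events if is_user_sampled(e["user_id"], sampling_rate)]


# 50 made-up users with 4 events each.
events = [
    {"user_id": f"user-{i % 50}", "event_type": f"step-{i // 50}"}
    for i in range(200)
]
kept = sample_events(events, sampling_rate=0.10)
```

Because the decision depends only on the user ID, a user is either fully present or fully absent, so funnels and retention paths for sampled users are never missing steps.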
When sampling is enabled, Amplitude will upsample metrics to provide you with highly accurate estimates on every chart and in every analysis.
To oversimplify, this means that Amplitude will multiply your events and users by a sampling factor, equivalent to (100% / sampling rate).
For example, if you are sampling at 10%, Amplitude would multiply tracked events by 10 to give you an accurate estimate of your true event volume. This helps every end user in Amplitude focus on analytics without worrying about the sampling rate being used.
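The multiplication can be sketched in a few lines. The function names `sampling_factor` and `upsample` and the event count are hypothetical, chosen only to illustrate the simplified (100% / sampling rate) rule; Amplitude's real upsampling is more involved than this.

```python
def sampling_factor(sampling_rate: float) -> float:
    """Return 100% / sampling rate, with the rate given as a fraction (0.10 = 10%)."""
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in the range (0, 1]")
    return 1.0 / sampling_rate


def upsample(tracked_count: int, sampling_rate: float) -> float:
    """Estimate the true count from the count observed in the sample."""
    return tracked_count * sampling_factor(sampling_rate)


# 1.2 million tracked events at a 10% rate suggest about 12 million true events.
estimated_events = upsample(1_200_000, sampling_rate=0.10)
```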
NOTE: The features described in this article are only available to Enterprise customers who have purchased the Scale add-on.
Each Amplitude chart will show the sampling rate applied to it. This allows for transparent communication of effective sampling rates.
You can see the raw events seen for your project for last month and the current month, along with the number of events after sampling. This provides you with real-time access to event volume.
NOTE: Sampling does not apply to PROPCOUNT results.
Set up sampling
You will need to be an Admin in your Amplitude organization in order to make any sampling-related changes.
To set up sampling, follow these steps:
- Click
, then click Projects. Select the project you're interested in. Then click Sampling.
- In the modal that opens, click Edit to set the dynamic sampling rate.
The dynamic sampling rate specifies the proportion of your users whose data will be queried. For example, if you have 50 million active users per year and you set a dynamic sampling rate of 10%, then your queried data will contain 5 million active users per year. Your event costs will be significantly lower, yet you'll still have more than enough data to generate highly accurate analyses.
- Next, set your user property inclusion list, if desired.
This list acts as a safelist to set aside small, key sub-populations from your sampling process. Users included in these populations will be exempt from sampling, and will always appear in your data. These populations are defined by the user properties and values you select in this step.
NOTE: This process does not apply retroactively. Additionally, some properties are not supported by the user property inclusion list, including User ID and Device ID.
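To make the safelist behavior concrete, here is a minimal sketch of how an inclusion list could sit in front of a user-level sampling decision. The property names and values in `INCLUSION_LIST`, the hash-bucket scheme, and the function names are all hypothetical; they are not taken from Amplitude's implementation.

```python
import hashlib

# Hypothetical inclusion list: user property name -> values exempt from sampling.
INCLUSION_LIST = {
    "plan": {"enterprise"},
    "beta_tester": {"true"},
}


def in_inclusion_list(user_properties: dict) -> bool:
    """True when any listed property carries one of its safelisted values."""
    return any(
        user_properties.get(name) in values
        for name, values in INCLUSION_LIST.items()
    )


def keep_user(user_id: str, user_properties: dict, sampling_rate: float) -> bool:
    """Safelisted users are always kept; everyone else is sampled by user ID."""
    if in_inclusion_list(user_properties):
        return True
    # Assumed scheme: hash the user ID into one of 10,000 buckets.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < sampling_rate * 10_000
```

For example, `keep_user("u1", {"plan": "enterprise"}, 0.05)` returns `True` regardless of the rate, while a user on another plan is kept only when their bucket falls under the 5% threshold.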
Accuracy benchmarks
Amplitude benchmarks the accuracy of sampled results in terms of percent error, or relative standard deviation at a 95%, two-tailed confidence interval. This is a function of standard error and the true (unsampled) result.
Customers with high volumes (10M DAUs and above) will achieve results within 0.62% error at a 10% sampling rate. Amplitude further assumes that any particular analysis would only need to consider 10% of the DAUs to achieve these results. Higher coverage will generally result in higher accuracy.
The following table shows percent error at a 95% confidence interval across sampling rates for various DAU volumes:
DAUs / Sample Rate | 25% | 10% | 5% | 2% | 1% |
---|---|---|---|---|---|
500,000 | 1.73% | 2.76% | 3.91% | 6.19% | 8.76% |
1,000,000 | 1.22% | 1.95% | 2.76% | 4.38% | 6.19% |
5,000,000 | 0.55% | 0.87% | 1.24% | 1.96% | 2.77% |
10,000,000 | 0.39% | 0.62% | 0.87% | 1.38% | 1.96% |
20,000,000 | 0.27% | 0.44% | 0.62% | 0.98% | 1.39% |
50,000,000 | 0.17% | 0.28% | 0.39% | 0.62% | 0.88% |
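This article does not state the formula behind the table. As an inference only, the values are consistent with the relative standard error of an upsampled count when each user is sampled independently, scaled by the 95% z-score of 1.96 and using the 10% coverage assumption mentioned above. The function name `percent_error` and its parameters are assumptions made for this sketch, not Amplitude's documented method.

```python
import math

Z_95 = 1.96  # two-tailed z-score for a 95% confidence interval


def percent_error(daus: int, sampling_rate: float, coverage: float = 0.10) -> float:
    """Approximate percent error at 95% confidence.

    Assumed model: an analysis covers `coverage` of the DAUs, and each of
    those users is kept with probability `sampling_rate`, so the effective
    sampled fraction of all DAUs is f = sampling_rate * coverage.
    """
    f = sampling_rate * coverage
    relative_standard_error = math.sqrt((1 - f) / (daus * f))
    return 100 * Z_95 * relative_standard_error


for daus in (500_000, 10_000_000, 50_000_000):
    row = [round(percent_error(daus, rate), 2) for rate in (0.25, 0.10, 0.05, 0.02, 0.01)]
    print(f"{daus:>12,}", row)
```

Under these assumptions the error shrinks with the square root of the number of sampled users, which is why quadrupling DAUs roughly halves the percent error at a fixed sampling rate.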
For example, if you sample at 10% with 10,000,000 DAUs, then at the 95% confidence level you should not see more than 0.62% error in a metric. So if your retention is 16%, you might see a variation of:
+/- 0.62% * 16% = +/- 0.1 percentage points
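The same arithmetic can be written as a small helper that converts a relative percent error into percentage points of the metric. The name `absolute_margin` is hypothetical and used only for this illustration.

```python
def absolute_margin(metric_percent: float, percent_error: float) -> float:
    """Convert a relative percent error into percentage points of the metric."""
    return metric_percent * percent_error / 100


retention = 16.0  # retention in percent
margin = absolute_margin(retention, percent_error=0.62)

# Range within which the sampled retention estimate is expected to fall.
low, high = retention - margin, retention + margin
```

Here `margin` is about 0.1 percentage points, so a true retention of 16% would be reported as roughly 15.9% to 16.1%.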