Amplitude offers dynamic behavioral sampling for ultra-high-volume customers with unique cost challenges. This article walks through the sampling functionality included in Amplitude's Scale program.
Table of Contents
- User-Level Sampling
- Upsampling of Metrics
- User Property Inclusion List
- Accuracy Benchmarks
- Setup and Administration
Note: This feature is only available to Enterprise customers who have purchased the Scale add-on.
User-Level Sampling
Our algorithmic user-level sampling framework samples by user identity rather than by individual event. For each tracked user, Amplitude preserves the full event stream, and with it the user's “behaviors”. This keeps the data in Amplitude intact, whereas random event-level sampling would leave you with incomplete event streams for your tracked users.
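The difference between user-level and event-level sampling can be illustrated with a short sketch. This article does not describe how Amplitude assigns users to the sample, so the SHA-256 bucketing below is an assumption, and `sampling_bucket`, `keep_user`, and `sample_events` are hypothetical names. The key property is that the keep/drop decision depends only on the user ID, so a tracked user's entire event stream survives.

```python
import hashlib


def sampling_bucket(user_id: str) -> float:
    """Map a user ID deterministically to a number in [0, 1) (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def keep_user(user_id: str, rate: float) -> bool:
    """A user is tracked if their bucket falls below the sampling rate (0.0 to 1.0)."""
    return sampling_bucket(user_id) < rate


def sample_events(events: list[tuple[str, str]], rate: float) -> list[tuple[str, str]]:
    """Filter (user_id, event_type) pairs by user, never by individual event."""
    return [event for event in events if keep_user(event[0], rate)]


events = [("alice", "Sign Up"), ("alice", "Purchase"), ("bob", "Sign Up")]
print(sample_events(events, 0.5))
```

Because the bucket is fixed per user, raising the rate in this sketch only adds users to the sample; it never splits an existing user's stream.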
Upsampling of Metrics
Amplitude upsamples metrics to give you a highly accurate estimate of each metric on every chart and in every analysis. Put simply, Amplitude multiplies your event and user counts by a sampling factor:
sampling factor = 100% / sampling rate
For example, if you sample at 10%, Amplitude multiplies tracked events by 10 (100% / 10%) to estimate your true event volume. This lets every Amplitude user focus on analysis without worrying about which sampling rate is in use.
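The simplified upsampling rule can be written as a couple of small functions. This is only the basic idea the article describes (the article itself calls it an oversimplification), and `sampling_factor` and `upsample` are hypothetical names. Rates are taken as percentages to mirror the formula.

```python
def sampling_factor(rate_percent: float) -> float:
    """Sampling factor = 100% / sampling rate, e.g. 10% -> 10."""
    if not 0 < rate_percent <= 100:
        raise ValueError("sampling rate must be in (0, 100] percent")
    return 100 / rate_percent


def upsample(sampled_count: int, rate_percent: float) -> float:
    """Estimate the unsampled total from a count observed in sampled data."""
    return sampled_count * sampling_factor(rate_percent)


print(upsample(1234, 10))  # 12340.0
```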
User Property Inclusion List
Amplitude lets you protect selected user populations from sampling. In the control module, you can configure a user property inclusion list to protect small, key sub-populations. This is useful when you already know that certain small populations are especially important to you: for example, if you always want to include whales or the users in a particular A/B test, set up an inclusion list for those groups. Users on the inclusion list are exempt from sampling, and their data is always included. Inclusion is not retroactive: if you add a user who was previously rejected, their past events will not appear in Amplitude. Only the events they send from that point on will be queryable.
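An inclusion list can be modeled as a check that runs before the sampling decision. The structure below (a mapping from a user property name to protected values), the property names `segment` and `experiment_variant`, and the hash-based bucketing are all hypothetical, chosen only to illustrate the behavior described above.

```python
import hashlib


def sampling_bucket(user_id: str) -> float:
    """Map a user ID deterministically to a number in [0, 1) (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def keep_user(
    user_id: str,
    user_properties: dict[str, str],
    rate: float,
    inclusion_list: dict[str, set[str]],
) -> bool:
    """Return True if the user's events should be kept.

    inclusion_list maps a user property name to the values that are
    protected from sampling (hypothetical structure for illustration).
    """
    for prop, protected_values in inclusion_list.items():
        if user_properties.get(prop) in protected_values:
            return True  # exempt from sampling, always kept
    return sampling_bucket(user_id) < rate


# Hypothetical inclusion list: protect whales and one A/B test group.
inclusion = {"segment": {"whale"}, "experiment_variant": {"checkout-test-B"}}
print(keep_user("user-42", {"segment": "whale"}, 0.0, inclusion))  # True
```

In this sketch the check runs as each event arrives, so adding a property value to the list only affects events from then on, which matches the non-retroactive behavior described above.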
Note: The user property inclusion list does not currently work for the '[Amplitude] User ID', '[Amplitude] ID', and '[Amplitude] Device ID' fields.
Accuracy Benchmarks
We benchmark the accuracy of sampled results as % error: the relative standard deviation at a 95% two-tailed confidence interval. This is a function of the standard error and the true (unsampled) result. At high volumes (10M DAUs and above), customers can expect results within 0.62% error at a 10% sampling rate. These figures assume that any particular analysis considers only 10% of DAUs; analyses that cover a larger share of users will be more accurate.
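The article does not give the exact formula behind its benchmarks. As a rough guide, the sketch below assumes % error ≈ z / √n, where n is the number of sampled users an analysis considers (DAUs × coverage × sampling rate) and z = 1.96 for a 95% two-tailed interval. This approximation is consistent with the 0.62% figure for 10,000,000 DAUs at a 10% sampling rate with 10% coverage, but it ignores metric-specific variance, so treat it only as an estimate. `relative_error` is a hypothetical name.

```python
import math


def relative_error(daus: int, rate: float, coverage: float = 0.10, z: float = 1.96) -> float:
    """Approximate % error at 95% confidence as z / sqrt(sampled users in the analysis)."""
    sampled_users = daus * coverage * rate
    return z / math.sqrt(sampled_users)


print(f"{relative_error(10_000_000, 0.10):.2%}")  # 0.62%
```

Under this approximation, error shrinks with the square root of the sample, so quadrupling coverage or sampling rate roughly halves it.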
The following table shows % error at 95% confidence interval across sampling rates for various DAU volumes:
| DAUs / Sample Rate | 25% | 10% | 5% | 2% | 1% |
|---|---|---|---|---|---|
To interpret this table, find the row for your daily active user count and the column for your sampling rate. For example, if you sample at 10% with 10,000,000 DAUs, you can expect, with 95% confidence, no more than 0.62% relative error in any given metric. This is represented by the yellow cell. So if your retention is 16%, the margin of error is roughly:
+/- 0.62% * 16% = +/- 0.1 percentage points (that is, between about 15.9% and 16.1%)
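The retention example amounts to scaling a metric by its relative error. The helper below is a hypothetical sketch that returns the resulting interval bounds.

```python
def margin_of_error(metric_value: float, relative_error: float) -> tuple[float, float]:
    """Return (low, high) bounds for a metric given its relative % error."""
    delta = metric_value * relative_error
    return metric_value - delta, metric_value + delta


low, high = margin_of_error(0.16, 0.0062)  # 16% retention, 0.62% relative error
print(f"{low:.1%} to {high:.1%}")  # 15.9% to 16.1%
```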
Setup and Administration
To set up sampling, you must be an Admin in your Amplitude organization. Open the Settings page for the project you want to sample, then select the "SAMPLING" option in the top right-hand corner.
Dynamic Sampling Rate
Amplitude lets you set a dynamic sampling rate that controls the volume of queried data. Because the sampling rate in turn determines the cost of your event volume, this gives you complete control over that cost, and you can change the rate directly in the platform at any time. For example, if you have 50 million active users per year and set a dynamic sampling rate of 10%, your queried data will contain 5 million active users per year.
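The example above is simple proportional arithmetic. The sketch below (the function name is hypothetical) applies it across the sampling rates listed in the accuracy table. With user-level sampling the actual number of sampled users will vary slightly around this expected value.

```python
def expected_sampled_users(active_users: int, rate_percent: int) -> int:
    """Expected users left in queried data at a given sampling rate (in percent)."""
    return active_users * rate_percent // 100


for rate in (25, 10, 5, 2, 1):
    print(f"{rate:>2}% -> {expected_sampled_users(50_000_000, rate):,} users")
```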
Sampling Rate Change Log
The sampling module includes a change log where you can verify changes to your sampling rate and review the rates previously set for your project.
Transparent Communication of Sampling Rates
Amplitude's charts show the sampling rate applied to each analysis, so anyone viewing a chart can see the effective sampling rate behind it.
You can also see your project's raw event counts for last month and the current month, alongside the number of events after sampling. This gives you real-time visibility into your event volume.
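Comparing the two counts gives the share of raw events that actually remain. The sketch below uses hypothetical monthly counts and a hypothetical function name. Because sampling keeps or drops whole users, users send different numbers of events, and inclusion-list users are always kept, this share need not exactly equal the configured sampling rate.

```python
def retained_share(raw_events: int, events_after_sampling: int) -> float:
    """Fraction of raw events that remain after sampling."""
    if raw_events <= 0:
        raise ValueError("raw_events must be positive")
    return events_after_sampling / raw_events


# Hypothetical monthly counts read off the sampling settings page.
share = retained_share(raw_events=120_000_000, events_after_sampling=12_600_000)
print(f"{share:.1%} of raw events retained")
```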