This article outlines the steps for building and analyzing predictions, and then for building and analyzing the predictive cohorts you generate from them.
NOTE: Be sure to check out our other articles on predictive cohorts: "Predictive cohorts: use Amplitude's AI to help maximize lift" and "Use predictive cohorts in your campaigns."
Build a prediction
To build a prediction in Amplitude, follow these steps:
- From the Cohorts page, click + New Cohort. Then, in the page that opens, click Create a Prediction. This will open the Create a Prediction page.
- By default, Amplitude assumes you want this prediction to apply to all users. To change this and use a different starting cohort, click Define your own.
- The first step is selecting the users who will be included in the cohort. Under Define your Starting Cohort, select the events, properties, or statuses that users in your cohort share.
- Next, specify the action you want the starting cohort to take. Under Define a Future Outcome, you can specify events you want your users to fire, events you don’t want them to fire, the properties you want them to have after taking an action, or any combination of the three.
TIP: Another way to think about a prediction is as a cohort transition: you’re predicting the relative likelihood of a user to transition from Cohort A (the starting cohort) to Cohort B (the future outcome) in the coming week.
- Click Save and name your prediction. It will take about an hour for Amplitude to build it. You’ll receive an email when the process is done.
Analyze your prediction
Once Amplitude has finished building your prediction, you’ll want to take a look at the results. Depending on what you see, you’ll either save the prediction as a cohort, or start over with a new prediction.
- To view the results of your prediction, click the Predictions tab from the Cohorts page. This will show you a list of all the predictions created so far.
- Find your prediction and click it to open the prediction explorer. Here, you’ll see the distribution of all users in your starting cohort:
- The Y-axis shows the likelihood a user will convert (i.e., arrive at the future outcome you specified earlier)
- The X-axis shows the percentile of users
You can select a range of users by percentile and see how many users fall in the range, the predicted conversion rate of users in that range, and the likelihood of conversion for those users relative to the average.
NOTE: Percentile and probability are not the same thing. If you select the 80% - 100% percentile range, this does NOT mean the users in it have an 80% - 100% probability to convert. Instead, it means they’re in the top 20% of users, as ranked by probability to convert.
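To make the difference between percentile and probability concrete, here is a minimal Python sketch. The probabilities are made-up illustration data, and `percentile_range_summary` is a hypothetical helper, not part of Amplitude. It ranks users by predicted probability, selects a percentile range, and reports the three figures described above for that range.

```python
# Hypothetical per-user conversion probabilities (illustration only,
# not real Amplitude output).
probabilities = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.15, 0.30]


def percentile_range_summary(probs, lower_pct, upper_pct):
    """Summarize the users ranked between two percentiles of probability."""
    ranked = sorted(probs)  # lowest probability first
    n = len(ranked)
    start = n * lower_pct // 100
    end = n * upper_pct // 100
    selected = ranked[start:end]
    if not selected:
        raise ValueError("the selected percentile range contains no users")
    overall_rate = sum(ranked) / n
    range_rate = sum(selected) / len(selected)
    return {
        "users": len(selected),
        "predicted_conversion_rate": range_rate,
        # How much more likely these users are to convert than the average user.
        "relative_to_average": range_rate / overall_rate,
        "lowest_probability_in_range": selected[0],
    }


summary = percentile_range_summary(probabilities, 80, 100)
print(summary)
```

In this sample, the users in the 80% - 100% percentile range have probabilities far below 80%. They are simply the top 20% of users when ranked by probability.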
At this point, you’ll want to evaluate whether your prediction is accurate. Amplitude provides four metrics to help you do this:
- Accuracy: technically, this is the area under the curve, a measure that weighs both true positive and false positive rates
- True Positive Rate: this is the share of users who actually convert that the model predicted would convert
- False Positive Rate: this is the share of users who do not convert that the model predicted would convert
- Predicted vs Actuals: this compares the predicted conversion rates to observed historical conversion rates and gives you the difference, in percentage terms
Generally speaking, a good model will have an accuracy of at least 70%. Any model with an accuracy of 50% or less will be no better than a coin flip in its predictive ability.
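The sketch below illustrates how metrics of this kind are commonly computed. It uses the standard textbook definitions of true positive rate, false positive rate, and area under the ROC curve. How Amplitude computes these internally is not described here, so treat the labels, scores, threshold, and function names as assumptions made for illustration.

```python
def confusion_rates(labels, scores, threshold):
    """Return (true positive rate, false positive rate) at a score threshold."""
    tp = fp = tn = fn = 0
    for converted, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and converted:
            tp += 1
        elif predicted and not converted:
            fp += 1
        elif converted:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def area_under_curve(labels, scores):
    """Chance that a random converter outscores a random non-converter.

    This pairwise formulation is equal to the area under the ROC curve;
    tied scores count as half a win.
    """
    converters = [s for y, s in zip(labels, scores) if y]
    others = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for c in converters:
        for o in others:
            if c > o:
                wins += 1.0
            elif c == o:
                wins += 0.5
    return wins / (len(converters) * len(others))


# Hypothetical data: 1 = the user converted, 0 = the user did not.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1]

tpr, fpr = confusion_rates(labels, scores, threshold=0.5)
auc = area_under_curve(labels, scores)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} AUC={auc:.2f}")
```

An area under the curve of 0.5 means the model ranks converters above non-converters only half the time, which is why a score at that level is no better than a coin flip.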
“Black box” predictions aren’t generally insightful. That’s why Amplitude ranks the events and user properties that are most important to your predictive model in the Feature Importance table, which you can find just below the Percentile Breakdown chart:
The Ratio column is a ranking of events or properties according to their importance to the model. It’s computed by comparing the percentage of users in the selected percentile range who fire an event to those not in the selected percentile range.
The % in Range column specifies the percentage of users in the selected percentile range who fired the respective event. Sort by this column to rank events according to how many users in the range fire them.
The Frequency column displays the average number of times a user in the selected range fires an event. Sort by this column to rank events according to overall level of engagement.
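As a rough illustration of how the three columns relate, the following Python sketch computes them for a tiny made-up data set. It assumes that Ratio is the percentage of in-range users who fired the event divided by the percentage of out-of-range users who did, and that Frequency is averaged over all users in the range. Both are interpretations of the descriptions above, not confirmed formulas, and the event names and `feature_row` helper are hypothetical.

```python
# Hypothetical users: whether each falls in the selected percentile range,
# and how many times each fired every event.
users = [
    {"in_range": True, "events": {"Add to Cart": 3, "Search": 1}},
    {"in_range": True, "events": {"Add to Cart": 1}},
    {"in_range": False, "events": {"Search": 2}},
    {"in_range": False, "events": {"Add to Cart": 1, "Search": 1}},
    {"in_range": False, "events": {}},
    {"in_range": False, "events": {"Search": 4}},
]


def share_firing(group, event):
    """Fraction of users in the group who fired the event at least once."""
    return sum(1 for u in group if u["events"].get(event, 0) > 0) / len(group)


def feature_row(users, event):
    """Compute assumed Ratio, % in Range, and Frequency values for one event."""
    inside = [u for u in users if u["in_range"]]
    outside = [u for u in users if not u["in_range"]]
    pct_in = share_firing(inside, event)
    pct_out = share_firing(outside, event)
    return {
        "event": event,
        # Assumed formula: in-range share divided by out-of-range share.
        "ratio": pct_in / pct_out if pct_out else float("inf"),
        "pct_in_range": pct_in,
        # Assumed: average count across every user in the range.
        "frequency": sum(u["events"].get(event, 0) for u in inside) / len(inside),
    }


rows = [feature_row(users, name) for name in ("Add to Cart", "Search")]
rows.sort(key=lambda row: row["ratio"], reverse=True)
for row in rows:
    print(row)
```

Under these assumptions, a ratio above 1 marks an event that is more common among users in the selected range than among everyone else.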
Build a predictive cohort
Once you’ve got a useful prediction, you can save it as a predictive cohort. This enables you to return to it in the future and repeatedly use it in targeting campaigns.
To save a predictive cohort, select the desired percentile range on the chart and click Save as a Cohort. While it can be tempting to just slice the starting cohort into two sections (e.g., top 20% vs bottom 80%, or top 50% vs bottom 50%), other approaches can give you far more useful results:
- Probability inflection: Find the spot where the distribution graph spikes exponentially (see the above screenshot for an example), and split users along the spikes. This will group users into broadly similar buckets of predicted conversion rates.
- Sample size: If you have an idea of how many users you want to target in a growth campaign, then select that percentage on the right side of the chart. For example, if you want to target 2,000 users and you have 20,000 users in the starting cohort, then simply select the top 10%.
- Minimum detectable lift: If you plan to target the selected users in a growth campaign, make sure the sample size is large enough to detect incremental lift. For example, if the top 20% of a prediction is 20,000 users, but the predicted conversion rate is 1%, you won’t be able to detect lift at statistically significant levels. Instead, you must increase the sample size to the top 45% of users (45,000 users).
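The sample size and minimum detectable lift considerations above can be sketched in Python. The first helper reproduces the percentage arithmetic from the sample size example. The second uses the standard normal-approximation formula for comparing two proportions, with a two-sided 5% significance level and 80% power as assumed defaults. This is a generic textbook calculation, not the method Amplitude uses, and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def percent_of_cohort_for_target(target_users, cohort_size):
    """Percentage of the starting cohort needed to reach a target user count."""
    return 100 * target_users / cohort_size


def users_needed_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users per group needed to detect a relative lift.

    Assumes a two-sided test comparing a treated group with an equally
    sized control group, using the normal approximation.
    """
    treated_rate = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + treated_rate * (1 - treated_rate))
    effect = treated_rate - baseline_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Targeting 2,000 users out of a 20,000-user starting cohort.
top_percent = percent_of_cohort_for_target(2_000, 20_000)
print(f"Select the top {top_percent:.0f}%")

# Hypothetical scenario: 1% predicted conversion rate, hoping to detect
# a 10% relative lift (1.0% -> 1.1%).
needed_low = users_needed_per_group(baseline_rate=0.01, relative_lift=0.10)
# The same relative lift on a 10% predicted conversion rate.
needed_high = users_needed_per_group(baseline_rate=0.10, relative_lift=0.10)
print(needed_low, needed_high)
```

The comparison shows why a low predicted conversion rate calls for a wider percentile range: the lower the baseline rate, the more users are needed to detect the same relative lift.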
NOTE: When a user’s probabilities change, Amplitude will automatically adjust their cohort membership if they fall into or out of the selected percentile range.
Analyze your predictive cohort
Once you save a predictive cohort, you can use it for analysis in any Amplitude chart. Here are some suggestions for analyses using predictive cohorts:
- Create top 20% and bottom 80% cohorts to compare the best and worst users. Set them as different segments in the right module of any chart.
- Event Segmentation: see the historical behavioral trends of best users vs worst users prior to converting.
- Pathfinder: identify the different sequences of actions users take if they have a high likelihood vs low likelihood to convert.
- Composition: break down the property values of the respective cohorts to see differences in user properties (e.g., which countries the best users vs worst users are in).
- Engagement Matrix: compare the events fired by the best users vs the worst users, based on the balance of frequency and % of users.
- Funnel: compare relative conversion rates for any sequence of actions between the best users and worst users.