Predictive cohorts is a workflow improvement feature that helps you optimize targeting workflows to generate maximal lift.
Instead of building cohorts based on what users have done in the past, predictive cohorts let you segment your users based on their likelihood to perform a specific action in the future. They’re most useful in three types of workflow improvement: communication frequency, dynamic pricing, and content personalization. Use them to:
- Specify which users to include or exclude in a campaign
- Adjust messaging frequency based on a user’s likelihood to convert
- Modify pricing, offers, and discounts relative to a user’s likelihood to convert
- Fine-tune the content in an ad, email, or website depending on a user’s affinity for that content type
Building a predictive cohort is a multi-step process. Before you can save a predictive cohort, you’ll have to build the prediction that powers it. A prediction constructs a mathematical model to forecast the likelihood that a particular user will take a specific action in your product, and in turn groups users who have similar probabilities. Once you have a useful prediction, you can save it as a predictive cohort.
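As a rough illustration of grouping users who have similar probabilities, here is a minimal sketch in Python. It is not Amplitude's implementation: the function name bucket_by_score, the number of buckets, and the sample scores are all hypothetical, and it simply ranks users by score and splits them into equal-sized groups.

```python
def bucket_by_score(scores, n_buckets=4):
    """Split users into n_buckets groups of similar predicted probability.

    scores: dict mapping user_id -> probability in [0, 1] (hypothetical input).
    Returns a dict mapping bucket index (0 = most likely) -> list of user_ids.
    """
    # Rank users from most to least likely to perform the action.
    ranked = sorted(scores, key=lambda uid: scores[uid], reverse=True)
    buckets = {i: [] for i in range(n_buckets)}
    for rank, uid in enumerate(ranked):
        # Integer arithmetic assigns equal-sized slices of the ranking.
        buckets[rank * n_buckets // len(ranked)].append(uid)
    return buckets


# Made-up scores for eight users, for illustration only.
scores = {
    "u1": 0.91, "u2": 0.12, "u3": 0.55, "u4": 0.78,
    "u5": 0.05, "u6": 0.33, "u7": 0.64, "u8": 0.20,
}
cohorts = bucket_by_score(scores)
```

In this sketch, cohorts[0] holds the users most likely to convert and cohorts[3] the least likely, so each group could be saved and targeted separately.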
But first, you’ll need to decide what predictions to build.
What question should your prediction answer? In most cases, it will be closely tied to the objectives that guide your company as a whole, your “north stars.” Start by mapping out the user journey, complete with KPIs from the user’s first interaction with your product to their last touch. Some common steps along the user journey are signup, activation, retention, and churn.
Once you’ve identified all the steps, fill in all the milestones along the way by specifying every major button interaction that happens between those steps. You’ll want to build a prediction for every step of this journey.
For example, a user journey for an ecommerce product would look something like this:
Who should use predictive cohorts, and when
Predictive cohorts work best in specific situations:
- When your target outcome lacks a clear funnel. Such outcomes are usually the culmination of circuitous user journeys, and are difficult to frame as a clear binary event. Common examples of outcomes without clear funnels are activation, retention, engagement, and long-term value. If these are the metrics that matter most to your product, predictive cohorts can be a useful tool.
- When you’re trying to drive incremental lift to these outcomes. A predictive cohort can, on average, drive a 5% to 20% lift relative to a behavioral cohort.
- If your product has over 100,000 monthly average users. Anything less than this is unlikely to generate sample sizes that are large enough to draw reliable statistical inferences.
Conversely, your company is less likely to benefit from predictive cohorts if you:
- Sell physical products
- Are in the B2B space, or
- Lack a marketing team
When you’re ready to get to work with predictive cohorts, be sure to read our Help Center articles on building predictive cohorts and using your predictive cohorts in campaigns first. Alternatively, have a look at the section below, which describes how Amplitude builds predictions and how they work.
How predictions work
Predictions use past behavior to predict future behavior. When you build a prediction, Amplitude creates a mathematical model to distinguish between users who will perform the action you specify and users who will not.
Amplitude starts by looking at users who were in the starting cohort two periods ago, and then identifies which of those users did and which did not perform the action one period ago (a period can be set to seven, 30, 60, or 90 days).
Next, Amplitude compares those two groups of users along three sets of variables:
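The labeling step above can be sketched as follows. This is only one reading of the description, not Amplitude's code: it assumes the action window runs from two periods ago up to one period ago, and the names label_training_users, starting_cohort, and action_dates are hypothetical.

```python
from datetime import date, timedelta


def label_training_users(starting_cohort, action_dates, today, period_days=30):
    """Label each user in the historical starting cohort as converted or not.

    starting_cohort: set of user_ids in the cohort two periods ago (hypothetical).
    action_dates: dict user_id -> list of dates the target action occurred.
    Assumption: the action counts if it fell in [today - 2P, today - P).
    """
    window_start = today - timedelta(days=2 * period_days)
    window_end = today - timedelta(days=period_days)
    labels = {}
    for uid in starting_cohort:
        dates = action_dates.get(uid, [])
        labels[uid] = any(window_start <= d < window_end for d in dates)
    return labels


# Made-up data: "a" acted inside the window, "b" acted too late, "c" never acted.
labels = label_training_users(
    starting_cohort={"a", "b", "c"},
    action_dates={"a": [date(2024, 2, 10)], "b": [date(2024, 3, 15)]},
    today=date(2024, 3, 31),
    period_days=30,
)
```

The resulting True and False labels form the two groups that the model later learns to tell apart.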
- Events: How often each user fires the 100 most common events, every week for the last 12 weeks
- Event properties: How often each user fires the most frequently queried event properties, every week for the last 12 weeks
- User properties: The most recent value of a user property in the last 12 weeks
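To make the first of these variable sets concrete, here is a minimal sketch of counting how often each user fires the most common events per week over a 12-week window. The tuple shape of the input, the function name weekly_event_counts, and the choice to rank events by frequency inside the window are assumptions for illustration; Amplitude's actual feature pipeline is not described in this article.

```python
from collections import Counter
from datetime import date


def weekly_event_counts(events, as_of, top_n=100, weeks=12):
    """Count events per user, per event name, per week.

    events: list of (user_id, event_name, event_date) tuples (hypothetical shape).
    Returns dict user_id -> Counter keyed by (event_name, weeks_ago).
    """
    # Keep only events inside the lookback window; week 0 is the most recent.
    in_window = [
        (uid, name, (as_of - day).days // 7)
        for uid, name, day in events
        if 0 <= (as_of - day).days < weeks * 7
    ]
    # Assumption: "most common" is measured within the window itself.
    totals = Counter(name for _, name, _ in in_window)
    top_events = {name for name, _ in totals.most_common(top_n)}

    features = {}
    for uid, name, week in in_window:
        if name in top_events:
            features.setdefault(uid, Counter())[(name, week)] += 1
    return features


# Made-up events; the last one falls outside the 12-week window.
features = weekly_event_counts(
    [
        ("u1", "view", date(2024, 3, 30)),
        ("u1", "view", date(2024, 3, 29)),
        ("u1", "purchase", date(2024, 3, 20)),
        ("u2", "view", date(2024, 1, 1)),
    ],
    as_of=date(2024, 3, 31),
)
```

Event property counts could be built with the same pattern, while user properties would instead keep only the most recent value seen in the window.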
These variables are then included in a logistic regression model, effectively building a cohort on hundreds of behavioral signals. By contrast, most behavioral cohorts rely on three to five of these signals. The model calculates a probabilistic score for every user in the starting cohort, measuring how likely they are to perform the action you’re interested in at some point in the next seven, 30, 60, or 90 days, depending on the value you specify. The model is retrained every day to account for the potential skewing effects of seasonal data, and each user’s score is recalculated once a day (every hour if the user is being synced via Engage).
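For readers who want to see what a logistic regression score looks like in practice, here is a small self-contained sketch that fits a model with plain gradient descent and scores users. It is a teaching example under stated assumptions, not Amplitude's model: the two features, the training rows, the learning rate, and the function names are all made up.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this user under the current weights.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def score(x, weights, bias):
    """Probability that a user with features x performs the action."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features: [views last week, purchases last week].
rows = [[5, 1], [4, 1], [6, 2], [0, 0], [1, 0], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = performed the action in the past period
weights, bias = train_logistic(rows, labels)

active_score = score([5, 1], weights, bias)
inactive_score = score([0, 0], weights, bias)
```

Here the active user receives a higher score than the inactive one. A production model would use hundreds of signals rather than two, as described above.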