This article will help you:
- Read and interpret a Compass chart
- Understand correlation and why Compass uses it
Amplitude's Compass chart shows how a new user firing an event correlates with that user being retained. Understanding which user events lead to retention is a critical tool in driving sustainable product growth.
Before you begin
Make sure to read our article on building a Compass chart before you dive in here. Otherwise, this article won't make much sense.
How to read your Compass chart
When you first launch a Compass chart, you might have a specific hypothesis about which events are likely to drive retention. But even if you don't, Compass can help you develop one.
In the previous article, you saw how Compass generates a heat map of user events and correlations by default when Any Event is selected.
This is a quick summary of the events most correlated with members of a base cohort of users converting to a target cohort. It's a great place to start if you don't have much data to go on yet.
NOTE: If you're not familiar with correlation, check out our handy explanation towards the end of this article.
You can sort the table by ascending or descending correlation for a given day by clicking the day labels across the top. Clicking a specific cell opens a popup with more detailed information about the event/day combination you selected.
This summary report is useful for looking at your data from a bird's-eye view, e.g. looking for events that should have been at the top but were not.
Once you choose an event to focus on, Compass replaces the heat map view with a more detailed breakdown.
As an example, let's look at how triggering the event Social Action: Add Friends within the first seven days of becoming a new user correlates with second-week retention, and walk through the different components of the reports generated by Compass.
On the left, we see the correlation scores for that event, broken down by how frequently your users triggered it. By default, the report shows the frequency with the highest correlation. Here, you can see that users who triggered Social Action: Add Friends at least once had the highest correlation score, and are thus most likely to have ended up in the second-week retention cohort. However, the correlation between triggering Social Action: Add Friends and second-week retention is weak overall.
NOTE: It's important to keep in mind that correlation and causation are not the same thing. A high correlation score may suggest some sort of causal relationship between two events, but it can also mean that each of those events is highly correlated with another, as-yet-unidentified event.
Click on any of the buckets to view a detailed breakdown of that event / frequency combination.
On the right, you can see the correlation score between (a) this event at this particular frequency, and (b) your target cohort. While it's hard to generalize, in some cases even correlations as low as 0.2 can be worth considering when looking at smaller numbers of initial days for each user.
Amplitude categorizes correlation scores like this:
- Highly Predictive: correlation >= 0.4
- Moderately Predictive: 0.3 <= correlation < 0.4
- Slightly Predictive: 0.2 <= correlation < 0.3
- Not Predictive: correlation < 0.2
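The thresholds above can be expressed as a small lookup. This is only an illustrative sketch: the function name `categorize_correlation` is hypothetical and is not part of any Amplitude API.

```python
def categorize_correlation(correlation: float) -> str:
    """Map a correlation score to the labels listed above.

    Hypothetical helper for illustration; the cutoffs are the ones
    given in this article.
    """
    if correlation >= 0.4:
        return "Highly Predictive"
    if correlation >= 0.3:
        return "Moderately Predictive"
    if correlation >= 0.2:
        return "Slightly Predictive"
    return "Not Predictive"


# Print the label for a few sample scores.
for score in (0.45, 0.3, 0.25, 0.1):
    print(score, categorize_correlation(score))
```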
Create a cohort from your results
Let's return to our example above, where we looked at users who triggered the Social Action: Add Friends event at least once within their first seven days as new users. You can create a cohort by clicking Create Cohort. Amplitude will then automatically compare their retention to new user retention.
Note that this comparison is based on any active event, and not simply
Clicking Show (next to Correlation Table) will bring up a detailed contingency table that shows the count of users in your base cohort in each of four categories: true positives, false positives, false negatives, and true negatives.
Likewise, you can see detailed statistics on your cohort by clicking Show (next to Detailed Statistics):
You can read more about these statistics here.
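To make the four categories of the contingency table concrete, here is a minimal sketch that counts them from per-user flags. It assumes the usual convention that "positive" means the user triggered the event at the chosen frequency, and "true" means that outcome matches whether the user was retained. The function name and the sample data are hypothetical.

```python
from collections import Counter


def contingency_table(triggered, retained):
    """Count users in each of the four cells.

    triggered[i]: user i fired the event at least the chosen number of times
    retained[i]:  user i ended up in the target cohort
    """
    counts = Counter(zip(triggered, retained))
    return {
        "true_positives": counts[(True, True)],    # triggered and retained
        "false_positives": counts[(True, False)],  # triggered, not retained
        "false_negatives": counts[(False, True)],  # not triggered, retained
        "true_negatives": counts[(False, False)],  # neither
    }


# Made-up flags for six users in a base cohort.
triggered = [True, True, True, False, False, False]
retained = [True, True, False, True, False, False]
table = contingency_table(triggered, retained)
print(table)
```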
Choose a different metric
Compass defaults to showing correlation scores, but you can select a different metric if it better suits the needs of your analysis. Just select the metric you're interested in from the Correlation dropdown menu:
The metrics available to you are:
- Correlation with errors
- Positive predictive value only
- Negative predictive value only
- Sensitivity only
- Specificity only
- Proportion above threshold only
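Most of these metrics have standard definitions in terms of the four contingency-table counts. The sketch below uses those textbook definitions and assumes "proportion above threshold" means the share of the base cohort that triggered the event at the chosen frequency. Amplitude's exact calculations may differ, and the function name and sample counts are hypothetical.

```python
import math


def contingency_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Textbook metrics from true/false positive/negative counts.

    Assumes every denominator is non-zero (no empty row or column).
    """
    total = tp + fp + fn + tn
    return {
        # Of users above the threshold, the share who were retained.
        "positive_predictive_value": tp / (tp + fp),
        # Of users below the threshold, the share who were not retained.
        "negative_predictive_value": tn / (tn + fn),
        # Of retained users, the share who were above the threshold.
        "sensitivity": tp / (tp + fn),
        # Of non-retained users, the share who were below the threshold.
        "specificity": tn / (tn + fp),
        # Share of all users who were above the threshold.
        "proportion_above_threshold": (tp + fp) / total,
        # Correlation of two binary variables (phi coefficient).
        "correlation": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Made-up counts for a base cohort of 100 users.
metrics = contingency_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```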
View statistical significance
Compass allows you to toggle on and off the 95% confidence interval of the correlation. Click on the blue numerical text on the right-hand side of the table to display the interval on the left-hand side bar chart.
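This article does not say how Compass computes the interval. As a rough illustration of what a 95% confidence interval for a correlation looks like, the sketch below uses the Fisher z-transformation, a common textbook approximation. It is an assumption for illustration, not Amplitude's documented method, and the function name is hypothetical.

```python
import math


def correlation_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a correlation r over n users.

    Uses the Fisher z-transformation; z_crit=1.96 gives a 95% interval.
    Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)                      # transform r to the z scale
    standard_error = 1 / math.sqrt(n - 3)  # approximate standard error of z
    low = math.tanh(z - z_crit * standard_error)
    high = math.tanh(z + z_crit * standard_error)
    return low, high


low, high = correlation_confidence_interval(r=0.3, n=1000)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

With more users in the base cohort the interval narrows, which is why a modest correlation measured on a large cohort can still be statistically meaningful.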
What is correlation?
Correlation is a measure of how two statistical variables relate to each other. Possible values range from -1 to 1, with a score of zero indicating there is no statistical relationship between the variables at all. A score of one indicates perfect positive correlation, while a score of -1 indicates perfect negative correlation.
In a Compass chart, the two variables to be correlated are:
- Did the user trigger the event in question at least a certain number of times; and
- Was the user retained in the target cohort?
You may have heard of different variations and definitions of correlation. Well-known examples include Matthews correlation, Pearson correlation, phi coefficient, and R-value. In this case, all these different methods generate equivalent results, because Compass looks at pairs of binary random variables.
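The equivalence for binary variables can be checked directly. The sketch below computes the Pearson correlation from 0/1 vectors and the phi coefficient from contingency counts, then compares the two. The data and function names are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed from raw values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def phi(xs, ys):
    """Phi coefficient computed from the 2x2 contingency counts."""
    pairs = list(zip(xs, ys))
    tp = pairs.count((1, 1))
    fp = pairs.count((1, 0))
    fn = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )


# 1 = triggered the event / was retained, 0 = did not.
triggered = [1, 1, 1, 0, 0, 0]
retained = [1, 1, 0, 1, 0, 0]

r_pearson = pearson(triggered, retained)
r_phi = phi(triggered, retained)
print(r_pearson, r_phi)
```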
Remember, correlation is not causation, so any hypotheses you come up with via a Compass analysis must still be tested and verified in the real world.
Here are some more technical definitions of correlation:
- Correlation of X and Y is the covariance of X and Y divided by the geometric mean of their variances.
- If X is modeled as an affine function of Y and Y is modeled as an affine function of X, each with minimal root mean squared error, then the correlation of X and Y is the geometric mean of the predictive coefficients of these two functions.
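Written out, the two definitions above look like this. The symbols are chosen here for illustration: $a$ is the least-squares slope when $X$ is predicted from $Y$, and $b$ is the slope when $Y$ is predicted from $X$.

```latex
\rho_{X,Y}
  = \frac{\operatorname{Cov}(X, Y)}
         {\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}

a = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}, \qquad
b = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad
a\,b = \rho_{X,Y}^{2}
\;\Longrightarrow\;
\lvert \rho_{X,Y} \rvert = \sqrt{a\,b}
```

The geometric mean of the two slopes gives the magnitude of the correlation. Its sign is the sign the two slopes share.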
Why is correlation a good metric to use here?
When you're looking for that one metric that captures your users' "a-ha" moment, you want one where most users above a certain threshold go on to be retained, and most users below the threshold end up not being retained. Such a metric would have a threshold with a good positive predictive value (PPV) and a good negative predictive value (NPV).
However, you also have to consider how easy it will be to move users across that threshold. If you find a threshold with a very strong PPV and NPV, but discover that it's very difficult to move users across it, that metric will not be of much help in growing your user base. A tell-tale sign of this would be if few of your users have crossed the threshold, or almost all of your users have already crossed it. This isn't always the case, of course, but in the absence of more specific information, it's generally a good assumption.
That's why Compass uses correlation to locate these thresholds: correlation accounts for PPV, NPV, and the proportion above the threshold. If the PPV is higher, the NPV is higher, or the fraction of users above the threshold is closer to 50%, then the correlation will also be higher. Likewise, if the PPV is lower, the NPV is lower, or the fraction of users above the threshold is further from 50%, then the correlation will be lower.
NOTE: This gets a little less clear-cut when it comes to negative correlations, but you won't typically be looking at negative correlations when using Compass.
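For two binary variables, the correlation can be written directly in terms of PPV, NPV, and the proportion of users above the threshold, which makes the relationship described above easy to try out. The sketch below uses the standard phi-coefficient identity. The function name and sample values are made up for illustration, and the comparisons hold for these examples rather than as a general guarantee.

```python
import math


def correlation_from_ppv_npv(ppv: float, npv: float, proportion_above: float) -> float:
    """Correlation of two binary variables from PPV, NPV, and the
    proportion of users above the threshold.

    Assumes 0 < proportion_above < 1 and an overall retention rate
    strictly between 0 and 1.
    """
    p = proportion_above
    # Overall retention rate: retained users above plus retained users below.
    retained_rate = p * ppv + (1 - p) * (1 - npv)
    return (
        math.sqrt(p * (1 - p))
        * (ppv + npv - 1)
        / math.sqrt(retained_rate * (1 - retained_rate))
    )


baseline = correlation_from_ppv_npv(ppv=0.8, npv=0.8, proportion_above=0.5)
higher_ppv = correlation_from_ppv_npv(ppv=0.9, npv=0.8, proportion_above=0.5)
far_from_half = correlation_from_ppv_npv(ppv=0.8, npv=0.8, proportion_above=0.1)

print(f"baseline:           {baseline:.3f}")
print(f"higher PPV:         {higher_ppv:.3f}")
print(f"only 10% above:     {far_from_half:.3f}")
```

In this example, raising the PPV increases the correlation, while moving the proportion above the threshold from 50% to 10% lowers it, even though PPV and NPV are unchanged.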