With daily fluctuations in the core metrics you care about, it can be hard to know which changes are meaningful and worth investigating. The Anomaly + Forecast feature highlights statistically significant deviations from expected values based on historical data. This allows you to tell whether a change is truly meaningful, catch instrumentation errors, study seasonal trends, and monitor the impact of product releases.
Forecast allows you to project metrics into the future, so that you can set realistic goals for your team and product.
Prerequisites
- Anomaly detection and forecast can be applied to time-series data within Amplitude
- Chart types supported for this feature: Event Segmentation, Conversion over Time, User Sessions, Retention over Time, Stickiness over Time
- Within Event Segmentation, it works with Rolling Window, Rolling Average, % Growth (within Compare to Past), and custom formulas that support time series
Important Note:
- This is currently a Beta feature that is released to customers on Growth, Enterprise, and Scholarship plans
Anomaly Detection
The anomaly detection technique used in Anomaly + Forecast is built on top of the extensively tested open source tool "Prophet". It is a procedure for forecasting time series data that is robust to missing data points, shifts in trends, and large outliers.
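For readers curious about the underlying approach, here is a minimal sketch of fitting Prophet to a daily metric in Python. The data file, column contents, and settings are hypothetical illustrations of the open source tool, not Amplitude's internal implementation.

```python
# A minimal sketch of fitting Prophet to a daily metric (hypothetical data file).
# This illustrates the open source tool only; Amplitude's internal pipeline is not public.
import pandas as pd
from prophet import Prophet

# Prophet expects a DataFrame with a 'ds' (date) column and a 'y' (value) column.
df = pd.read_csv("daily_active_users.csv")   # hypothetical export of the metric
df["ds"] = pd.to_datetime(df["ds"])

model = Prophet(interval_width=0.95)         # 95% confidence interval, as in Agile mode
model.fit(df)                                # tolerates missing dates and large outliers

# Predicting over the historical dates yields the expected value ('yhat')
# and the confidence band ('yhat_lower', 'yhat_upper') for each point.
expected = model.predict(df[["ds"]])
```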
You can find the control for this feature on the left-hand side, right above the main chart area.
Modes
The default mode when you turn on the feature is Agile. Agile adjusts more quickly to recent trends and uses the following settings:
- 95% confidence interval
- 120 days of prior training data if your chart uses a daily interval (that is, 120 days prior to the start of the chart date range)
The second mode you can choose is Robust. Agile and Robust both use a 95% confidence interval and recognize seasonal patterns. Robust is best for stable metrics, as it incorporates more seasonality and historical data. For example, on a daily chart we use 365 additional days of training data on top of the chart data.
If you would like more control in changing the parameters (e.g. confidence interval, training duration), you can use the Custom mode.
Seasonality
Seasonalities are detected and applied automatically in each mode, depending on how much data is used to train the model. For example, Agile mode typically tries to use daily and weekly seasonalities, while Robust mode by default also tries to apply monthly and yearly seasonalities. Depending on the data available, however, they might not always be applied.
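For reference, this is how such seasonal components are expressed in Prophet itself. The specific settings below are illustrative, not Amplitude's exact configuration.

```python
# Illustrative only: Prophet enables weekly/yearly seasonality automatically when
# there is enough history, and extra seasonal components can be added explicitly.
from prophet import Prophet

model = Prophet(
    weekly_seasonality=True,   # typically detectable with about a month of daily data
    yearly_seasonality=True,   # needs roughly a year of history to be meaningful
    daily_seasonality=False,
)
# A monthly component is not built in, but can be added as a custom seasonality.
model.add_seasonality(name="monthly", period=30.5, fourier_order=5)
```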
How to interpret the results
When you have a single series on your chart, you will see a light blue band (the confidence interval) and a dashed line representing the expected value, alongside the solid blue line that is your actual data. If any anomalies are detected, they will appear outside of the confidence band in orange. If you don’t see any orange dots, we didn’t detect any anomalies and all the data points are within the confidence interval.
You can interpret anomalies in this way: “with 120 days of training data, we are 95% confident that this change is unexpected.”
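In Prophet terms, a point is anomalous when the observed value falls outside the prediction interval. A sketch of that check, continuing the hypothetical `df` and `model` from the earlier example:

```python
# Flag points whose observed value falls outside the 95% prediction interval.
# Continues the hypothetical `df` and `model` from the earlier sketch.
expected = model.predict(df[["ds"]])
merged = expected[["ds", "yhat", "yhat_lower", "yhat_upper"]].merge(df[["ds", "y"]], on="ds")

anomalies = merged[(merged["y"] < merged["yhat_lower"]) | (merged["y"] > merged["yhat_upper"])]
print(anomalies[["ds", "y", "yhat_lower", "yhat_upper"]])
```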
Confidence Interval
The default confidence interval when you first turn on the feature is always 95%. You can select a confidence interval of 80%, 95%, 98%, or 99% by switching to Custom mode from the mode tag under the feature button or from the settings dropdown.
The higher the confidence level, the less noise there will be in the anomalies detected, and the fewer anomalies you will likely see on the chart. A 95% confidence interval will have a narrower band than a 99% confidence interval.
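To see the effect of the confidence level on the band width, here is a quick illustrative comparison using Prophet's interval_width parameter, reusing the hypothetical `df` from the earlier sketches:

```python
# Illustrative comparison of band widths at different confidence levels,
# reusing the hypothetical `df` from the earlier sketches.
from prophet import Prophet

for width in (0.80, 0.95, 0.99):
    m = Prophet(interval_width=width)
    m.fit(df)
    fc = m.predict(df[["ds"]])
    avg_band = (fc["yhat_upper"] - fc["yhat_lower"]).mean()
    print(f"{int(width * 100)}% interval -> average band width {avg_band:.1f}")
# Expect the 99% band to be wider than the 95% band, so fewer points fall outside it.
```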
Training Data
The default training duration differs by time interval and mode. In Custom mode it is configurable, and in all cases the training data is added on top of the chart date range.
For example, suppose your chart uses a daily interval and shows the last 30 days of data. The default for daily charts is 120 days of data prior to the start of the chart date range, so we use a total of 150 (120 + 30) days of data to train the model.
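The arithmetic, sketched out with hypothetical dates:

```python
# Hypothetical illustration of how the training window combines with the chart range.
from datetime import date, timedelta

chart_end = date(2024, 6, 30)                  # hypothetical "today"
chart_start = chart_end - timedelta(days=29)   # chart shows the last 30 days
prior_training_days = 120                      # Agile default for daily charts

training_start = chart_start - timedelta(days=prior_training_days)
total_days = (chart_end - training_start).days + 1
print(total_days)  # 150 = 120 prior days + 30 charted days
```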
In Agile mode, we use the following default prior training data:
| Time interval used on the chart | Default Training Duration |
|---|---|
| Real time | Not Available |
| Hourly | 30 days |
| Daily | 120 days |
| Weekly | 26 weeks |
| Monthly | 6 months |
| Quarterly | Not Available |
In Robust mode, we use the following default prior training data:
| Time interval used on the chart | Default Training Duration |
|---|---|
| Real time | Not Available |
| Hourly | 60 days |
| Daily | 365 days |
| Weekly | 52 weeks |
| Monthly | 12 months |
| Quarterly | Not Available |
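The two tables above can be summarized in a small lookup, purely as a reading aid. The helper below is hypothetical and not part of any Amplitude API.

```python
# Hypothetical helper that mirrors the default training durations listed above;
# it is not part of any Amplitude API.
DEFAULT_TRAINING_DURATION = {
    "agile":  {"hourly": "30 days", "daily": "120 days", "weekly": "26 weeks", "monthly": "6 months"},
    "robust": {"hourly": "60 days", "daily": "365 days", "weekly": "52 weeks", "monthly": "12 months"},
}

def default_training_duration(mode: str, interval: str) -> str:
    # Real time and quarterly intervals are not supported, so they fall through here.
    return DEFAULT_TRAINING_DURATION.get(mode.lower(), {}).get(interval.lower(), "Not Available")

print(default_training_duration("Agile", "daily"))    # 120 days
print(default_training_duration("Robust", "weekly"))  # 52 weeks
```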
If you have a very specific training duration in mind that Agile or Robust modes don't offer, you can specify how much data to train the model with by choosing the Custom mode.
How to Investigate Anomalies
A natural question you might have after noticing an anomaly is: what caused it? Many users find these three workflows helpful for investigating what might be happening:
- Look at related metrics to see whether they show anomalies as well. You might add the sub-metrics that make up the parent metric you were looking at, or look at the events that fire before or after that step in the funnel.
- Use group-bys on properties that might yield more insight into why these anomalies occurred
- Use business context: is there anything that shipped on that day that may have caused the anomaly?
You can also use the Anomaly Detection feature when you have multiple series on a chart. We automatically run the model on up to 10 time series initially, but you can use the table to add any additional time series you want to see anomalies for.
When you have multiple time series on the chart, we only show the anomaly points in orange, without the confidence interval band and expected value line. However, you can hover over a particular series to see its band and the dashed line for its expected values.
Forecasting
The main difference between Forecast and Anomaly Detection is that Forecast projects your metrics into the future, whereas anomalies are only detected with your historical data.
The Forecast feature can be turned on within the Anomaly + Forecast button. It uses the same open source tool, "Prophet", to run forecasts based on your historical data. After clicking "Anomaly + Forecast", click the “+add forecasting” tag to turn on the feature.
Default settings when you turn on the feature:
- 95% confidence interval
- The amount of prior training data we use for each time interval is the same as mentioned above, in the Anomaly Detection section, based on the mode you choose
How to interpret the results
Once the feature is on, you will see where the solid blue line, your actual data, ends and your forecast line begins. Prophet projects metrics into the future by assuming that the magnitude and frequency of changes observed in the past will be similar in the future, with a certain degree of confidence.
Here’s one way you can communicate the result shown on the chart:
“Based on the trend seen with the last 120 days of data, we are 95% confident that this metric will be between [high value] and [low value] on [a future date].”
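A minimal forecasting sketch in the same spirit, again assuming the hypothetical `df` from the earlier examples:

```python
# Minimal forecasting sketch, assuming the hypothetical `df` from the earlier examples.
from prophet import Prophet

model = Prophet(interval_width=0.95)
model.fit(df)

future = model.make_future_dataframe(periods=30, freq="D")   # project 30 days ahead
forecast = model.predict(future)
print(forecast[["ds", "yhat_lower", "yhat", "yhat_upper"]].tail())
```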
As with Anomaly Detection, you can configure the confidence level and training duration from Custom mode, as well as how many periods you want to forecast.
Related: Insights Package
If you want alerting for anomalies, we offer an Insights package that includes automatic and custom monitor alerts. You can find out more about the package here.