With Amplitude Recommend, you can sync cohorts to an Amazon S3 bucket. This enables you to effectively export groups of users out of Amplitude and synchronize them with other databases or stored procedures you’ve built off your Amazon S3 bucket. From there, you can use Amplitude cohorts in internal analytics dashboards and internal personalization engines.
This article will explain how to set up and use the integration.
NOTE: this destination is used for syncing cohorts to S3. There is a separate destination for syncing raw data to S3.
Set up your integration
To set up the integration, follow these steps:
- Navigate to Sources & Destinations → Destinations. Click the Amazon S3 (Cohorts) destination. (Do not choose the Amazon S3 destination; that one is for raw data exports.)
- From your Amazon S3 console, find the S3 bucket you would like Amplitude to sync with. Note its name, path, and region.
- In Amplitude, enter the bucket name into the Bucket Name field in the destination setup. Do the same for the bucket path, and choose the region where the bucket is hosted. Name your sync and click Save.
- Click the Copy Bucket Policy button.
- In the Amazon S3 console, go to the S3 bucket and navigate to Permissions → Bucket Policy. Paste the Amplitude bucket policy into the Amazon S3 console.
- Optionally, you can also set the following two parameters for your buckets:
- Require suffix: When set, this allows users to append a string to the end of every file exported to S3.
- User property: You can select a single user property to be synced along with each user as an extra column in each file exported.
For more details, see step three of the following section.
Export cohorts to Amazon S3
Once the S3 bucket is connected to Amplitude, you can sync any cohort to that bucket. To do so, follow these steps:
- Open the cohort you want to sync. Click Sync To and select Amazon S3 from the drop-down list.
- In the modal that appears, choose the bucket you configured.
- If desired, set the following two optional parameters:
- User Property: Here, you can append a user property to each user exported in this cohort. The user property will appear as an additional column in the exported CSV file.
- Routing Key: Enter a string to be appended to the end of the cohort file name in S3.
For more details, see Cohorts in S3 below.
- Select the sync frequency you need: one-time, hourly, or daily.
- Click Save Sync to complete the process.
Cohorts in S3
Your cohort will be synced as a CSV in the bucket you specified. Within the folder, you’ll see a list of CSVs.
Each sync generates three .CSV files: one with users who entered the cohort since the last sync, one with users who exited the cohort since the last sync, and one containing the users already existing in the cohort at the time of the last sync. This way, you'll always have a complete historical log of S3 cohort membership.
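As a rough illustration of how the three files relate, the sketch below rebuilds current cohort membership from one sync. It assumes the "existing" file lists membership as of the previous sync and that users are identified by a single ID; the function name `apply_sync` and the sample IDs are hypothetical, not part of Amplitude's tooling.

```python
def apply_sync(existing, entered, exited):
    """Return cohort membership after one sync.

    existing: IDs already in the cohort at the last sync (assumption)
    entered:  IDs that joined the cohort since the last sync
    exited:   IDs that left the cohort since the last sync
    """
    # Add the newcomers, then drop anyone who has left.
    return (set(existing) | set(entered)) - set(exited)


# Hypothetical user IDs for illustration only.
previous = {"u1", "u2", "u3"}
current = apply_sync(previous, entered={"u4"}, exited={"u2"})
print(sorted(current))  # ['u1', 'u3', 'u4']
```

Working with sets keeps the update idempotent: applying the same entered and exited lists twice yields the same membership.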
The .CSV files will all use this naming convention:
- path: The optional folder prefix on the path the file will be written to.
- cohortID: The unique identifier for your cohort. You can find this number in the URL for your cohort in Amplitude.
- YYYY-MM-DDTHH-SS: The timestamp when the cohort was synced.
- difftype: This describes which of the three user groups the .CSV file contains. Acceptable values are
- routingkey: The optional string suffix entered previously.
The timestamp in the .CSV name refers to the day/time the cohort was synced. If you have an hourly/daily scheduled sync, we create a new file with the full list of users who qualify in that cohort at that time. This will be a new file for every sync, with the freshest group of users, and you can maintain a historic log of audience membership.
As described above, each .CSV file will contain a list of users, with data broken into the following columns:
- amplitudeID: The internal Amplitude identifier for the user
- userID: Your unique database identifier for the user
- userProperty: The value for a user property you added in step three of the Export cohorts to Amazon S3 section; there will be one column for each user property. In portfolio projects, there will be a separate column for each source app.
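Once a file has been downloaded from the bucket, it can be read with any CSV parser. The following is a minimal sketch using Python's standard `csv` module. It assumes the file has a header row with `amplitudeID` and `userID` labels; the `plan` column stands in for a hypothetical user property, and the sample rows are invented for illustration. Check a real export for the exact header labels before relying on them.

```python
import csv
import io


def read_cohort_csv(text):
    """Parse cohort CSV text into a list of row dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


# Invented sample content; a real file would be downloaded from your bucket.
sample = (
    "amplitudeID,userID,plan\n"
    "123,alice@example.com,pro\n"
    "456,bob@example.com,free\n"
)

rows = read_cohort_csv(sample)
user_ids = [row["userID"] for row in rows]
print(user_ids)  # ['alice@example.com', 'bob@example.com']
```

Reading rows as dictionaries means the code does not depend on column order, which helps if extra user property columns are added later.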