Amplitude’s GCS Import feature lets you import event or user properties into your Amplitude projects from a GCS bucket. This article will help you quickly configure this data source within Amplitude.
Before you begin
Before you start, you’ll need to do a few things:
- First, make sure you have admin permissions for your Amplitude org.
- Identify the project that's going to receive the data. If it doesn't exist yet, create a new project and then navigate to Data Sources before beginning the procedure described below.
- Have your data-mapping converter open in a text editor. Contact your Amplitude team if you need support with the converter setup.
- Ensure your GCS bucket contains data files ready to be ingested. This means they conform to the mappings outlined in the converter.
- Make sure the data in your GCS bucket follows the format outlined in Amplitude’s HTTP API v2 spec.
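As a rough illustration of the last requirement, the sketch below builds an event using field names from the HTTP API v2 spec (event_type, user_id, device_id, time, event_properties, user_properties) and checks the two basics that spec requires: an event_type, and at least one of user_id or device_id. The helper names make_event and to_json_lines are hypothetical, and writing one JSON object per line is an assumption here. Your converter settings determine how files are actually parsed.

```python
import json
import time


def make_event(event_type, user_id=None, device_id=None,
               event_properties=None, user_properties=None, time_ms=None):
    """Build one event dict using HTTP API v2 field names (hypothetical helper)."""
    if not event_type:
        raise ValueError("event_type is required")
    if not (user_id or device_id):
        raise ValueError("at least one of user_id or device_id is required")

    event = {
        "event_type": event_type,
        # The spec expresses time as milliseconds since the Unix epoch.
        "time": time_ms if time_ms is not None else int(time.time() * 1000),
    }
    if user_id:
        event["user_id"] = user_id
    if device_id:
        event["device_id"] = device_id
    if event_properties:
        event["event_properties"] = event_properties
    if user_properties:
        event["user_properties"] = user_properties
    return event


def to_json_lines(events):
    """Serialize events one JSON object per line (an assumed file layout)."""
    return "".join(json.dumps(event) + "\n" for event in events)


if __name__ == "__main__":
    sample = make_event("signup", user_id="user-123",
                        event_properties={"plan": "free"})
    print(to_json_lines([sample]), end="")
```

Validating events before they reach the bucket makes it easier to trace ingestion problems back to the file that caused them.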
Storage organization requirements
Amplitude can only process new data from a GCS bucket if it's organized in the following way:
{bucket name}/{GCSPrefix}/{YYYY}/{MM}/{DD}/{HH}/{optional}/{additional}/{folder}/{structure}/{file name}
where:
- {bucket name} is the name of your GCS bucket.
- {GCSPrefix} is the source prefix folder specified in your source setup configuration.
- {YYYY}/{MM}/{DD}/{HH} is the required date prefix format to upload new files. You should organize files according to the time they’re uploaded to the bucket, and not when the files are generated in your system. Also, you must always use two digits (as opposed to one) to represent the month, day, and hour.
- {optional}/{additional}/{folder}/{structure} is where you can add additional folder structure details. These details are strictly optional. If you do include them, an example file path might look like {bucket name}/{GCSPrefix}/{YYYY}/{MM}/{DD}/{HH}/cluster-01/node-25/{file name}.
To be clear, these organizational requirements only apply to new data you want to import after the source is enabled. You do not have to reorganize any pre-existing files, as Amplitude's GCS Import feature will capture the data they contain on the first ingestion scan. After that, however, new data uploaded to the bucket must conform to the requirements outlined here.
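The folder layout described in this section can be sketched as a small helper that builds the object name for a new file from its upload time. This is a minimal sketch, not part of Amplitude's tooling: the function name build_object_name and its parameters are hypothetical, and the use of UTC for the date prefix is an assumption, since this article doesn't state which time zone applies. The bucket name is left out because, in GCS, it is supplied separately from the object name.

```python
from datetime import datetime, timezone


def build_object_name(gcs_prefix, file_name, uploaded_at=None, extra_folders=()):
    """Return {GCSPrefix}/{YYYY}/{MM}/{DD}/{HH}/[extra folders]/{file name}."""
    # Use the upload time, not the time the file was generated.
    # UTC is an assumption; confirm the expected time zone for your source.
    ts = uploaded_at if uploaded_at is not None else datetime.now(timezone.utc)

    # strftime zero-pads month, day, and hour to the required two digits.
    date_prefix = ts.strftime("%Y/%m/%d/%H")

    parts = [gcs_prefix.strip("/"), date_prefix]
    parts.extend(folder.strip("/") for folder in extra_folders)
    parts.append(file_name)
    return "/".join(part for part in parts if part)


if __name__ == "__main__":
    upload_time = datetime(2024, 3, 5, 7, 30, tzinfo=timezone.utc)
    print(build_object_name("amplitude-import", "events.json.gz",
                            uploaded_at=upload_time,
                            extra_folders=("cluster-01", "node-25")))
    # prints amplitude-import/2024/03/05/07/cluster-01/node-25/events.json.gz
```

Generating the path in one place keeps the two-digit rule and the upload-time rule from being applied inconsistently across upload jobs.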
Add a new GCS Source
To add a new GCS data source for Amplitude to draw data from, follow these steps:
- In Amplitude, navigate to Data Sources and select the desired project from the dropdown menu. Then click I want to import data into Amplitude.
- Select GCS from the available tiles. If this source doesn’t appear in your list, please contact the Amplitude team.
- Once you click into the GCS tile, you’ll be prompted to upload the Service Account Key file. This will give Amplitude the permissions to pull data from your GCS bucket. You can find the permissions you need to give to the GCS Service Account here.
- Once you’ve uploaded the Service Account Key file, you should then scroll down and fill out the bucket name and folder where the data resides.
- Click Next to test the credentials. If all your information checks out, Amplitude will display a success message. Click Next > to continue the process.
- In the Enable Data Source panel, name your data source and give it a description. (You can edit this information later, via Settings.) Then click Save Source. Amplitude will confirm you’ve created and enabled your source.
- Click Finish to go back to the list of data sources. If you’ve already configured the converter, the data import will start momentarily. Otherwise, it’s time to create your data converter.
Create a converter
NOTE: You can contact your Amplitude team to assist with the converter setup if you need help.
The final step in setting up Amplitude’s GCS ingestion source is creating the converter file. The converter specifies how Amplitude should process the ingested files. Creating it takes two steps: first, configure the compression type, file name, and escape characters for your files; then, use JSON to describe the rules your converter will follow.
The converter language describes the extraction of a value given a JSON element. This is specified by a SOURCE_DESCRIPTION, which includes:
- BASIC_PATH
- LIST_OPERATOR
- JSON_OBJECT
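To make the idea of "extracting a value given a JSON element" concrete, here is a conceptual Python sketch of path-based extraction. It is not Amplitude's converter syntax: the function extract_by_path is hypothetical, and the dot-separated path notation is an assumption chosen only for illustration. Refer to the converter operator reference for the actual rules.

```python
def extract_by_path(element, path):
    """Walk a dot-separated path through nested dicts; return None if absent.

    Illustrative only: the dot notation is an assumption, not the
    converter's real BASIC_PATH syntax.
    """
    current = element
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


if __name__ == "__main__":
    record = {"user": {"id": "user-123"}, "action": "signup"}
    print(extract_by_path(record, "user.id"))     # prints user-123
    print(extract_by_path(record, "user.email"))  # prints None
```

A converter rule pairs an extraction like this with the target field the value should populate in the resulting event.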
To create and configure your converter file, follow these steps:
- Click Edit Converter to configure the compression type, file name, and escape characters for your files. The boilerplate of your converter file will pre-populate, based on the selections made during this step. You can also test whether the configuration works by clicking Pull File.
- When you’re ready, click Next > to proceed to the converter configuration panel.
- In the text window labeled Describe the Conversion, enter the rules your converter will follow during the ingestion process, using the operators listed here.
- Click Finish when you’re done.