With Amplitude’s Snowflake integration, you can ingest Snowflake data directly into your Amplitude project. This article will walk you through the steps needed to make that happen.
NOTE: Depending on your company’s network policy, you may need to whitelist the following IP addresses in order for Amplitude’s servers to access your Snowflake instance:
- 52.33.3.219
- 35.162.216.242
- 52.27.10.221
Add Snowflake as a source
To add Snowflake as a data source in your Amplitude project, follow these steps:
- In Amplitude, navigate to the Sources section and make sure you've selected the correct project from the project list dropdown.
- Click I want to import data into Amplitude, followed by the Snowflake tile.
Amplitude will launch the configuration wizard. Here you’ll enter the required credentials for the Snowflake instance that houses the relevant data:
- Account: Snowflake account (case sensitive)
- Database: Name of the database where Amplitude can find the data
- Warehouse: Used by Amplitude to execute SQL
- Username: Used by Amplitude for authentication
- Password: Used by Amplitude for authentication
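For reference, a filled-in configuration might look like the following. All values here are hypothetical, and the exact account identifier format depends on your Snowflake region and deployment:

Account: xy12345.us-east-1
Database: ANALYTICS
Warehouse: AMPLITUDE_WH
Username: AMPLITUDE_USER
Password: ********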
- Next, copy and run the auto-generated SQL code on the right-hand side of the page to give Amplitude the proper access to your Snowflake instance. (A sketch of what these statements typically look like appears after these steps.)
- After running the code, click Next to test the connection and ensure it's working as expected. If all credentials were entered correctly, you'll see a message confirming the connection was successful.
- Click Next again to move on to the data selection stage.
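For reference, the auto-generated code creates a dedicated role and user for Amplitude and grants read access to the relevant objects. Below is a minimal sketch of what those statements typically look like, reusing the hypothetical names from the configuration example above; the wizard generates the exact statements for your instance, so always run its version:

-- Sketch only: all names are placeholders; run the code the wizard generates for you.
CREATE ROLE IF NOT EXISTS AMPLITUDE_READER;
GRANT USAGE ON WAREHOUSE AMPLITUDE_WH TO ROLE AMPLITUDE_READER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE AMPLITUDE_READER;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE AMPLITUDE_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE AMPLITUDE_READER;
CREATE USER IF NOT EXISTS AMPLITUDE_USER PASSWORD = '<password>' DEFAULT_ROLE = AMPLITUDE_READER;
GRANT ROLE AMPLITUDE_READER TO USER AMPLITUDE_USER;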
In the data selection stage, you have several configuration options to choose from:
- Type of data: This tells Amplitude whether you’ll be ingesting event data or user property data. (Group property data will be available soon.)
- Type of import:
- Full Sync: Amplitude will periodically import the entire dataset, regardless of whether that data has already been imported. This is a good fit for datasets where row data changes over time but there is no easy way to tell which rows have changed; otherwise, a time-based import is the more efficient option. Full Sync is not supported for ingesting event data.
- Time-based: Amplitude will periodically ingest the most recent rows in the data, as determined by the provided Timestamp column. The first import will bring in all available data, and subsequent imports will ingest any data with timestamps after the time of the most recent import. For this to work, you must include the timestamp column in the output of your SQL statement (see the time-based import example at the end of this article).
- Frequency: Choose from several scheduling options, ranging from every five minutes to once per month (with the monthly option, ingestion happens on the first of the month).
- SQL query: The SQL query Amplitude will run to determine which data is ingested.
- Once you’ve set your configuration options, click Test SQL to see how the data is coming through from your Snowflake instance. If there are any errors, they’ll appear below the Test SQL button.
- If there are no errors, click Finish. You'll see a notification indicating you've successfully enabled the new Snowflake source, and you'll be redirected to the Sources listing page, where the newly created source will appear.
If you have any issues or questions while following this flow, please contact the Amplitude team.
Mandatory data fields
You must include the mandatory fields for the data type when creating the SQL query. These tables outline the mandatory and optional fields for each data type. You can include other columns beyond those listed here.
Events
| Column name | Mandatory | Column data type |
| --- | --- | --- |
| user_id | Yes, unless device_id is used | VARCHAR |
| device_id | Yes, unless user_id is used | VARCHAR |
| event_type | Yes | VARCHAR |
| time | Yes | Milliseconds since epoch (Timestamp) |
| event_properties | Yes | VARIANT (JSON Object) |
| user_properties | No | VARIANT (JSON Object) |
| update_time_column | No (Yes if using time-based import) | TIMESTAMP |
User properties
| Column name | Mandatory | Column data type |
| --- | --- | --- |
| user_id | Yes | VARCHAR |
| user_properties | Yes | VARIANT (JSON Object) |
| update_time_column | No (Yes if using time-based import) | TIMESTAMP |
SQL query examples
To make the data selection step a bit easier, we’ve added a few sample SQL snippets to get you started. If you have any questions or need help in creating the SQL query, please reach out to the Amplitude team.
Event data example
SELECT
    EVENT_TYPE_COLUMN AS "event_type",
    EVENT_PROPERTIES_VARIANT_COLUMN AS "event_properties",
    TIME_EPOCH_MS_COLUMN AS "time",
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
User property example
SELECT
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
Common snippets:
Creating a JSON Object:
OBJECT_CONSTRUCT('city', CITY, 'state', STATE) AS "user_properties"
Converting a timestamp column to milliseconds:
DATE_PART('EPOCH_MILLISECOND', TIMESTAMP_COLUMN) AS "time"
Converting milliseconds to the timestamp needed for time-based import:
TO_TIMESTAMP_NTZ(TIME_COLUMN_IN_MILLIS) AS "timestamp_column"
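Time-based import example
Putting these snippets together, a time-based import query for event data might look like the following sketch. All column and table names are placeholders; note that the timestamp column Amplitude uses to detect new rows appears in the query output, as the time-based import requires:

SELECT
    EVENT_TYPE_COLUMN AS "event_type",
    EVENT_PROPERTIES_VARIANT_COLUMN AS "event_properties",
    DATE_PART('EPOCH_MILLISECOND', EVENT_TIMESTAMP_COLUMN) AS "time",
    USER_ID_COLUMN AS "user_id",
    UPDATE_TIMESTAMP_COLUMN AS "update_time_column"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME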