This article describes several tips, tricks, and best practices for Amplitude customers who've purchased the Redshift ETL add-on.
1. Data for each project is kept in its own schema (namespace) in Redshift.
By default, every Redshift command you run will be under the public schema. However, you can select a different schema instead, using the SET search_path command:
SET search_path = app123;
SELECT COUNT(*) FROM events;
Or you can include the schema as a prefix to the table:
SELECT COUNT(*) FROM app123.events;
2. Query directly from each project's table instead of the combined events table.
The events from each of your Amplitude projects are stored in their own tables. The table name for each project is events###, where ### is the project number. You can find the project number in the URL of the Amplitude dashboard. The union of every project's events appears in a table called events. Selecting FROM events### whenever possible will make your queries faster and more efficient.
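For example (project number 123 is a placeholder used for illustration):
-- Faster: scans only project 123's events
SELECT COUNT(*) FROM events123;
-- Slower: scans the union of every project's events
SELECT COUNT(*) FROM events;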
3. Custom event properties and custom user properties associated with an event type are pulled into their own columns in the respective event_type table.
Custom user property columns are prefixed with u_ and custom event property columns are prefixed with e_.
NOTE: This is limited to a maximum of 400 user properties and 50 event properties. Any properties past the limit will still require the JSON_EXTRACT_PATH_TEXT function.
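For example, a property within the limit can be read straight from its e_ column, while one past the limit has to be parsed out of the raw JSON. The property name button_name, the event_properties column, and the events123 table below are illustrative:
-- Within the limit: the property has its own e_ column
SELECT e_button_name FROM events123;
-- Past the limit: extract it from the JSON string with JSON_EXTRACT_PATH_TEXT
SELECT JSON_EXTRACT_PATH_TEXT(event_properties, 'button_name') FROM events123;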
4. Always include a date range in the WHERE clause of your query.
Amplitude's Redshift tables do not have a primary key. Instead, they're sorted by the event_time column. Adding a date range to the WHERE clause of your query will significantly increase query speed. We recommend using the DATE() function with event_time as the input.
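For example, to limit a query to a single week (the dates and table name below are placeholders):
SELECT COUNT(*)
FROM events123
WHERE DATE(event_time) BETWEEN '2024-01-01' AND '2024-01-07';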
5. Avoid SELECT * queries when possible.
The more columns you select, the slower your query will be. Selecting only the relevant columns instead of all (*) columns will significantly increase query speed and return only the data you need.
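For example (the column names shown are typical Amplitude event fields, included only as an illustration):
-- Slower: pulls every column back from Redshift
SELECT * FROM events123 WHERE DATE(event_time) = '2024-01-01';
-- Faster: pulls only the columns the analysis actually needs
SELECT user_id, event_type, event_time FROM events123 WHERE DATE(event_time) = '2024-01-01';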