This article describes Redshift best practices for clients who have purchased the Redshift ETL add-on.
As of January 17, 2017, we refer to "apps" as "projects."
1. Data for each project is kept in its own schema (namespace) in Redshift.
By default, every Redshift command you run uses the public schema. However, you can select which schema you want to work under instead by using the SET search_path command:
SET search_path = app123;
SELECT COUNT(*) FROM events;
Or you can include the schema as a prefix to the table name:
SELECT COUNT(*) FROM app123.events;
2. Query directly from each project's table instead of the combined events table.
The events from each of your Amplitude projects are stored in their own tables. The table name for each project is 'events###', where ### is the project number; you can find this number in the URL of the Amplitude dashboard. The union of all projects' events appears in a table called 'events'. Selecting FROM events### whenever possible will make your queries faster and more efficient, as shown in the sketch below.
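For example, assuming a hypothetical project number of 123, you would query events123 directly rather than the combined events table:
-- Count events per event_type in the project-specific table for project 123.
SELECT event_type, COUNT(*)
FROM events123
GROUP BY event_type;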
3. Custom event properties and custom user properties associated with an event_type will be pulled into their own columns in the respective event_type table.
Custom user property columns will be prefixed with 'u_' and custom event property columns will be prefixed with 'e_'.
Note: There is a limit of 400 user properties and 50 event properties that will be pulled into their own columns. Anything past the limit will still require the JSON_EXTRACT_PATH_TEXT function.
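As a sketch, suppose a sign_up event table has a custom user property plan_type and a custom event property button_color (these names are hypothetical), and that properties past the column limit remain in a raw user_properties JSON column (also an assumption):
-- Properties within the limit are available as prefixed columns.
SELECT u_plan_type, e_button_color
FROM sign_up;
-- Properties past the limit must be parsed from the raw JSON instead.
SELECT JSON_EXTRACT_PATH_TEXT(user_properties, 'referral_source') AS referral_source
FROM sign_up;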
4. Always include a date range in the WHERE clause of your query.
Our Redshift tables do not have a primary key but are sorted by the event_time column. Adding a date range in the WHERE clause of your query will significantly increase query speeds. We recommend using the DATE() function with event_time as the input.
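For instance, a query restricted to a single week might look like the following sketch (the table name and dates are illustrative):
-- Filtering on the event_time sort key lets Redshift skip irrelevant blocks.
SELECT COUNT(*)
FROM events123
WHERE DATE(event_time) BETWEEN '2017-01-01' AND '2017-01-07';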
5. Avoid SELECT * queries when possible.
The more columns you select, the slower your query will be, because Redshift stores data by column and only reads the columns you reference. Selecting only the relevant columns instead of all (*) columns will significantly increase query speeds and return only the data you need.
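For example, instead of SELECT *, name only the columns the analysis actually needs (the columns and table shown here are illustrative):
-- Select only the relevant columns, with a date range per the tip above.
SELECT event_type, event_time, amplitude_id
FROM events123
WHERE DATE(event_time) BETWEEN '2017-01-01' AND '2017-01-07';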