A very common query is selecting users who satisfy some property intrinsic to them - their country, language, device platform (iOS or Android), the ad network that directed them to the product, etc.
Amplitude tracks all of this data, so finding the users who satisfy user properties is a simple query on Redshift. There are two primary types of user properties: properties tracked automatically by Amplitude and custom user properties. Each requires different query syntax, which we will go over below. The values in red are what you need to replace with your own.
NOTE: For most customers, the Help Center article on exporting Amplitude data to Redshift will be more useful.
Properties Tracked Automatically by Amplitude
These properties are stored for every event in their own Redshift column and include:
- version: The version of your product being used (e.g. 3.4.2).
- country: The country as set on the user's device
- city: The city of the user.
- region: The region of the user (states within the United States and province in other countries).
- DMA: Designated marketing area, a marketing area that shares media.
- Language: Language as set on the user's device.
- Platform: Operating system type, e.g. Android, iOS, Chrome, etc.
- OS: Version number of the operating system, e.g. Android 4.4.2.
- Device family: E.g. Samsung, Casio, Kyocera, Acer.
- Device Type: E.g. iPhone 6, Galaxy.
- Carrier: E.g. Verizon, Vodafone, AT&T.
Query Objective: Obtain the Most Common Values for a Given User Property
It is often useful to first look at the most common values for a given user property. For example, perhaps we are interested in knowing the countries in which our product has the most users. To do this, use the following query.
SELECT country, count(DISTINCT amplitude_id) AS count FROM events123 WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' GROUP BY country ORDER BY count DESC;
This will return a table with the country name as the first column and the number of distinct users from that country as the second column; the 'ORDER BY count DESC' option at the end will list the countries from the highest number of users to lowest.
Query Objective: Obtain a List of Users Who Have Certain Properties
Once we have a sense of the relevant property values, we can then query for the list of users who have certain user properties. For example, if we are interested in getting a list of active users between October 19th and October 26th who are from either Canada or the United Kingdom, we can perform the following query:
SELECT DISTINCT user_id, country, platform FROM events123 WHERE (country = 'Canada' OR country = 'United Kingdom') WHERE DATE(event_time) BETWEEN '2017-10-19' AND '2017-10-26';
Here we return all of the relevant columns (user_id, country, platform) so we can see the corresponding property values for each user satisfying the query; however, you can choose to just return the user_id or you can ask to return other column values that are not part of the WHERE clause.
Custom User Properties
In addition to the user properties automatically tracked by Amplitude, your project can specify additional user properties. User properties are pulled into their own columns in each event table. There is a limit of 400 user properties that can be put into their own columns. All other properties are saved in JSON format in a single column in Redshift called user_properties. Possible examples include the advertising network the user was referred from, the number of photos the user has saved in the product, the amount of in-game currency the user has, etc.
Conceptually, these are very similar to the default Amplitude-tracked user properties discussed above; they track one aspect of the current state of a user and they are not event specific (so the same user properties and values appear on all events for a user at a point in time).
Query Objective: Obtain the Most Common Advertising Referral Networks for Users
As an example, say we want to see the most common advertising referral networks for users and we have stored this value in the user_properties column under the key 'Referral'. Then the query is:
SELECT JSON_EXTRACT_PATH_TEXT(user_properties,'Referral') AS Referral_Type, count(DISTINCT amplitude_id) as count FROM events123 WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' GROUP BY Referral_Type ORDER BY count DESC;
This will return a table with the referral network name as the first column (which we have chosen to call Referral_Type but you can name anything you want) and the number of associated distinct users as the second column; the 'ORDER BY count DESC' option at the end will list the referral network names in descending order from highest number of users to lowest.
Numerical Custom User Properties
If the user property you are interested in has numerical values instead of text, then you can query for ranges of values.
Query Objective: Obtain the Number of Users Whose Current Level Is Greater Than 7 but Less Than 10
For example, below we query for the number of users whose Current Level in our game app is greater than 7 but less than 10 (e.g. Level 8 or 9):
SELECT DISTINCT count(amplitude_id) FROM events123 WHERE NULLIF(JSON_EXTRACT_PATH_TEXT(user_properties, 'Current Level'), '')::int > 7 AND NULLIF(JSON_EXTRACT_PATH_TEXT(user_properties, 'Current Level'), '')::int < 10 AND DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06';
Be sure to use the syntax above - specifically the NULLIF() function (which converts empty strings '' to the special SQL value NULL) and the ::int (casts strings to integers). This is necessary for numerical property values since the json_extract_path_text() function returns strings.