SQL Traits
SQL Traits let you write custom SQL to compute any metric about your customer entities. This is the most flexible trait type — if you can express it as a SQL query, you can turn it into a trait.
How SQL Traits Work
A SQL Trait is a query that returns two columns:
- Entity key column — The identifier that maps each row back to an entity instance (e.g.,
user_id,account_id) - Trait value column — The computed value for that entity
SignalSmith executes this query against your warehouse, then materializes the results into the entity’s trait table. Each row in the result becomes a trait value for one entity instance.
Creating a SQL Trait
- Navigate to Segment > Traits and click Create Trait
- Select SQL Trait as the type
- Choose the entity type this trait applies to (e.g., User)
- Write your SQL query in the editor
- Map the entity key column — select which output column contains the entity identifier
- Map the trait value column — select which output column contains the computed value
- Set the data type (string, number, boolean, date, or timestamp)
- Configure the evaluation schedule
- Click Save
Query Requirements
Your SQL query must:
- Return at least two columns: one for the entity key and one for the trait value
- Produce one row per entity (use
GROUP BYon the entity key if aggregating) - Be valid SQL for your warehouse dialect (Snowflake, BigQuery, or Databricks)
Your SQL query can:
- Join multiple tables
- Use CTEs, window functions, subqueries, and any SQL feature your warehouse supports
- Reference warehouse-specific functions (e.g.,
PARSE_JSONin Snowflake,JSON_EXTRACTin BigQuery) - Include
WHEREclauses,CASEexpressions, and conditional logic
Examples
Customer Lifetime Value
Compute the net revenue per customer across all orders, accounting for refunds.
SELECT
o.user_id,
COALESCE(SUM(o.order_total), 0) - COALESCE(SUM(r.refund_amount), 0) AS lifetime_value
FROM orders o
LEFT JOIN refunds r ON o.order_id = r.order_id
GROUP BY o.user_id- Entity key column:
user_id - Trait value column:
lifetime_value - Data type: Number
Days Since Last Purchase
Calculate the number of days between now and each customer’s most recent order.
SELECT
user_id,
DATEDIFF('day', MAX(order_date), CURRENT_DATE()) AS days_since_last_purchase
FROM orders
GROUP BY user_id- Entity key column:
user_id - Trait value column:
days_since_last_purchase - Data type: Number
The DATEDIFF syntax varies by warehouse. The example above uses Snowflake syntax. For BigQuery, use DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY). For Databricks, use DATEDIFF(CURRENT_DATE(), MAX(order_date)).
Total Order Count
Count the total number of completed orders per customer.
SELECT
user_id,
COUNT(*) AS total_orders
FROM orders
WHERE status = 'completed'
GROUP BY user_id- Entity key column:
user_id - Trait value column:
total_orders - Data type: Number
Most Purchased Product Category
Find the product category each customer has purchased most frequently.
WITH category_counts AS (
SELECT
o.user_id,
p.category,
COUNT(*) AS purchase_count,
ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY COUNT(*) DESC) AS rn
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY o.user_id, p.category
)
SELECT
user_id,
category AS top_category
FROM category_counts
WHERE rn = 1- Entity key column:
user_id - Trait value column:
top_category - Data type: String
Average Session Duration (Minutes)
Compute the average session length from event data.
SELECT
user_id,
ROUND(AVG(DATEDIFF('second', session_start, session_end)) / 60.0, 2) AS avg_session_minutes
FROM sessions
WHERE session_end IS NOT NULL
GROUP BY user_id- Entity key column:
user_id - Trait value column:
avg_session_minutes - Data type: Number
Is Enterprise Customer (Boolean)
Flag customers who belong to accounts with more than 100 users.
SELECT
u.user_id,
CASE
WHEN a.employee_count > 100 THEN TRUE
ELSE FALSE
END AS is_enterprise
FROM users u
JOIN accounts a ON u.account_id = a.account_id- Entity key column:
user_id - Trait value column:
is_enterprise - Data type: Boolean
Entity Key Mapping
The entity key column in your query must match the identifier column defined in your schema’s entity type. For example, if your User entity type uses user_id as its primary key, your SQL trait must include user_id as one of its output columns.
If your query references a table where the identifier has a different column name, alias it:
SELECT
customer_number AS user_id, -- Alias to match entity key
SUM(amount) AS total_spent
FROM legacy_transactions
GROUP BY customer_numberTesting Your Query
Before saving, you can click Run Preview to execute the query and see sample results. The preview:
- Executes the full query against your warehouse
- Returns up to 100 sample rows
- Shows the detected column names and data types
- Validates that your entity key and value column mappings are correct
If the query fails, the error message from your warehouse is displayed in the editor to help you debug syntax or permission issues.
Warehouse Dialect Considerations
Since SQL traits execute directly against your warehouse, be mindful of dialect-specific syntax:
| Feature | Snowflake | BigQuery | Databricks |
|---|---|---|---|
| Date diff | DATEDIFF('day', start, end) | DATE_DIFF(end, start, DAY) | DATEDIFF(end, start) |
| Current timestamp | CURRENT_TIMESTAMP() | CURRENT_TIMESTAMP() | CURRENT_TIMESTAMP() |
| String concat | CONCAT(a, b) or a || b | CONCAT(a, b) | CONCAT(a, b) |
| Null handling | COALESCE(x, 0) | COALESCE(x, 0) | COALESCE(x, 0) |
| JSON access | col:field or PARSE_JSON() | JSON_EXTRACT_SCALAR() | col.field or GET_JSON_OBJECT() |
Best Practices
- Always
GROUP BYyour entity key when aggregating — the trait pipeline expects one row per entity - Use
COALESCEfor null handling — null trait values can cause unexpected behavior in audience filters - Keep queries efficient — traits run on a schedule, so expensive queries will consume warehouse credits repeatedly
- Use CTEs for readability — complex logic is easier to maintain and debug when broken into named steps
- Test with
LIMITduring development to validate results quickly before running against the full dataset - Name traits descriptively —
days_since_last_purchaseis better thanmetric_42
Next Steps
- Aggregation Traits — Visual builder for simpler metrics
- Formula Traits — Combine SQL traits with expressions
- Trait Evaluation — How traits are scheduled and computed