Column Configuration
After writing a model’s SQL query and previewing the results, you configure each output column to tell SignalSmith how to use it. Column configuration defines the semantic meaning of each column — whether it identifies a record, carries data attributes, or serves as a primary key.
Column Types
Every column in a model is assigned one of two types:
Identifier
An identifier column contains values that can be used to match or look up records in a destination system. Examples include:
email— Match contacts by email addresscustomer_id— Match records by internal IDphone— Match contacts by phone numberexternal_id— Match by a destination-specific identifier
Identifier columns are used during sync to determine which records in the destination to create, update, or delete. At least one identifier column is typically required for upsert and mirror sync modes.
When multiple identifier columns are present, the sync uses the one mapped to the destination’s primary identifier field (e.g., “Email” in Salesforce, “email” in HubSpot).
Attribute
An attribute column contains data values that describe or enrich a record. Examples include:
first_name,last_name— Personal informationlifetime_value— Computed metricssignup_date— Timestampssegment,tier— Categorizationscity,country— Geographic data
Attribute columns are mapped to fields in the destination during sync. They carry the actual data being activated.
Primary Key
The primary key is the column (or combination of columns) that uniquely identifies each row in the model’s result set. The primary key is critical for:
- Deduplication — Ensuring each row represents a unique record
- Change detection — Determining which records have been added, updated, or removed between sync runs
- Upsert/mirror operations — Matching source records to destination records
Single Column Primary Key
The most common case. A single column uniquely identifies each row:
SELECT
customer_id, -- Primary key
email,
name,
lifetime_value
FROM customersConfigure customer_id as the primary key in the column configuration panel.
Composite Primary Key
When no single column is unique, combine multiple columns:
SELECT
account_id, -- Part of composite primary key
contact_id, -- Part of composite primary key
email,
role
FROM account_contactsSelect both account_id and contact_id as primary key columns. Together, they uniquely identify each row.
Choosing a Primary Key
Guidelines for selecting the right primary key:
| Scenario | Recommended Primary Key |
|---|---|
| Customer table with unique IDs | customer_id |
| User table with unique emails | email (if guaranteed unique) |
| Junction/association table | Composite key (e.g., account_id + user_id) |
| Aggregated metrics | The GROUP BY column(s) |
| Event data with deduplication | event_id or composite of user_id + event_timestamp |
Validation: SignalSmith warns if the preview results contain duplicate values in the primary key columns, as this indicates the key does not uniquely identify rows.
Data Type Detection
SignalSmith automatically detects the data type of each column from the query preview results:
| Detected Type | Description | Examples |
|---|---|---|
| Text | String/character data | "john@example.com", "New York" |
| Number | Integer or decimal values | 42, 199.99, -3.14 |
| Boolean | True/false values | true, false |
| Date | Date values without time | 2025-01-15 |
| Datetime | Timestamp values with time | 2025-01-15T10:30:00Z |
| JSON | Structured JSON data | {"key": "value"} |
If the auto-detected type is incorrect (e.g., a numeric ID is detected as a number but should be treated as text), you can override the type in the configuration panel.
Configuration Panel
The column configuration panel appears after a successful query preview. It displays a table with one row per output column:
| Column Name | Type | Primary Key | Data Type |
|---|---|---|---|
customer_id | Identifier | Yes | Number |
email | Identifier | No | Text |
first_name | Attribute | No | Text |
last_name | Attribute | No | Text |
lifetime_value | Attribute | No | Number |
signup_date | Attribute | No | Date |
Editing Column Configuration
For each column, you can:
- Toggle type — Switch between Identifier and Attribute using the dropdown
- Set primary key — Check/uncheck the primary key checkbox
- Override data type — Change the auto-detected data type if needed
Validation Rules
SignalSmith enforces the following rules:
- At least one primary key — Every model must have at least one column marked as the primary key
- Primary keys must be non-nullable — Columns used as primary keys should not contain NULL values (a warning is shown if NULLs are detected in preview)
- Unique column names — All column names must be unique (use SQL aliases to rename duplicates)
- At least one identifier — Models used in syncs should have at least one identifier column for record matching
Impact on Syncs
Column configuration directly affects how syncs operate:
Identifier Columns in Sync
When creating a sync, you map model columns to destination fields. Identifier columns are available for mapping to the destination’s identifier fields (e.g., Email, External ID). The destination uses these to determine whether to create a new record or update an existing one.
Primary Key in Sync
The primary key determines how SignalSmith tracks changes between sync runs:
- New rows — Rows with primary key values not seen in the previous run are treated as inserts
- Changed rows — Rows with existing primary key values but different attribute values are treated as updates
- Missing rows — Rows present in the previous run but absent in the current run are treated as deletes (relevant for mirror sync mode)
Attribute Columns in Sync
Attribute columns carry the data payload. During sync, each attribute column is mapped to a specific field in the destination. Unmapped attribute columns are ignored.
Example Configurations
Customer Profiles
SELECT
customer_id,
email,
phone,
first_name,
last_name,
city,
country,
lifetime_value,
segment
FROM customers| Column | Type | Primary Key |
|---|---|---|
customer_id | Identifier | Yes |
email | Identifier | No |
phone | Identifier | No |
first_name | Attribute | No |
last_name | Attribute | No |
city | Attribute | No |
country | Attribute | No |
lifetime_value | Attribute | No |
segment | Attribute | No |
Event Aggregations
SELECT
user_id,
event_type,
COUNT(*) AS event_count,
MAX(event_time) AS last_occurrence
FROM events
GROUP BY user_id, event_type| Column | Type | Primary Key |
|---|---|---|
user_id | Identifier | Yes |
event_type | Attribute | Yes |
event_count | Attribute | No |
last_occurrence | Attribute | No |
Note: Both user_id and event_type form a composite primary key since neither is unique on its own.
Account-Contact Mapping
SELECT
account_id,
contact_id,
email,
role,
is_primary
FROM account_contacts| Column | Type | Primary Key |
|---|---|---|
account_id | Identifier | Yes |
contact_id | Identifier | Yes |
email | Identifier | No |
role | Attribute | No |
is_primary | Attribute | No |
Column Sensitivity & PII Masking
Columns can also be assigned sensitivity levels that control data visibility and sync behavior. Mark columns containing PII (email, phone, SSN, etc.) as Redacted, Sync Only, or Blocked to enforce masking in previews, suppress value suggestions, and control sync access.
Sensitivity is configured on the entity config page (Schema > Model > Configure). SignalSmith can automatically detect likely PII columns based on column name patterns.
See PII Masking for full details.
Next Steps
- Create a sync using your configured model
- Map fields to destination fields
- Set up the schema to define entity relationships
- Configure PII masking for sensitive columns