Models
Models are SQL queries that define the shape and content of the data you want to sync from your warehouse to your destinations. They are the core abstraction in SignalSmith’s data pipeline — connecting your raw warehouse data to the activation layer.
What Is a Model?
A Model is a saved SQL query paired with column configuration. It runs against a Warehouse (your data warehouse) and produces a result set that can be consumed by Syncs, Schema mappings, and Segment.
Think of models as views into your warehouse data that are specifically designed for activation. Unlike raw tables, models let you:
- Join and transform data from multiple tables into a single, clean result set
- Filter rows to include only relevant records (e.g., active customers, recent orders)
- Rename and reshape columns to match destination requirements
- Tag columns with semantic types (identifier, attribute) so SignalSmith knows how to use them
Why Models Matter
Models are the bridge between your warehouse and every downstream system. Without them, you would need to:
- Write custom integrations for each destination
- Manage schema mapping in every sync
- Duplicate transformation logic across pipelines
With models, you define the transformation once and reuse it across any number of syncs and destinations.
┌──────────────────────────────────────────────────────────────┐
│ Model │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ SELECT │ │
│ │ u.id, u.email, u.name, │ │
│ │ SUM(o.amount) as lifetime_value │ │
│ │ FROM users u │ │
│ │ JOIN orders o ON u.id = o.user_id │ │
│ │ GROUP BY u.id, u.email, u.name │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Column Config: │
│ ├── id → identifier (primary key) │
│ ├── email → identifier │
│ ├── name → attribute │
│ └── lifetime_value → attribute │
└──────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────────┐ ┌──────────────┐
│ Salesforce│ │ Google Ads │ │ HubSpot │
│ Sync │ │ Sync │ │ Sync │
└──────────┘ └──────────────┘ └──────────────┘Model Components
Every model consists of three parts:
1. Source Selection
Each model is connected to exactly one Warehouse. The SQL query runs against the warehouse configured in that source. You select the source when creating the model.
2. SQL Query
The heart of the model is a SQL query written in the dialect of your source warehouse (Snowflake SQL, BigQuery SQL, etc.). The query can include:
SELECTstatements with any columns, expressions, or aggregationsJOINacross multiple tables (even across schemas, if permitted)WHEREfilters to limit rowsGROUP BY,HAVING,ORDER BY, andLIMITclauses- CTEs (
WITHclauses) for complex transformations - Window functions for ranking, running totals, etc.
See the SQL Editor guide for details on the editing experience.
3. Column Configuration
After writing and previewing your SQL, you configure each output column:
- Column type — Is it an identifier (used to match records) or an attribute (data to sync)?
- Primary key — Which column(s) uniquely identify each row?
- Data type — What type of data does the column contain?
See Column Configuration for the full guide.
Model Lifecycle
1. Create
Write a SQL query, preview the results, and configure columns. The model is saved but not yet actively syncing data.
2. Use in Syncs
Connect the model to one or more Syncs. Each sync maps the model’s columns to a destination’s fields and runs on a schedule.
3. Use in Schema
Map the model’s columns to entity types and relationships in the Schema module to build a unified data model.
4. Edit
Update the SQL query or column configuration as your data needs evolve. Changes take effect on the next sync run.
5. Archive
When a model is no longer needed, archive or delete it. SignalSmith prevents deletion if active syncs depend on the model.
API Reference
Models are managed through the SignalSmith REST API:
# List all models
GET /api/v1/models
# Get a single model
GET /api/v1/models/{id}
# Create a model
POST /api/v1/models
# Update a model
PUT /api/v1/models/{id}
# Delete a model
DELETE /api/v1/models/{id}
# Preview model results
POST /api/v1/models/{id}/previewExample: Create a Model
curl -X POST https://your-workspace.signalsmith.dev/api/v1/models \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Active Customers",
"source_id": "src_abc123",
"query": "SELECT id, email, name, lifetime_value FROM customers WHERE status = '\''active'\''",
"columns": [
{"name": "id", "type": "identifier", "primary_key": true},
{"name": "email", "type": "identifier"},
{"name": "name", "type": "attribute"},
{"name": "lifetime_value", "type": "attribute"}
]
}'Best Practices
- Keep queries focused — Each model should serve a specific purpose (e.g., “Active Customers for CRM Sync” rather than “All Customer Data”)
- Use CTEs for readability — Break complex queries into named steps with
WITHclauses - Always include a primary key — Every model needs at least one column marked as the primary key for syncs to function correctly
- Preview before saving — Use the preview feature to verify your query returns the expected data
- Name models descriptively — Use names that describe both the data and the intended use case
- Avoid
SELECT *— Explicitly list the columns you need to avoid syncing unnecessary data and to maintain stability when the source table schema changes - Filter early — Apply
WHEREclauses to reduce the data scanned and improve query performance