Models

Models are SQL queries that define the shape and content of the data you want to sync from your warehouse to your destinations. They are the core abstraction in SignalSmith’s data pipeline — connecting your raw warehouse data to the activation layer.

What Is a Model?

A Model is a saved SQL query paired with column configuration. It runs against a Warehouse (your data warehouse) and produces a result set that can be consumed by Syncs, Schema mappings, and Segment.

Think of models as views into your warehouse data that are specifically designed for activation. Unlike raw tables, models let you:

Join and transform data from multiple tables into a single, clean result set
Filter rows to include only relevant records (e.g., active customers, recent orders)
Rename and reshape columns to match destination requirements
Tag columns with semantic types (identifier, attribute) so SignalSmith knows how to use them

Why Models Matter

Models are the bridge between your warehouse and every downstream system. Without them, you would need to:

Write custom integrations for each destination
Manage schema mapping in every sync
Duplicate transformation logic across pipelines

With models, you define the transformation once and reuse it across any number of syncs and destinations.

┌──────────────────────────────────────────────────────────────┐
│                         Model                                │
│  ┌──────────────────────────────────────────────────────┐    │
│  │  SELECT                                              │    │
│  │    u.id, u.email, u.name,                            │    │
│  │    SUM(o.amount) as lifetime_value                   │    │
│  │  FROM users u                                        │    │
│  │  JOIN orders o ON u.id = o.user_id                   │    │
│  │  GROUP BY u.id, u.email, u.name                      │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  Column Config:                                              │
│  ├── id          → identifier (primary key)                  │
│  ├── email       → identifier                               │
│  ├── name        → attribute                                 │
│  └── lifetime_value → attribute                              │
└──────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
   ┌──────────┐       ┌──────────────┐     ┌──────────────┐
   │ Salesforce│       │  Google Ads  │     │   HubSpot    │
   │   Sync    │       │    Sync      │     │    Sync      │
   └──────────┘       └──────────────┘     └──────────────┘

Model Components

Every model consists of three parts:

1. Source Selection

Each model is connected to exactly one Warehouse. The SQL query runs against the warehouse configured in that source. You select the source when creating the model.

2. SQL Query

The heart of the model is a SQL query written in the dialect of your source warehouse (Snowflake SQL, BigQuery SQL, etc.). The query can include:

SELECT statements with any columns, expressions, or aggregations
JOIN across multiple tables (even across schemas, if permitted)
WHERE filters to limit rows
GROUP BY, HAVING, ORDER BY, and LIMIT clauses
CTEs (WITH clauses) for complex transformations
Window functions for ranking, running totals, etc.

See the SQL Editor guide for details on the editing experience.

3. Column Configuration

After writing and previewing your SQL, you configure each output column:

Column type — Is it an identifier (used to match records) or an attribute (data to sync)?
Primary key — Which column(s) uniquely identify each row?
Data type — What type of data does the column contain?

See Column Configuration for the full guide.

Model Lifecycle

1. Create

Write a SQL query, preview the results, and configure columns. The model is saved but not yet actively syncing data.

2. Use in Syncs

Connect the model to one or more Syncs. Each sync maps the model’s columns to a destination’s fields and runs on a schedule.

3. Use in Schema

Map the model’s columns to entity types and relationships in the Schema module to build a unified data model.

4. Edit

Update the SQL query or column configuration as your data needs evolve. Changes take effect on the next sync run.

5. Archive

When a model is no longer needed, archive or delete it. SignalSmith prevents deletion if active syncs depend on the model.

API Reference

Models are managed through the SignalSmith REST API:

# List all models
GET /api/v1/models
 
# Get a single model
GET /api/v1/models/{id}
 
# Create a model
POST /api/v1/models
 
# Update a model
PUT /api/v1/models/{id}
 
# Delete a model
DELETE /api/v1/models/{id}
 
# Preview model results
POST /api/v1/models/{id}/preview

Example: Create a Model

curl -X POST https://your-workspace.signalsmith.dev/api/v1/models \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Active Customers",
    "source_id": "src_abc123",
    "query": "SELECT id, email, name, lifetime_value FROM customers WHERE status = '\''active'\''",
    "columns": [
      {"name": "id", "type": "identifier", "primary_key": true},
      {"name": "email", "type": "identifier"},
      {"name": "name", "type": "attribute"},
      {"name": "lifetime_value", "type": "attribute"}
    ]
  }'

Best Practices

Keep queries focused — Each model should serve a specific purpose (e.g., “Active Customers for CRM Sync” rather than “All Customer Data”)
Use CTEs for readability — Break complex queries into named steps with WITH clauses
Always include a primary key — Every model needs at least one column marked as the primary key for syncs to function correctly
Preview before saving — Use the preview feature to verify your query returns the expected data
Name models descriptively — Use names that describe both the data and the intended use case
Avoid SELECT * — Explicitly list the columns you need to avoid syncing unnecessary data and to maintain stability when the source table schema changes
Filter early — Apply WHERE clauses to reduce the data scanned and improve query performance

Next Steps

ERD Visualization Creating a Model