Architecture
An overview of how SignalSmith works, the key concepts in the platform, and how data flows from sources through to activation.
Platform Overview
SignalSmith is a warehouse-native Customer Data Platform. Instead of copying your data into a separate system, SignalSmith connects to your existing data warehouse and runs all processing — modeling, identity resolution, audience computation, and trait calculation — directly inside it.
Your data warehouse is the single source of truth. SignalSmith orchestrates the work, but the compute happens where your data already lives.
How It Works
1. Connect Your Data
SignalSmith starts with the data already in your warehouse — or helps you get it there:
- Existing warehouse tables — Point SignalSmith at tables and views you already have in Snowflake, BigQuery, or Databricks
- Loaders — Pull data from SaaS applications (Salesforce, HubSpot, Stripe, and more) directly into your warehouse on a schedule
- Events API — Collect real-time event streams from web, mobile, and server-side sources (Segment-compatible)
No data is moved out of your warehouse for processing. SignalSmith reads from your tables and writes results back to your warehouse.
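Because the Events API is described as Segment-compatible, an incoming event presumably follows the shape of a Segment `track` call. The sketch below only builds such a payload. The helper name `build_track_event` is hypothetical, and SignalSmith's actual endpoint, authentication, and accepted fields are not specified in this document.

```python
import json
import uuid
from datetime import datetime, timezone


def build_track_event(event, user_id=None, anonymous_id=None, properties=None):
    """Build a Segment-style `track` payload (hypothetical helper)."""
    # Segment-style events identify the actor by userId and/or anonymousId.
    if not user_id and not anonymous_id:
        raise ValueError("either user_id or anonymous_id is required")
    payload = {
        "type": "track",
        "event": event,
        "properties": properties or {},
        "messageId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if user_id:
        payload["userId"] = user_id
    if anonymous_id:
        payload["anonymousId"] = anonymous_id
    return payload


if __name__ == "__main__":
    evt = build_track_event("Order Completed", user_id="u_123",
                            properties={"total": 49.5})
    print(json.dumps(evt, indent=2))
```

The payload is returned as a plain dictionary so that it can be serialized and sent with whichever HTTP client you already use.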
2. Model and Organize
Once your data is connected, you shape it into a structured customer data layer:
- Models — SQL queries that transform raw tables into clean, structured records. Models run natively in your warehouse and produce typed result sets.
- Schema & Entity Types — Define the objects in your data universe (User, Account, Order, etc.) and the relationships between them. Entity types provide a shared vocabulary across the platform.
- Traits — Computed attributes about your customers. Traits can be aggregations (total purchases), SQL expressions (days since last login), or formulas (LTV tier). Traits are materialized in your warehouse and available for audience building.
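To make the relationship between models and traits concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the warehouse. The table, column, and variable names are hypothetical, and the SQL that SignalSmith actually generates may look quite different.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (
        order_id TEXT, email TEXT, amount_cents INTEGER, status TEXT
    );
    INSERT INTO raw_orders VALUES
        ('o1', 'Ana@Example.com', 2500, 'paid'),
        ('o2', 'ana@example.com', 1500, 'paid'),
        ('o3', 'bo@example.com',   900, 'refunded');
""")

# A "model": SQL that turns raw rows into clean, structured records.
ORDERS_MODEL = """
    SELECT order_id,
           LOWER(email)         AS email,
           amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'paid'
"""

# An aggregation "trait": total purchases per customer, computed over the model.
TOTAL_PURCHASES_TRAIT = f"""
    SELECT email, SUM(amount) AS total_purchases
    FROM ({ORDERS_MODEL}) AS orders
    GROUP BY email
"""

traits = dict(conn.execute(TOTAL_PURCHASES_TRAIT).fetchall())
print(traits)  # {'ana@example.com': 40.0}
```

The trait query wraps the model query rather than reading the raw table, which mirrors the layering described above: models clean the data, and traits aggregate over the cleaned records.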
3. Resolve Identity
Customers interact with your business across many systems, creating fragmented records. Identity resolution unifies them:
- Define identifier families — email, phone, device ID, loyalty number — that link records across entity types
- Configure merge rules and limit rules to control how aggressively records are linked
- Run resolution to produce an identity graph that maps all source records to unified clusters
- Create golden records — a single best profile for each cluster, with per-attribute survivorship strategies (most recent, most frequent, source priority, and more)
Identity resolution runs in your warehouse. The output is a set of tables (identity graph and golden records) that other features build on.
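The steps above can be illustrated with a small in-memory sketch. It links records that share an identifier value using a union-find structure, then builds a golden record per cluster with a "most recent" survivorship strategy. The record fields and function names are hypothetical, merge rules and limit rules are not modeled, and the real resolution runs as SQL in your warehouse rather than in Python.

```python
from collections import defaultdict

# Hypothetical source records; field names are illustrative only.
records = [
    {"id": "crm:1", "email": "ana@example.com", "phone": None,
     "name": "Ana P.", "updated_at": "2024-01-10"},
    {"id": "web:7", "email": "ana@example.com", "phone": "555-0100",
     "name": "Ana Perez", "updated_at": "2024-03-02"},
    {"id": "pos:3", "email": None, "phone": "555-0100",
     "name": "A. Perez", "updated_at": "2023-11-20"},
    {"id": "crm:2", "email": "bo@example.com", "phone": None,
     "name": "Bo Li", "updated_at": "2024-02-01"},
]
IDENTIFIER_FAMILIES = ("email", "phone")

parent = {r["id"]: r["id"] for r in records}


def find(x):
    """Return the cluster root for record id x (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Merge the clusters containing a and b; the smaller id becomes the root."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


# Link any two records that share a value within the same identifier family.
first_seen = {}
for r in records:
    for family in IDENTIFIER_FAMILIES:
        value = r[family]
        if value is None:
            continue
        key = (family, value)
        if key in first_seen:
            union(r["id"], first_seen[key])
        else:
            first_seen[key] = r["id"]

# Identity graph: cluster id -> member records.
clusters = defaultdict(list)
for r in records:
    clusters[find(r["id"])].append(r)


def golden_record(members):
    """Most-recent survivorship: per attribute, keep the newest non-null value."""
    newest_first = sorted(members, key=lambda m: m["updated_at"], reverse=True)
    return {
        attr: next((m[attr] for m in newest_first if m[attr] is not None), None)
        for attr in ("email", "phone", "name")
    }


golden = {cid: golden_record(members) for cid, members in clusters.items()}
print(golden["crm:1"])
```

In this data, `pos:3` shares no identifier with `crm:1` directly, but both link to `web:7`, so all three land in one cluster. This transitive linking is why limit rules matter in practice: without them, a single shared identifier can merge records that should stay separate.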
4. Build Audiences
Audiences let you segment your customers using the full richness of your data layer:
- Filter on model attributes, traits, golden record fields, and event properties
- Combine conditions with AND/OR logic using a visual builder — or write SQL directly
- Preview audience size and composition before activating
- Audiences that reference an identity graph automatically operate on unified profiles, not fragmented source records
All audience queries execute inside your warehouse. No customer data is extracted for segmentation.
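As an illustration of how AND/OR conditions from a visual builder could become a warehouse query, the sketch below compiles a nested condition tree into a parameterized `WHERE` clause. The tree format, the `compile_condition` function, and the field names are assumptions made for this example, not SignalSmith's internal representation.

```python
ALLOWED_COMPARISONS = {"=", "!=", "<", "<=", ">", ">="}


def compile_condition(node):
    """Compile a nested AND/OR condition tree into (sql, params)."""
    if "op" in node:
        # Group node: {"op": "AND" | "OR", "conditions": [...]}
        if node["op"] not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {node['op']}")
        if not node["conditions"]:
            raise ValueError("a group needs at least one condition")
        parts, params = [], []
        for child in node["conditions"]:
            sql, child_params = compile_condition(child)
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {node['op']} ".join(parts) + ")", params

    # Leaf node: {"field": ..., "cmp": ..., "value": ...}
    if node["cmp"] not in ALLOWED_COMPARISONS:
        raise ValueError(f"unsupported comparison: {node['cmp']}")
    if not node["field"].isidentifier():
        raise ValueError(f"invalid field name: {node['field']}")
    # Values are passed as bound parameters, never interpolated into the SQL.
    return f"{node['field']} {node['cmp']} ?", [node["value"]]


audience = {
    "op": "AND",
    "conditions": [
        {"field": "total_purchases", "cmp": ">=", "value": 100},
        {"op": "OR", "conditions": [
            {"field": "country", "cmp": "=", "value": "US"},
            {"field": "country", "cmp": "=", "value": "CA"},
        ]},
    ],
}

where_sql, params = compile_condition(audience)
print(where_sql)  # (total_purchases >= ? AND (country = ? OR country = ?))
print(params)     # [100, 'US', 'CA']
```

Keeping values as bound parameters and validating field names separately is one way to let non-technical users build filters without exposing the warehouse to SQL injection.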
5. Activate
Push your data to the tools where your teams take action:
- Audience Syncs — Send audience membership and selected attributes to destinations. Supports mirror (full refresh), additive (add only), and subtractive (remove only) modes.
- Data Syncs — Move model output to external tools on a schedule, with field mapping and sync modes (upsert, mirror, append).
- Journeys — Orchestrate multi-step, multi-channel workflows. Customers enter based on triggers (audience membership, events) and move through branches, waits, and actions.
SignalSmith connects to 50+ destinations across advertising (Google Ads, Meta, TikTok), CRM (Salesforce, HubSpot), marketing (Braze, Iterable, Klaviyo), analytics (Mixpanel, Amplitude), streaming (Kafka, Webhooks), and more.
Only the data you explicitly activate leaves your warehouse.
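The three audience sync modes can be thought of as different ways of applying the difference between current audience membership and what the destination already holds. The sketch below shows one plausible reading of those modes. The function name and return shape are hypothetical, and a real sync may resend the full list in mirror mode rather than compute a diff.

```python
def plan_audience_sync(mode, current_members, destination_members):
    """Return the ids to add to and remove from a destination for a sync mode."""
    current, destination = set(current_members), set(destination_members)
    to_add = sorted(current - destination)
    to_remove = sorted(destination - current)

    if mode == "mirror":        # destination ends up matching the audience
        return {"add": to_add, "remove": to_remove}
    if mode == "additive":      # add only; never remove existing members
        return {"add": to_add, "remove": []}
    if mode == "subtractive":   # remove only; never add new members
        return {"add": [], "remove": to_remove}
    raise ValueError(f"unknown sync mode: {mode}")


plan = plan_audience_sync("mirror", ["a", "b", "c"], ["b", "c", "d"])
print(plan)  # {'add': ['a'], 'remove': ['d']}
```

Computing a diff also means that only changed members need to be sent, which is consistent with the principle that only explicitly activated data leaves the warehouse.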
Governance and Security
SignalSmith provides controls to manage access and data flow:
- RBAC — Role-based access control with custom roles and permissions at the workspace level
- Destination Filters — Control which audiences and data can be synced to each destination
- Access Filters — Row-level access control, so each team sees only the data it is authorized to see
- Consent Management — Enforce user consent preferences across event forwarding and syncs
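As a rough illustration of consent enforcement, the sketch below drops any record whose user has not opted in to a destination's category before a sync runs. The consent categories, record shape, and `apply_consent` function are assumptions for this example. The document does not describe how SignalSmith stores or evaluates consent.

```python
def apply_consent(records, destination_category, consent_by_user):
    """Keep only records whose user has opted in to the destination category.

    Users with no consent entry are excluded (default deny).
    """
    return [
        r for r in records
        if destination_category in consent_by_user.get(r["user_id"], set())
    ]


# Hypothetical consent preferences keyed by user id.
consent_by_user = {
    "u1": {"advertising", "email"},
    "u2": {"email"},
}
audience = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u3"}]

allowed = apply_consent(audience, "advertising", consent_by_user)
print(allowed)  # [{'user_id': 'u1'}]
```

Treating a missing consent entry as a refusal is the conservative choice here. Whether that matches your legal requirements depends on the jurisdiction and the consent model you operate under.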
Supported Warehouses
| Warehouse | Status |
|---|---|
| Snowflake | Fully supported |
| Google BigQuery | Fully supported |
| Databricks | Fully supported |
SignalSmith generates dialect-specific SQL for each warehouse, so models, traits, audiences, and identity resolution work consistently regardless of which warehouse you use.
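To show why dialect-specific generation is needed, the sketch below renders a "days since" expression for each supported warehouse. The date-difference functions are the ones each warehouse documents, but the `days_since` helper and its template table are hypothetical and are not SignalSmith's actual SQL generator.

```python
# One template per dialect; {col} is assumed to be a DATE column.
DAYS_SINCE_TEMPLATES = {
    # Snowflake: DATEDIFF(part, start, end)
    "snowflake": "DATEDIFF('day', {col}, CURRENT_DATE())",
    # BigQuery: DATE_DIFF(end, start, part)
    "bigquery": "DATE_DIFF(CURRENT_DATE(), {col}, DAY)",
    # Databricks: DATEDIFF(end, start) returns days
    "databricks": "DATEDIFF(CURRENT_DATE(), {col})",
}


def days_since(column, dialect):
    """Render 'days since <column>' for the given warehouse dialect."""
    if not column.isidentifier():
        raise ValueError(f"invalid column name: {column}")
    try:
        template = DAYS_SINCE_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=column)


for dialect in DAYS_SINCE_TEMPLATES:
    print(dialect, "->", days_since("last_login_date", dialect))
```

The same logical trait (days since last login) produces three different expressions, with argument order differing between warehouses. Centralizing those differences in one place is what lets the higher-level definitions stay warehouse-agnostic.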
Design Principles
- Warehouse-native — All compute runs in your warehouse. SignalSmith orchestrates; your warehouse executes.
- Composable — Each capability (models, traits, audiences, identity, syncs) works independently and composes with the others. Use only what you need.
- No data extraction — Customer data stays in your warehouse for all processing. Only the records you activate are sent to destinations.
- No vendor lock-in — Works with Snowflake, Google BigQuery, and Databricks, and with 50+ destination integrations.
- API-first — Every action available in the UI has a corresponding REST API endpoint for automation and integration.