Architecture

An overview of how SignalSmith works, the key concepts in the platform, and how data flows from sources through to activation.

Platform Overview

SignalSmith is a warehouse-native Customer Data Platform. Instead of copying your data into a separate system, SignalSmith connects to your existing data warehouse and runs all processing — modeling, identity resolution, audience computation, and trait calculation — directly inside it.

Your data warehouse is the single source of truth. SignalSmith orchestrates the work, but the compute happens where your data already lives.

How It Works

1. Connect Your Data

SignalSmith starts with the data already in your warehouse — or helps you get it there:

Existing warehouse tables — Point SignalSmith at tables and views you already have in Snowflake, BigQuery, or Databricks
Loaders — Pull data from SaaS applications (Salesforce, HubSpot, Stripe, and more) directly into your warehouse on a schedule
Events API — Collect real-time event streams from web, mobile, and server-side sources (Segment-compatible)

No data is moved out of your warehouse for processing. SignalSmith reads from your tables and writes results back to your warehouse.
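The Events API is described as Segment-compatible. The sketch below builds a Track call shaped after Segment's public Track spec (`type`, `event`, `userId`/`anonymousId`, `properties`, `timestamp`, `messageId`). This page does not document SignalSmith's ingestion endpoint or authentication, so the sketch only constructs and serializes the payload. The helper name `build_track_event` is hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Optional


def build_track_event(
    event: str,
    properties: dict[str, Any],
    user_id: Optional[str] = None,
    anonymous_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a Track payload shaped after the Segment Spec (hypothetical helper)."""
    # Segment's spec requires at least one identity: userId or anonymousId.
    if user_id is None and anonymous_id is None:
        raise ValueError("a track event needs user_id or anonymous_id")
    payload: dict[str, Any] = {
        "type": "track",
        "event": event,
        "properties": properties,
        # A unique messageId lets the receiver de-duplicate client retries.
        "messageId": str(uuid.uuid4()),
        # ISO-8601 timestamp in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if user_id is not None:
        payload["userId"] = user_id
    if anonymous_id is not None:
        payload["anonymousId"] = anonymous_id
    return payload


if __name__ == "__main__":
    event = build_track_event(
        "Order Completed",
        {"order_id": "o-1001", "revenue": 42.5},
        user_id="u-123",
    )
    # The serialized JSON body is what a client would send to the collection endpoint.
    print(json.dumps(event, indent=2))
```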

2. Model and Organize

Once your data is connected, you shape it into a structured customer data layer:

Models — SQL queries that transform raw tables into clean, structured records. Models run natively in your warehouse and produce typed result sets.
Schema & Entity Types — Define the objects in your data universe (User, Account, Order, etc.) and the relationships between them. Entity types provide a shared vocabulary across the platform.
Traits — Computed attributes about your customers. Traits can be aggregations (total purchases), SQL expressions (days since last login), or formulas (LTV tier). Traits are materialized in your warehouse and available for audience building.
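Each of the three trait styles ends up as SQL that runs where the data lives. The minimal sketch below expresses an aggregation, a SQL expression, and a formula as one query and materializes the results into a table. An in-memory SQLite database stands in for the warehouse. The table and column names are hypothetical, and the LTV-tier thresholds are invented for illustration. This page does not describe how SignalSmith itself generates trait SQL.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users  (user_id TEXT PRIMARY KEY, last_login TEXT);
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, user_id TEXT, amount REAL);

    INSERT INTO users VALUES ('u1', '2024-06-01'), ('u2', '2024-06-25');
    INSERT INTO orders VALUES
        ('o1', 'u1', 700.0), ('o2', 'u1', 450.0), ('o3', 'u2', 80.0);
    """
)

# Materialize all three trait styles into a results table that audiences can query.
conn.executescript(
    """
    CREATE TABLE user_traits AS
    SELECT
        u.user_id,
        -- aggregation trait: total purchase amount
        COALESCE(SUM(o.amount), 0) AS total_purchases,
        -- SQL-expression trait: days since last login, against a fixed "today"
        CAST(julianday('2024-06-30') - julianday(u.last_login) AS INTEGER)
            AS days_since_last_login,
        -- formula trait: LTV tier derived from the aggregation (made-up thresholds)
        CASE
            WHEN COALESCE(SUM(o.amount), 0) >= 1000 THEN 'gold'
            WHEN COALESCE(SUM(o.amount), 0) >= 100 THEN 'silver'
            ELSE 'bronze'
        END AS ltv_tier
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.last_login;
    """
)

for row in conn.execute("SELECT * FROM user_traits ORDER BY user_id"):
    print(row)
```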

3. Resolve Identity

Customers interact with your business across many systems, creating fragmented records. Identity resolution unifies them:

Define identifier families — email, phone, device ID, loyalty number — that link records across entity types
Configure merge rules and limit rules to control how aggressively records are linked
Run resolution to produce an identity graph that maps all source records to unified clusters
Create golden records — a single best profile for each cluster, with per-attribute survivorship strategies (most recent, most frequent, source priority, and more)
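A common way to turn linked records into clusters is a disjoint-set (union-find) structure: any two records that share a normalized identifier value are merged into one cluster. The sketch below assumes that technique; this page does not specify SignalSmith's actual resolution algorithm or rule syntax. The `max_records_per_value` parameter is a hypothetical stand-in for a limit rule. It ignores an identifier value shared by too many records, such as a shared office phone or a placeholder email, instead of using it to link them. Identity resolution itself runs in the warehouse; this in-memory version only illustrates the clustering logic.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Register unseen nodes as singleton clusters.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def resolve(
    records: dict[str, dict[str, str]], max_records_per_value: int = 3
) -> dict[str, str]:
    """Map each record ID to a cluster ID (the ID of the cluster's root record).

    records maps record_id -> {identifier_family: value}.
    """
    uf = UnionFind()
    by_value: dict[tuple[str, str], list[str]] = defaultdict(list)
    for record_id, identifiers in records.items():
        uf.find(record_id)
        for family, value in identifiers.items():
            # Normalize so "Ana@Example.com" and "ana@example.com" match.
            by_value[(family, value.strip().lower())].append(record_id)

    for linked in by_value.values():
        # Hypothetical limit rule: skip over-shared identifier values.
        if len(linked) > max_records_per_value:
            continue
        for other in linked[1:]:
            uf.union(linked[0], other)

    return {record_id: uf.find(record_id) for record_id in records}


records = {
    "crm:1": {"email": "ana@example.com"},
    "shop:9": {"email": "Ana@Example.com", "phone": "555-0100"},
    "app:4": {"phone": "555-0100", "device_id": "d-77"},
    "crm:2": {"email": "bo@example.com"},
}
clusters = resolve(records)
print(clusters)
```

Here `crm:1` and `shop:9` link through email, and `shop:9` and `app:4` link through phone, so all three land in one cluster even though `crm:1` and `app:4` share no identifier directly.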

Identity resolution runs in your warehouse. The output is a set of tables (identity graph and golden records) that other features build on.
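The golden-record step pairs each attribute with a survivorship strategy. The sketch below implements three of the strategies named above (most recent, most frequent, source priority) over one resolved cluster. The `SourceRecord` shape, the function names, and the fallback used when no candidate comes from a ranked source are hypothetical choices made for illustration. In SignalSmith the equivalent logic runs in the warehouse.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional


@dataclass
class SourceRecord:
    source: str
    updated_at: datetime
    attributes: dict[str, Any]


# A strategy receives (record, value) pairs with non-null values and returns the winner.
Candidates = list[tuple[SourceRecord, Any]]
Strategy = Callable[[Candidates], Any]


def most_recent(candidates: Candidates) -> Any:
    # Value from the most recently updated record.
    return max(candidates, key=lambda rv: rv[0].updated_at)[1]


def most_frequent(candidates: Candidates) -> Any:
    # Value seen most often; ties go to the value encountered first.
    return Counter(value for _, value in candidates).most_common(1)[0][0]


def source_priority(order: list[str]) -> Strategy:
    # Value from the highest-priority source; falls back to most_recent when
    # no candidate comes from a ranked source (an assumption for this sketch).
    rank = {source: i for i, source in enumerate(order)}

    def pick(candidates: Candidates) -> Any:
        ranked = [rv for rv in candidates if rv[0].source in rank]
        if not ranked:
            return most_recent(candidates)
        return min(ranked, key=lambda rv: rank[rv[0].source])[1]

    return pick


def golden_record(
    cluster: list[SourceRecord], strategies: dict[str, Strategy]
) -> dict[str, Optional[Any]]:
    result: dict[str, Optional[Any]] = {}
    for attribute, strategy in strategies.items():
        candidates = [
            (record, record.attributes[attribute])
            for record in cluster
            if record.attributes.get(attribute) is not None
        ]
        # Attributes with no non-null value anywhere in the cluster stay empty.
        result[attribute] = strategy(candidates) if candidates else None
    return result


cluster = [
    SourceRecord("crm", datetime(2024, 1, 5),
                 {"email": "ana@example.com", "phone": "555-0100", "city": "Lisbon"}),
    SourceRecord("shop", datetime(2024, 5, 2),
                 {"email": "ana.new@example.com", "city": "Porto"}),
    SourceRecord("app", datetime(2024, 3, 1),
                 {"phone": "555-0199", "city": "Porto"}),
]
golden = golden_record(
    cluster,
    {"email": most_recent, "city": most_frequent, "phone": source_priority(["crm", "app"])},
)
print(golden)
```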

4. Build Audiences

Audiences let you segment your customers using the full richness of your data layer:

Filter on model attributes, traits, golden record fields, and event properties
Combine conditions with AND/OR logic using a visual builder — or write SQL directly
Preview audience size and composition before activating
Audiences that reference an identity graph automatically operate on unified profiles, not fragmented source records

All audience queries execute inside your warehouse. No customer data is extracted for segmentation.
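A visual AND/OR builder is commonly backed by a condition tree that compiles to a single SQL `WHERE` clause. The sketch below shows that idea, assuming a hypothetical tree shape (`Condition`, `Group`) and a hypothetical `user_profiles` table; this page does not document the builder's internal representation. Field names and operators are whitelisted and values are passed as bind parameters, so builder input never becomes raw SQL. SQLite stands in for the warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Condition:
    field: str
    op: str
    value: Any


@dataclass
class Group:
    logic: str  # "AND" or "OR"
    children: list["Node"]


Node = Union[Condition, Group]

# Whitelists keep user-supplied field names and operators out of raw SQL.
ALLOWED_OPS = {"=", "!=", ">", ">=", "<", "<="}
ALLOWED_FIELDS = {"country", "total_purchases", "days_since_last_login"}


def compile_node(node: Node, params: list[Any]) -> str:
    if isinstance(node, Condition):
        if node.field not in ALLOWED_FIELDS or node.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {node}")
        # Values become bind parameters, never interpolated into the SQL text.
        params.append(node.value)
        return f"{node.field} {node.op} ?"
    if node.logic not in ("AND", "OR") or not node.children:
        raise ValueError(f"invalid group: {node}")
    parts = [compile_node(child, params) for child in node.children]
    return "(" + f" {node.logic} ".join(parts) + ")"


def audience_sql(root: Node) -> tuple[str, list[Any]]:
    params: list[Any] = []
    where = compile_node(root, params)
    return f"SELECT user_id FROM user_profiles WHERE {where} ORDER BY user_id", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profiles "
    "(user_id TEXT, country TEXT, total_purchases REAL, days_since_last_login INTEGER)"
)
conn.executemany(
    "INSERT INTO user_profiles VALUES (?, ?, ?, ?)",
    [("u1", "PT", 1150.0, 29), ("u2", "PT", 80.0, 5), ("u3", "US", 300.0, 90)],
)

# country = 'PT' AND (total_purchases >= 1000 OR days_since_last_login <= 7)
audience = Group("AND", [
    Condition("country", "=", "PT"),
    Group("OR", [
        Condition("total_purchases", ">=", 1000),
        Condition("days_since_last_login", "<=", 7),
    ]),
])
sql, params = audience_sql(audience)
members = [row[0] for row in conn.execute(sql, params)]
print(members)  # ['u1', 'u2']
```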

5. Activate

Push your data to the tools where your teams take action:

Audience Syncs — Send audience membership and selected attributes to destinations. Supports mirror (full refresh), additive (add only), and subtractive (remove only) modes.
Data Syncs — Move model output to external tools on a schedule, with field mapping and sync modes (upsert, mirror, append).
Journeys — Orchestrate multi-step, multi-channel workflows. Customers enter based on triggers (audience membership, events) and move through branches, waits, and actions.
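One way to read the three audience sync modes is as filters over a single diff between the members the destination last received and the members in the audience now. The minimal sketch below follows that reading. It assumes that subtractive mode sends removals for members who have left the audience, which this page does not spell out, and the names `SyncPlan` and `plan_audience_sync` are hypothetical. A diff-based mirror leaves the destination in the same end state as a full refresh while sending fewer changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_add: frozenset[str]
    to_remove: frozenset[str]


def plan_audience_sync(previous: set[str], current: set[str], mode: str) -> SyncPlan:
    """Decide which members to add to or remove from a destination list.

    previous: member IDs present in the destination after the last sync.
    current:  member IDs in the audience now.
    """
    entered = frozenset(current - previous)
    exited = frozenset(previous - current)
    if mode == "mirror":
        # Destination ends up identical to the audience.
        return SyncPlan(to_add=entered, to_remove=exited)
    if mode == "additive":
        # Only new members are sent; departed members stay in the destination.
        return SyncPlan(to_add=entered, to_remove=frozenset())
    if mode == "subtractive":
        # Only departures are sent (assumed interpretation of "remove only").
        return SyncPlan(to_add=frozenset(), to_remove=exited)
    raise ValueError(f"unknown sync mode: {mode!r}")


previous = {"u1", "u2"}
current = {"u2", "u3"}
for mode in ("mirror", "additive", "subtractive"):
    print(mode, plan_audience_sync(previous, current, mode))
```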

SignalSmith connects to 50+ destinations across advertising (Google Ads, Meta, TikTok), CRM (Salesforce, HubSpot), marketing (Braze, Iterable, Klaviyo), analytics (Mixpanel, Amplitude), streaming (Kafka, Webhooks), and more.

Only the data you explicitly activate leaves your warehouse.

Governance and Security

SignalSmith provides controls to manage access and data flow:

RBAC — Role-based access control with custom roles and permissions at the workspace level
Destination Filters — Control which audiences and data can be synced to each destination
Access Filters — Row-level access control, so each team sees only the rows it is permitted to access
Consent Management — Enforce user consent preferences across event forwarding and syncs
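Destination filters and consent checks both act as gates before any record leaves the warehouse. The sketch below applies them in order. It assumes a hypothetical model in which each destination declares one consent purpose and an allow-list of audiences, and each profile carries the set of purposes the user has opted in to. None of these field names come from SignalSmith.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    name: str
    purpose: str  # hypothetical consent purpose, e.g. "advertising"
    allowed_audiences: set[str] = field(default_factory=set)


@dataclass
class Profile:
    user_id: str
    consented_purposes: set[str] = field(default_factory=set)


def records_to_sync(audience: str, members: list[Profile], destination: Destination) -> list[str]:
    """Apply the destination filter, then consent, before anything is synced."""
    # Destination filter: the audience must be explicitly allowed for this destination.
    if audience not in destination.allowed_audiences:
        raise PermissionError(
            f"audience {audience!r} is not allowed for {destination.name!r}"
        )
    # Consent: keep only members who opted in to the destination's purpose.
    return [p.user_id for p in members if destination.purpose in p.consented_purposes]


meta_ads = Destination("meta_ads", "advertising", {"high_value"})
members = [
    Profile("u1", {"advertising", "analytics"}),
    Profile("u2", {"analytics"}),
    Profile("u3"),
]
print(records_to_sync("high_value", members, meta_ads))  # ['u1']
```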

Supported Warehouses

Warehouse	Status
Snowflake	Fully supported
Google BigQuery	Fully supported
Databricks	Fully supported

SignalSmith generates dialect-specific SQL for each warehouse, so models, traits, audiences, and identity resolution work consistently regardless of which warehouse you use.

Design Principles

Warehouse-native — All compute runs in your warehouse. SignalSmith orchestrates; your warehouse executes.
Composable — Each capability (models, traits, audiences, identity, syncs) works independently and composes with the others. Use only what you need.
No data extraction — Customer data stays in your warehouse for all processing. Only the records you activate are sent to destinations.
No vendor lock-in — Works with all major cloud warehouses and 50+ destination integrations.
API-first — Every action available in the UI has a corresponding REST API endpoint for automation and integration.
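This page does not document the REST API's host, paths, or authentication scheme. The sketch below therefore uses placeholders throughout: the `https://api.example.com` base URL, the `/v1/syncs/SYNC_ID/runs` path, and bearer-token auth are all hypothetical. It only shows the general shape of scripting an action, such as triggering a sync from a CI job, with Python's standard `urllib.request`. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Any, Optional

# Placeholder host: the real API base URL is not documented on this page.
API_BASE = "https://api.example.com"


def build_request(
    method: str, path: str, token: str, body: Optional[dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=API_BASE + path,
        data=data,
        method=method,
        # Bearer-token auth is an assumption for illustration.
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Hypothetical endpoint for triggering a run of an existing sync.
req = build_request("POST", "/v1/syncs/SYNC_ID/runs", token="YOUR_API_TOKEN")
# urllib.request.urlopen(req) would send it; this sketch does not.
print(req.get_method(), req.full_url)
```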

Workspace Setup Overview