
Building a Warehouse-First CDP

This guide walks you through building a complete Customer Data Platform using SignalSmith with your data warehouse as the single source of truth. By the end, you’ll have a full pipeline from raw warehouse data to activated audiences in your marketing and sales tools.

Philosophy: Your Warehouse Is the Source of Truth

In a warehouse-first CDP, your data warehouse (Snowflake, BigQuery, or Databricks) is the central hub for all customer data. Instead of copying data into a proprietary CDP, SignalSmith generates and executes SQL directly against your warehouse. This means:

  • No data copies — Customer data stays where it is. Traits, audiences, and models are all computed as SQL queries in your warehouse.
  • SQL-native — Every computation is transparent SQL that your data team can review, version, and audit.
  • Full warehouse power — Leverage your warehouse’s compute, scale, and query engine rather than a black-box CDP engine.
  • Complete audit trail — Every query, every sync run, every audience evaluation is logged with timestamps and row counts.
  • Single source of truth — No reconciliation between “CDP data” and “warehouse data” because they’re the same thing.

Prerequisites

  • A SignalSmith workspace (create one if you don’t have one yet)
  • A data warehouse (Snowflake, BigQuery, or Databricks) with customer data tables
  • Admin or Editor role in your SignalSmith workspace

Step 1: Connect Your Warehouse as a Source

Sources give SignalSmith read access to your warehouse. SignalSmith never writes to your source tables; the only writes it performs are materialized trait results, which go into a dedicated schema that you designate.

  1. Navigate to Warehouses in the left sidebar
  2. Click Add Source
  3. Select your warehouse type
  4. Enter your connection credentials:
    • Snowflake: Account, Warehouse, Database, Schema, Username, Password or Key Pair
    • BigQuery: Project ID, Dataset, Service Account JSON
    • Databricks: Host, HTTP Path, Personal Access Token, Catalog, Schema
  5. Click Test Connection — SignalSmith will execute a lightweight SELECT 1 query to verify connectivity
  6. Click Save

Best practice: Create a dedicated read-only service account for SignalSmith. Grant SELECT access only to the schemas and tables that contain customer data. Follow the principle of least privilege.
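As a sketch, a least-privilege setup in Snowflake might look like the following. The role, user, warehouse, database, and schema names here are illustrative placeholders; adapt them to your environment.

```sql
-- Illustrative Snowflake grants for a read-only SignalSmith service account
CREATE ROLE IF NOT EXISTS signalsmith_reader;

-- Compute and navigation access
GRANT USAGE ON WAREHOUSE transform_wh TO ROLE signalsmith_reader;
GRANT USAGE ON DATABASE analytics TO ROLE signalsmith_reader;
GRANT USAGE ON SCHEMA analytics.customer_data TO ROLE signalsmith_reader;

-- Read-only access, including tables created later
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.customer_data TO ROLE signalsmith_reader;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.customer_data TO ROLE signalsmith_reader;

-- Attach the role to the service user
GRANT ROLE signalsmith_reader TO USER signalsmith_svc;
```

BigQuery and Databricks have equivalent mechanisms (IAM roles on the dataset, and Unity Catalog grants, respectively); the principle is the same — grant SELECT only on the schemas SignalSmith needs.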

For detailed setup instructions, see the dedicated connection guide for your warehouse type.

Step 2: Build Data Models for Customer Entities

Models are SQL queries that shape your raw warehouse tables into the data structures SignalSmith works with. Think of them as views that define “this is what a customer looks like” or “these are all the orders.”

  1. Navigate to Models in the left sidebar
  2. Click Create Model
  3. Select the source you connected in Step 1
  4. Write a SQL query that selects customer data:
-- Example: Customer master model
SELECT
  c.customer_id,
  c.email,
  c.phone,
  c.first_name,
  c.last_name,
  c.created_at,
  c.country,
  c.company_name,
  COALESCE(o.total_orders, 0) AS total_orders,
  COALESCE(o.total_revenue, 0) AS total_revenue,
  o.last_order_date
FROM customers c
LEFT JOIN (
  SELECT
    customer_id,
    COUNT(*) AS total_orders,
    SUM(order_total) AS total_revenue,
    MAX(order_date) AS last_order_date
  FROM orders
  GROUP BY customer_id
) o ON c.customer_id = o.customer_id
WHERE c.email IS NOT NULL
  5. Click Preview to see sample results and verify the data shape
  6. Configure column types — mark customer_id and email as identifiers, the rest as attributes
  7. Name the model (e.g., “Customer Master”) and click Save

Tip: Create multiple models for different entity types — one for customers, one for accounts, one for products. This keeps each model focused and composable.
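For instance, an account-level model could sit alongside the customer model above. This sketch assumes hypothetical accounts and users tables; substitute your own table and column names.

```sql
-- Example: Account model (table and column names are illustrative)
SELECT
  a.account_id,
  a.account_name,
  a.industry,
  a.employee_count,
  COUNT(u.user_id) AS user_count   -- how many users belong to each account
FROM accounts a
LEFT JOIN users u ON u.account_id = a.account_id
GROUP BY a.account_id, a.account_name, a.industry, a.employee_count
```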

Step 3: Define Your Schema with Entity Types

The schema gives SignalSmith a structured understanding of your customer data. Entity types represent the core objects in your business (User, Account, Product, etc.), and relationships define how they connect.

  1. Navigate to Schema in the left sidebar
  2. Click Create Entity Type
  3. Name it (e.g., “User”) and provide a description
  4. Map attributes from your models to this entity type:
    • Identifiers — Fields that uniquely identify an entity (e.g., customer_id, email, phone). These are used for identity resolution and destination matching.
    • Attributes — Descriptive fields (e.g., first_name, country, company_name)
  5. Click Save

If your data has multiple entity types, create them and define relationships:

User ──belongs_to──▶ Account
User ──purchased──▶ Product

Relationships enable cross-entity audience conditions like “Users whose Account has more than 100 employees.”
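Conceptually, a cross-entity condition resolves to a join through the relationship. A hand-written equivalent of the example above might look like this (table and column names are illustrative):

```sql
-- "Users whose Account has more than 100 employees"
SELECT u.customer_id
FROM users u
JOIN accounts a ON u.account_id = a.account_id  -- the belongs_to relationship
WHERE a.employee_count > 100
```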

Step 4: Create Traits from Warehouse Data

Traits are computed attributes that transform raw data into actionable metrics. They’re evaluated as SQL against your warehouse and materialized for fast audience building.

  1. Navigate to Segment > Traits in the left sidebar
  2. Click Create Trait
  3. Choose the trait type:

SQL Trait — For custom calculations:

-- Lifetime value (refunds are pre-aggregated per order so that an order
-- with multiple refunds doesn't duplicate its order_total in the sum)
SELECT
  o.customer_id,
  SUM(o.order_total) - COALESCE(SUM(r.refund_total), 0) AS lifetime_value
FROM orders o
LEFT JOIN (
  SELECT order_id, SUM(refund_amount) AS refund_total
  FROM refunds
  GROUP BY order_id
) r ON o.order_id = r.order_id
GROUP BY o.customer_id

Aggregation Trait — For common metrics (no SQL needed):

  • Function: Count
  • Source table: orders
  • Group by: customer_id
  • Result: “Total Order Count”

Formula Trait — To combine existing traits:

average_order_value = total_revenue / total_orders
is_high_value = lifetime_value > 1000
days_since_last_order = DATEDIFF('day', last_order_date, CURRENT_DATE())
  4. Set a schedule (daily is a good starting point for most traits)
  5. Click Save

Recommended starting traits:

| Trait Name | Type | Description |
| --- | --- | --- |
| Lifetime Value | SQL | Total spend minus refunds |
| Total Orders | Aggregation | Count of all orders |
| Average Order Value | Formula | Lifetime value / total orders |
| Days Since Last Purchase | SQL | Days between last order and today |
| Engagement Score | SQL | Composite score from login frequency, feature usage, etc. |
| Account Size | Aggregation | Number of users per account |
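A composite Engagement Score trait could be sketched as weighted activity counts. The events table, event types, lookback window, and weights below are all illustrative assumptions; tune them to your own tracking schema.

```sql
-- Illustrative composite engagement score over a 30-day window
SELECT
  customer_id,
  0.5 * COUNT(DISTINCT CASE WHEN event_type = 'login'        THEN event_date END)
+ 0.3 * COUNT(DISTINCT CASE WHEN event_type = 'feature_used' THEN event_date END)
+ 0.2 * COUNT(DISTINCT CASE WHEN event_type = 'report_viewed' THEN event_date END)
    AS engagement_score
FROM events
WHERE event_date >= DATEADD('day', -30, CURRENT_DATE())
GROUP BY customer_id
```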

Step 5: Build Audiences

Audiences are segments of customers defined by conditions on traits and attributes. The filter builder lets you construct complex conditions with AND/OR logic.

  1. Navigate to Segment > Audiences in the left sidebar
  2. Click Create Audience
  3. Select the entity type (e.g., “User”)
  4. Build your conditions using the filter builder:

Example: High-Value At-Risk Customers

WHERE
  lifetime_value > 500
  AND days_since_last_purchase > 60
  AND total_orders >= 3
  5. Click Estimate to see approximate audience size
  6. Click Preview to see sample members and verify the segment makes sense
  7. Name the audience and click Save

Useful starter audiences:

| Audience | Conditions | Use Case |
| --- | --- | --- |
| High-Value Customers | lifetime_value > 500 | VIP programs, premium support |
| Churn Risk | days_since_last_purchase > 60 AND total_orders >= 2 | Re-engagement campaigns |
| New Users (7 days) | created_at > 7 days ago | Onboarding sequences |
| Big Accounts | account_size > 50 | Enterprise sales outreach |
| Engaged but Not Purchased | engagement_score > 80 AND total_orders = 0 | Conversion campaigns |
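Conceptually, an audience like Churn Risk compiles down to a SELECT over the materialized trait results. The schema and table name below are illustrative, not the actual materialization layout:

```sql
-- Churn Risk audience expressed against a materialized traits table
SELECT customer_id
FROM signalsmith.traits_user   -- illustrative materialization schema/table
WHERE days_since_last_purchase > 60
  AND total_orders >= 2
```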

Step 6: Activate to Destinations

Now connect your audiences to the tools where your team takes action — CRMs, ad platforms, email tools, and more.

Set Up a Destination

  1. Navigate to Destinations in the left sidebar
  2. Click Add Destination
  3. Select the destination type (e.g., Salesforce, Google Ads, Braze)
  4. Complete the authentication flow
  5. Click Save

Create an Audience Sync

  1. Navigate to Segment > Audience Syncs
  2. Click Create Audience Sync
  3. Select the audience you built in Step 5
  4. Select the destination
  5. Choose a sync mode:
    • Mirror — Keeps the destination in perfect sync (adds new members, removes exited members). Best for most use cases.
    • Additive — Only adds new members, never removes. Good for building cumulative lists.
    • Subtractive — Only removes members who no longer qualify. Good for suppression lists.
  6. Map fields — select which traits and attributes to send alongside membership
  7. Set a schedule (e.g., every 6 hours)
  8. Click Save & Run
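In Mirror mode, each run effectively diffs the current membership against the last synced state. That diff can be sketched with SQL set operators (the snapshot table names are illustrative):

```sql
-- Members to add: qualify now but weren't in the last synced snapshot
SELECT customer_id FROM current_membership
EXCEPT
SELECT customer_id FROM last_synced_membership;

-- Members to remove: were synced previously but no longer qualify
SELECT customer_id FROM last_synced_membership
EXCEPT
SELECT customer_id FROM current_membership;
```

Additive mode sends only the first query's results; Subtractive mode sends only the second's.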

Monitor the Sync

  1. Go to the Sync Runs tab on the audience sync detail page
  2. Review the run summary: members added, removed, updated, and any errors
  3. Check Insights > Sync Health for ongoing monitoring across all syncs

Benefits of the Warehouse-First Approach

Data Stays in Your Warehouse

Customer data never leaves your infrastructure. SignalSmith reads data via SQL queries and writes only activation payloads (membership lists, mapped fields) to destinations. Your warehouse remains the authoritative store.

SQL-Native Transparency

Every trait, audience, and model is expressed as SQL. Your data team can review the exact queries being run, version them in your own systems, and extend them with warehouse-specific functions.

Complete Audit Trail

Every sync run, trait evaluation, and audience computation is logged with timestamps, row counts, and error details. Use Insights dashboards or query the audit tables directly in your warehouse.
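For example, a recent-failures check against the audit tables might look like the query below. The schema, table, and column names are illustrative; consult your workspace's audit schema for the actual layout.

```sql
-- Sync runs with errors in the last 7 days (illustrative audit schema)
SELECT sync_name, started_at, rows_added, rows_removed, error_message
FROM signalsmith_audit.sync_runs
WHERE error_message IS NOT NULL
  AND started_at >= DATEADD('day', -7, CURRENT_DATE())
ORDER BY started_at DESC
```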

No Vendor Lock-In

Because your data stays in your warehouse and all logic is expressed as SQL, there’s no proprietary data format or opaque scoring engine to migrate away from. If you stop using SignalSmith, your data is exactly where it always was.

Warehouse-Scale Performance

Audience evaluation, trait computation, and data preparation all happen on your warehouse’s compute engine. You get the full power of Snowflake, BigQuery, or Databricks — not a scaled-down CDP query engine.

Next Steps