
Building a Warehouse-First CDP

This guide walks you through building a complete Customer Data Platform using SignalSmith with your data warehouse as the single source of truth. By the end, you’ll have a full pipeline from raw warehouse data to activated audiences in your marketing and sales tools.

Philosophy: Your Warehouse Is the Source of Truth

In a warehouse-first CDP, your data warehouse (Snowflake, BigQuery, or Databricks) is the central hub for all customer data. Instead of copying data into a proprietary CDP, SignalSmith generates and executes SQL directly against your warehouse. This means:

  • No data copies — Customer data stays where it is. Traits, audiences, and models are all computed as SQL queries in your warehouse.
  • SQL-native — Every computation is transparent SQL that your data team can review, version, and audit.
  • Full warehouse power — Leverage your warehouse’s compute, scale, and query engine rather than a black-box CDP engine.
  • Complete audit trail — Every query, every sync run, every audience evaluation is logged with timestamps and row counts.
  • Single source of truth — No reconciliation between “CDP data” and “warehouse data” because they’re the same thing.

Prerequisites

  • A SignalSmith workspace (create one if you don’t have one yet)
  • A data warehouse (Snowflake, BigQuery, or Databricks) with customer data tables
  • Admin or Editor role in your SignalSmith workspace

Step 1: Connect Your Warehouse as a Source

Sources give SignalSmith read access to your warehouse. SignalSmith never writes to your source tables; the only writes it performs are materialized trait results, which go into a dedicated schema that you designate.

  1. Navigate to Warehouses in the left sidebar
  2. Click Add Source
  3. Select your warehouse type
  4. Enter your connection credentials:
    • Snowflake: Account, Warehouse, Database, Schema, Username, Password or Key Pair
    • BigQuery: Project ID, Dataset, Service Account JSON
    • Databricks: Host, HTTP Path, Personal Access Token, Catalog, Schema
  5. Click Test Connection — SignalSmith will execute a lightweight SELECT 1 query to verify connectivity
  6. Click Save

Best practice: Create a dedicated read-only service account for SignalSmith. Grant SELECT access only to the schemas and tables that contain customer data. Follow the principle of least privilege.
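As a sketch, a least-privilege setup in Snowflake might look like the following. The role, user, warehouse, database, and schema names here are illustrative placeholders; adapt them to your environment.

```sql
-- Illustrative Snowflake grants for a read-only SignalSmith service account
CREATE ROLE IF NOT EXISTS signalsmith_reader;

-- Compute and navigation access
GRANT USAGE ON WAREHOUSE transform_wh TO ROLE signalsmith_reader;
GRANT USAGE ON DATABASE analytics TO ROLE signalsmith_reader;
GRANT USAGE ON SCHEMA analytics.customer_data TO ROLE signalsmith_reader;

-- Read-only access, including tables created later
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.customer_data TO ROLE signalsmith_reader;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.customer_data TO ROLE signalsmith_reader;

-- Attach the role to the service user
GRANT ROLE signalsmith_reader TO USER signalsmith_svc;
```

BigQuery and Databricks have equivalent mechanisms (IAM roles on the dataset, and Unity Catalog grants, respectively); the principle is the same — grant SELECT only on the schemas SignalSmith needs.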

For detailed setup instructions, see the dedicated connection guide for your warehouse type.

Step 2: Build Data Models for Customer Entities

Models are SQL queries that shape your raw warehouse tables into the data structures SignalSmith works with. Think of them as views that define “this is what a customer looks like” or “these are all the orders.”

  1. Navigate to Models in the left sidebar
  2. Click Create Model
  3. Select the source you connected in Step 1
  4. Write a SQL query that selects customer data:
-- Example: Customer master model
SELECT
  c.customer_id,
  c.email,
  c.phone,
  c.first_name,
  c.last_name,
  c.created_at,
  c.country,
  c.company_name,
  COALESCE(o.total_orders, 0) AS total_orders,
  COALESCE(o.total_revenue, 0) AS total_revenue,
  o.last_order_date
FROM customers c
LEFT JOIN (
  SELECT
    customer_id,
    COUNT(*) AS total_orders,
    SUM(order_total) AS total_revenue,
    MAX(order_date) AS last_order_date
  FROM orders
  GROUP BY customer_id
) o ON c.customer_id = o.customer_id
WHERE c.email IS NOT NULL
  5. Click Preview to see sample results and verify the data shape
  6. Configure column types — mark customer_id and email as identifiers, the rest as attributes
  7. Name the model (e.g., “Customer Master”) and click Save

Tip: Create multiple models for different entity types — one for customers, one for accounts, one for products. This keeps each model focused and composable.
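For instance, an account-level model could sit alongside the customer model above. This sketch assumes hypothetical accounts and users tables; substitute your own table and column names.

```sql
-- Example: Account model (table and column names are illustrative)
SELECT
  a.account_id,
  a.account_name,
  a.industry,
  a.employee_count,
  COUNT(u.user_id) AS user_count   -- how many users belong to each account
FROM accounts a
LEFT JOIN users u ON u.account_id = a.account_id
GROUP BY a.account_id, a.account_name, a.industry, a.employee_count
```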

Step 3: Define Your Schema with Entity Types

The schema gives SignalSmith a structured understanding of your customer data. Entity types represent the core objects in your business (User, Account, Product, etc.), and relationships define how they connect.

  1. Navigate to Schema in the left sidebar
  2. Click Create Entity Type
  3. Name it (e.g., “User”) and provide a description
  4. Map attributes from your models to this entity type:
    • Identifiers — Fields that uniquely identify an entity (e.g., customer_id, email, phone). These are used for identity resolution and destination matching.
    • Attributes — Descriptive fields (e.g., first_name, country, company_name)
  5. Click Save

If your data has multiple entity types, create them and define relationships:

User ──belongs_to──▶ Account
User ──purchased──▶ Product

Relationships enable cross-entity audience conditions like “Users whose Account has more than 100 employees.”
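Conceptually, a cross-entity condition resolves to a join through the relationship. A hand-written equivalent of the example above might look like this (table and column names are illustrative):

```sql
-- "Users whose Account has more than 100 employees"
SELECT u.customer_id
FROM users u
JOIN accounts a ON u.account_id = a.account_id  -- the belongs_to relationship
WHERE a.employee_count > 100
```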

Step 4: Create Traits from Warehouse Data

Traits are computed attributes that transform raw data into actionable metrics. They’re evaluated as SQL against your warehouse and materialized for fast audience building.

  1. Navigate to Segment > Traits in the left sidebar
  2. Click Create Trait
  3. Choose the trait type:

SQL Trait — For custom calculations:

-- Lifetime value (refunds are pre-aggregated per order so that an order
-- with multiple refunds doesn't duplicate its order_total in the sum)
SELECT
  o.customer_id,
  SUM(o.order_total) - COALESCE(SUM(r.refund_total), 0) AS lifetime_value
FROM orders o
LEFT JOIN (
  SELECT order_id, SUM(refund_amount) AS refund_total
  FROM refunds
  GROUP BY order_id
) r ON o.order_id = r.order_id
GROUP BY o.customer_id

Aggregation Trait — For common metrics (no SQL needed):

  • Function: Count
  • Source table: orders
  • Group by: customer_id
  • Result: “Total Order Count”

Formula Trait — To combine existing traits:

average_order_value = total_revenue / total_orders
is_high_value = lifetime_value > 1000
days_since_last_order = DATEDIFF('day', last_order_date, CURRENT_DATE())
  4. Set a schedule (daily is a good starting point for most traits)
  5. Click Save

Recommended starting traits:

| Trait Name | Type | Description |
| --- | --- | --- |
| Lifetime Value | SQL | Total spend minus refunds |
| Total Orders | Aggregation | Count of all orders |
| Average Order Value | Formula | Lifetime value / total orders |
| Days Since Last Purchase | SQL | Days between last order and today |
| Engagement Score | SQL | Composite score from login frequency, feature usage, etc. |
| Account Size | Aggregation | Number of users per account |
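A composite Engagement Score trait could be sketched as weighted activity counts. The events table, event types, lookback window, and weights below are all illustrative assumptions; tune them to your own tracking schema.

```sql
-- Illustrative composite engagement score over a 30-day window
SELECT
  customer_id,
  0.5 * COUNT(DISTINCT CASE WHEN event_type = 'login'        THEN event_date END)
+ 0.3 * COUNT(DISTINCT CASE WHEN event_type = 'feature_used' THEN event_date END)
+ 0.2 * COUNT(DISTINCT CASE WHEN event_type = 'report_viewed' THEN event_date END)
    AS engagement_score
FROM events
WHERE event_date >= DATEADD('day', -30, CURRENT_DATE())
GROUP BY customer_id
```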

Step 5: Build Audiences

Audiences are segments of customers defined by conditions on traits and attributes. The filter builder lets you construct complex conditions with AND/OR logic.

  1. Navigate to Segment > Audiences in the left sidebar
  2. Click Create Audience
  3. Select the entity type (e.g., “User”)
  4. Build your conditions using the filter builder:

Example: High-Value At-Risk Customers

WHERE
  lifetime_value > 500
  AND days_since_last_purchase > 60
  AND total_orders >= 3
  5. Click Estimate to see approximate audience size
  6. Click Preview to see sample members and verify the segment makes sense
  7. Name the audience and click Save

Useful starter audiences:

| Audience | Conditions | Use Case |
| --- | --- | --- |
| High-Value Customers | lifetime_value > 500 | VIP programs, premium support |
| Churn Risk | days_since_last_purchase > 60 AND total_orders >= 2 | Re-engagement campaigns |
| New Users (7 days) | created_at > 7 days ago | Onboarding sequences |
| Big Accounts | account_size > 50 | Enterprise sales outreach |
| Engaged but Not Purchased | engagement_score > 80 AND total_orders = 0 | Conversion campaigns |
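Conceptually, an audience like Churn Risk compiles down to a SELECT over the materialized trait results. The schema and table name below are illustrative, not the actual materialization layout:

```sql
-- Churn Risk audience expressed against a materialized traits table
SELECT customer_id
FROM signalsmith.traits_user   -- illustrative materialization schema/table
WHERE days_since_last_purchase > 60
  AND total_orders >= 2
```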

Step 6: Activate to Destinations

Now connect your audiences to the tools where your team takes action — CRMs, ad platforms, email tools, and more.

Set Up a Destination

  1. Navigate to Destinations in the left sidebar
  2. Click Add Destination
  3. Select the destination type (e.g., Salesforce, Google Ads, Braze)
  4. Complete the authentication flow
  5. Click Save

Create an Audience Sync

  1. Navigate to Segment > Audience Syncs
  2. Click Create Audience Sync
  3. Select the audience you built in Step 5
  4. Select the destination
  5. Choose a sync mode:
    • Mirror — Keeps the destination in perfect sync (adds new members, removes exited members). Best for most use cases.
    • Additive — Only adds new members, never removes. Good for building cumulative lists.
    • Subtractive — Only removes members who no longer qualify. Good for suppression lists.
  6. Map fields — select which traits and attributes to send alongside membership
  7. Set a schedule (e.g., every 6 hours)
  8. Click Save & Run
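In Mirror mode, each run effectively diffs the current membership against the last synced state. That diff can be sketched with SQL set operators (the snapshot table names are illustrative):

```sql
-- Members to add: qualify now but weren't in the last synced snapshot
SELECT customer_id FROM current_membership
EXCEPT
SELECT customer_id FROM last_synced_membership;

-- Members to remove: were synced previously but no longer qualify
SELECT customer_id FROM last_synced_membership
EXCEPT
SELECT customer_id FROM current_membership;
```

Additive mode sends only the first query's results; Subtractive mode sends only the second's.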

Monitor the Sync

  1. Go to the Sync Runs tab on the audience sync detail page
  2. Review the run summary: members added, removed, updated, and any errors
  3. Check Insights > Sync Health for ongoing monitoring across all syncs

Benefits of the Warehouse-First Approach

Data Stays in Your Warehouse

Customer data never leaves your infrastructure. SignalSmith reads data via SQL queries and writes only activation payloads (membership lists, mapped fields) to destinations. Your warehouse remains the authoritative store.

SQL-Native Transparency

Every trait, audience, and model is expressed as SQL. Your data team can review the exact queries being run, version them in your own systems, and extend them with warehouse-specific functions.

Complete Audit Trail

Every sync run, trait evaluation, and audience computation is logged with timestamps, row counts, and error details. Use Insights dashboards or query the audit tables directly in your warehouse.
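For example, a recent-failures check against the audit tables might look like the query below. The schema, table, and column names are illustrative; consult your workspace's audit schema for the actual layout.

```sql
-- Sync runs with errors in the last 7 days (illustrative audit schema)
SELECT sync_name, started_at, rows_added, rows_removed, error_message
FROM signalsmith_audit.sync_runs
WHERE error_message IS NOT NULL
  AND started_at >= DATEADD('day', -7, CURRENT_DATE())
ORDER BY started_at DESC
```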

No Vendor Lock-In

Because your data stays in your warehouse and all logic is expressed as SQL, there’s no proprietary data format or opaque scoring engine to migrate away from. If you stop using SignalSmith, your data is exactly where it always was.

Warehouse-Scale Performance

Audience evaluation, trait computation, and data preparation all happen on your warehouse’s compute engine. You get the full power of Snowflake, BigQuery, or Databricks — not a scaled-down CDP query engine.

Next Steps