Warehouses

Warehouses are connections to the data warehouses where your customer data lives. In SignalSmith’s composable CDP architecture, your warehouse remains the source of truth — Warehouses simply provide SignalSmith with read access to query your data.

What Is a Warehouse?

A Warehouse is a configured connection to a data warehouse. Once connected, SignalSmith can execute SQL queries against it to power Models, which in turn feed Syncs and the rest of the platform.

Warehouses are read-only connections. SignalSmith never writes to, modifies, or deletes data in your warehouse. The only operations performed are SELECT queries defined by your models.

Supported Warehouses

Warehouse    Authentication                Notes
-----------  ----------------------------  ----------------------------------------
Snowflake    Username/Password, Key Pair   Most popular choice for enterprise CDPs
BigQuery     Service Account JSON          Native GCP integration
Databricks   Personal Access Token         Unity Catalog and Hive Metastore support

Warehouse Lifecycle

Every warehouse in SignalSmith follows a well-defined lifecycle:

1. Configuration

You provide connection credentials specific to your warehouse type. Each warehouse has its own set of required fields — account identifiers, authentication credentials, and schema/database targeting.
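As an illustration of per-type required fields, the sketch below checks a configuration for missing values before it is submitted. The field names are assumptions made up for this example (Snowflake is shown with username/password authentication only); the actual fields are defined by SignalSmith's request schemas.

```python
# Hypothetical required fields per warehouse type. These names are
# illustrative assumptions, not SignalSmith's actual schema.
REQUIRED_FIELDS = {
    "snowflake": {"account", "username", "password", "database", "schema"},
    "bigquery": {"project_id", "service_account_json", "dataset"},
    "databricks": {"host", "http_path", "access_token", "catalog"},
}


def missing_fields(warehouse_type: str, config: dict) -> list:
    """Return the required fields that are absent or empty, sorted by name."""
    try:
        required = REQUIRED_FIELDS[warehouse_type]
    except KeyError:
        raise ValueError(f"unsupported warehouse type: {warehouse_type!r}") from None
    return sorted(name for name in required if not config.get(name))
```

An empty result means every required field is present; any other result lists what still needs to be filled in.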

2. Connection Testing

Before a warehouse is saved, SignalSmith performs a connection test to verify that the credentials are valid and the warehouse is reachable. This test executes a lightweight query (typically SELECT 1) to confirm connectivity, authentication, and authorization.
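A connection test of this kind can be sketched against any Python DB-API connection. The example below is not SignalSmith's implementation; `check_connection` and `ConnectionCheck` are hypothetical names, and `sqlite3` stands in for a real warehouse driver so the sketch runs anywhere.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConnectionCheck:
    ok: bool
    error: Optional[str] = None


def check_connection(connect: Callable[[], sqlite3.Connection]) -> ConnectionCheck:
    """Open a connection with `connect()` and run SELECT 1 against it."""
    try:
        conn = connect()
    except Exception as exc:  # unreachable host, bad credentials, etc.
        return ConnectionCheck(ok=False, error=f"connect failed: {exc}")
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
        if row is None or row[0] != 1:
            return ConnectionCheck(ok=False, error="unexpected result for SELECT 1")
        return ConnectionCheck(ok=True)
    except Exception as exc:  # connected, but not allowed to run the query
        return ConnectionCheck(ok=False, error=f"query failed: {exc}")
    finally:
        conn.close()
```

For example, `check_connection(lambda: sqlite3.connect(":memory:"))` succeeds, while a `connect` callable that raises yields a failed check carrying the error message.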

3. Active Use

Once saved, the warehouse becomes available for use across the platform:

  • Models can select it as their data warehouse and run SQL queries against it
  • Schema can discover tables and columns from it for entity mapping
  • Loaders can write ingested data into it

4. Monitoring

Active warehouses are periodically health-checked. If a connection fails (e.g., credentials are rotated, IP allowlisting changes, or the warehouse is unreachable), the warehouse status changes to Unhealthy and affected syncs are paused until the issue is resolved.
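The status transition described above can be modeled roughly as follows. This is a sketch under assumptions: the `Healthy` status name, the class shapes, and the automatic resume on recovery are illustrative and not confirmed by SignalSmith's documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Sync:
    name: str
    paused: bool = False


@dataclass
class Warehouse:
    name: str
    status: str = "Healthy"  # assumed name for the non-failing state
    syncs: list = field(default_factory=list)


def apply_health_check(warehouse: Warehouse, check_passed: bool) -> None:
    """Record one health-check result and pause or resume dependent syncs."""
    warehouse.status = "Healthy" if check_passed else "Unhealthy"
    for sync in warehouse.syncs:
        # Assumption: syncs resume automatically once a check passes again.
        sync.paused = not check_passed
```

A failed check marks the warehouse Unhealthy and pauses its syncs; a later passing check reverses both.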

5. Decommissioning

When a warehouse is no longer needed, you can delete it — provided no active models or syncs depend on it. SignalSmith will warn you if downstream resources would be affected.
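The dependency rule can be pictured as a check that collects active models and syncs before a delete is allowed. The record shapes (`id`, `name`, `warehouse_id`, `model_id`, `active`) are assumptions for this sketch, not SignalSmith's data model.

```python
def blocking_dependents(warehouse_id: str, models: list, syncs: list) -> list:
    """List active models on the warehouse and active syncs fed by its models."""
    on_warehouse = [m for m in models if m["warehouse_id"] == warehouse_id]
    model_ids = {m["id"] for m in on_warehouse}
    blockers = [f"model:{m['name']}" for m in on_warehouse if m["active"]]
    blockers += [
        f"sync:{s['name']}"
        for s in syncs
        if s["active"] and s["model_id"] in model_ids
    ]
    return blockers
```

Under this sketch, deletion would proceed only when the returned list is empty; otherwise its entries are what the warning would name.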

How Warehouses Fit into the Data Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│              │     │              │     │              │     │              │
│  Warehouse   │────▶│    Model     │────▶│    Sync      │────▶│ Destination  │
│  (Connection)│     │  (SQL Query) │     │  (Schedule)  │     │  (CRM, Ads)  │
│              │     │              │     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
  1. A Warehouse connects to your data warehouse
  2. A Model runs a SQL query against that warehouse
  3. A Sync moves the model results to a destination on a schedule
  4. A Destination receives the data in the format it expects
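The four stages above can be mimicked in a few lines to show how data flows between them. This is a toy model only: an in-memory `sqlite3` database stands in for the warehouse, a plain list stands in for the destination, scheduling is omitted, and `run_model` and `run_sync` are hypothetical helper names.

```python
import sqlite3


def run_model(warehouse: sqlite3.Connection, sql: str) -> list:
    """Run the model's SELECT against the warehouse and return rows as dicts."""
    cur = warehouse.execute(sql)
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def run_sync(warehouse: sqlite3.Connection, sql: str, destination: list) -> int:
    """Execute one sync run: query the model, then deliver rows to the destination."""
    rows = run_model(warehouse, sql)
    destination.extend(rows)
    return len(rows)
```

The warehouse is only ever queried; the only thing written to is the destination.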

API Reference

Warehouses are managed through the SignalSmith REST API. Note that the endpoint paths use the segment sources rather than warehouses:

# List all warehouses
GET /api/v1/sources

# Get a single warehouse
GET /api/v1/sources/{id}

# Create a warehouse
POST /api/v1/sources

# Update a warehouse
PUT /api/v1/sources/{id}

# Delete a warehouse
DELETE /api/v1/sources/{id}

# Test a warehouse connection
POST /api/v1/sources/{id}/test

See the API Reference for full request/response schemas.
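As a rough illustration of calling these endpoints from Python, the sketch below builds requests with the standard library. The base URL and the Bearer-token header are assumptions, since the host and authentication scheme are not stated here; only the paths and methods come from the list above.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host; substitute the base URL for your SignalSmith workspace.
BASE_URL = "https://api.signalsmith.example"


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: the API accepts a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def list_warehouses_request(token: str) -> urllib.request.Request:
    return build_request("GET", "/api/v1/sources", token)


def connection_test_request(token: str, warehouse_id: str) -> urllib.request.Request:
    return build_request("POST", f"/api/v1/sources/{warehouse_id}/test", token)
```

A request built this way can be sent with `urllib.request.urlopen(request)`; the response shape depends on the schemas in the API Reference.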

Best Practices

  • Use a dedicated service account — Create a read-only database user or service account specifically for SignalSmith. Avoid using personal credentials or admin accounts.
  • Restrict permissions — Grant only SELECT access to the specific schemas and tables SignalSmith needs. Follow the principle of least privilege.
  • Allowlist SignalSmith IPs — If your warehouse has network restrictions, ensure SignalSmith’s IP addresses are allowlisted. Contact support for the current IP list.
  • Name warehouses descriptively — Use names like “Production Snowflake” or “Analytics BigQuery” so team members can quickly identify each connection.
  • Test before saving — Always run the connection test before saving. A successful test confirms that credentials, network access, and permissions are all correctly configured.

Next Steps