Data Handling

This page details how SignalSmith processes, stores, transmits, and protects data across the platform. Understanding the data flow helps you assess SignalSmith’s fit for your security requirements and configure it appropriately for your environment.

Warehouse-Native Architecture

The most important thing to understand about SignalSmith’s data handling is the warehouse-native architecture: computation happens in your warehouse, and customer data stays under your control.

SignalSmith connects to your data warehouse (Snowflake, BigQuery, or Databricks) with credentials you provide. When it evaluates traits, builds audiences, or resolves identities, it generates SQL queries and executes them against your warehouse. The results are used to orchestrate syncs but are not stored persistently in SignalSmith’s own database.
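To illustrate the pushdown model, the sketch below compiles a simple audience filter into a parameterized SQL string that would run inside the warehouse. It is a hypothetical illustration, not SignalSmith’s query generator. The `Condition` shape, the `compile_audience_sql` function, the table and column names, and the `%s` placeholder style are all assumptions. Identifiers use ANSI double quotes, which Snowflake and Databricks SQL accept; BigQuery would need backticks instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Condition:
    """Hypothetical audience filter condition: column, operator, value."""
    column: str
    op: str
    value: object


# Whitelist operators so a filter can never splice arbitrary SQL into the query.
ALLOWED_OPS = {"=", "!=", ">", ">=", "<", "<="}


def quote_ident(name: str) -> str:
    # ANSI-style double-quoted identifier, with embedded quotes escaped.
    return '"' + name.replace('"', '""') + '"'


def compile_audience_sql(table: str, id_column: str,
                         conditions: List[Condition]) -> Tuple[str, list]:
    """Return (sql, params). Values are bound as parameters, never inlined."""
    clauses, params = [], []
    for cond in conditions:
        if cond.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {cond.op}")
        clauses.append(f"{quote_ident(cond.column)} {cond.op} %s")
        params.append(cond.value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    qualified = ".".join(quote_ident(part) for part in table.split("."))
    sql = f"SELECT {quote_ident(id_column)} FROM {qualified} WHERE {where}"
    return sql, params
```

With a DB-API driver, the pair would be passed to something like `cursor.execute(sql, params)`. Only the matching IDs come back, and the underlying rows never leave the warehouse during evaluation.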

What SignalSmith Stores

SignalSmith’s metadata database (PostgreSQL) stores:

Configuration metadata — Model definitions, audience filter expressions, trait configurations, sync schedules, destination settings, journey canvases
Encrypted credentials — Warehouse connection strings, destination OAuth tokens, API keys
Operational state — Sync run status and summary metrics (row counts, error counts, timestamps), not the actual row-level data
Audit events — User actions, API calls, sync run results, AI operations
User accounts — Email addresses, roles, workspace memberships (authentication via Firebase/GCP Identity Platform)

What SignalSmith Does NOT Store

Customer PII (names, emails, phone numbers, addresses)
Transaction or behavioral data
Raw data from your warehouse tables
Full sync payloads (individual records being synced)

What Data Leaves the Warehouse

Data leaves your warehouse only during activation, when audience members and their mapped fields are sent to destinations. This is the core purpose of a customer data platform (CDP): getting the right data to the right tools.

Activation Data Flow

Warehouse  ──SQL query──▶  SignalSmith  ──API calls──▶  Destination
(your data)               (orchestrator)               (CRM, ads, email)

The data that flows through SignalSmith during activation includes:

Identifier fields — The fields used to match records in the destination (e.g., email address, external ID)
Mapped attribute fields — Only the fields you explicitly map in the sync configuration (e.g., first name, lifetime value, segment name)

Data is streamed through SignalSmith during the sync run and is not persisted after the run completes. Only summary metrics (total rows synced, errors encountered) are retained.
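The streaming behavior can be sketched as a loop that pulls rows in batches, forwards only the identifier and mapped fields, and keeps nothing but counters once the run ends. This is an illustrative model, not SignalSmith’s sync engine. `run_sync`, `SyncSummary`, and the `send_batch` callable (assumed to return the number of records the destination rejected) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class SyncSummary:
    """The only state retained after a run: counts, never row contents."""
    rows_synced: int = 0
    rows_failed: int = 0


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    # Pull rows lazily so the full result set is never held in memory at once.
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_sync(
    rows: Iterable[dict],
    identifier: str,
    mapped_fields: List[str],
    send_batch: Callable[[List[dict]], int],  # hypothetical: returns count of rejected records
    batch_size: int = 100,
) -> SyncSummary:
    summary = SyncSummary()
    keep = [identifier, *mapped_fields]
    for batch in batched(rows, batch_size):
        # Forward only the identifier and the explicitly mapped fields.
        payload = [{name: row.get(name) for name in keep} for row in batch]
        failed = send_batch(payload)
        summary.rows_synced += len(payload) - failed
        summary.rows_failed += failed
    return summary
```

Because `rows` can be a generator over a warehouse cursor, each batch exists in memory only while it is being sent. The returned `SyncSummary` corresponds to the summary metrics described above.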

Controlling What Data Leaves

You have full control over what data reaches each destination:

Field mapping — Only fields you explicitly map are sent. Unmapped fields stay in the warehouse.
Destination filters — Governance policies that restrict which audiences can sync to which destinations
Access filters — Row-level access controls that limit which records are visible to which users and syncs
Data minimization — Send only the fields each destination needs. Don’t send full profiles when an email address suffices.
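To show how destination filters and access filters combine before any data is sent, here is a minimal sketch. The policy representation (a map from destination ID to allowed audience IDs), the deny-by-default behavior, and the `rows_for_sync` function are assumptions for illustration. They do not describe how SignalSmith’s Govern features are actually modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class DestinationFilter:
    """Hypothetical policy: destination ID -> audience IDs allowed to sync there."""
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def permits(self, audience_id: str, destination_id: str) -> bool:
        # Deny by default: a destination with no entry accepts no audiences.
        return audience_id in self.allowed.get(destination_id, set())


def rows_for_sync(
    rows: Iterable[dict],
    audience_id: str,
    destination_id: str,
    policy: DestinationFilter,
    access_filter: Callable[[dict], bool],  # row-level predicate for this sync
) -> List[dict]:
    if not policy.permits(audience_id, destination_id):
        raise PermissionError(f"audience {audience_id!r} may not sync to {destination_id!r}")
    # Only rows the access filter allows can reach the destination.
    return [row for row in rows if access_filter(row)]
```

The audience-level check runs first, so a blocked sync fails before reading any rows. The row-level predicate then narrows what remains.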

Encryption

In Transit

All network connections use TLS 1.2 or higher:

Connection	Encryption
Browser to SignalSmith UI	TLS 1.2+ (HTTPS)
SignalSmith to your warehouse	TLS 1.2+ (enforced by warehouse provider)
SignalSmith to destinations	TLS 1.2+ (HTTPS API calls)
Internal service communication	TLS 1.2+
MCP server connections	TLS 1.2+
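If you run your own client or proxy against SignalSmith’s API or MCP server and want to enforce the same TLS 1.2 floor on your side, Python’s standard `ssl` module can pin the minimum protocol version. This is a client-side sketch and does not show SignalSmith’s server configuration. `make_client_context` is a hypothetical helper name.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # Secure defaults: certificate verification and hostname checking enabled.
    ctx = ssl.create_default_context()
    # Refuse to negotiate any protocol older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can be passed to, for example, `urllib.request.urlopen(url, context=make_client_context())`.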

At Rest

Sensitive data stored in SignalSmith’s metadata database is encrypted or, in the case of user API keys, hashed:

Data Type	Encryption Method
Warehouse credentials	AES-256 encryption
Destination OAuth tokens	AES-256 encryption
Destination API keys	AES-256 encryption
User API keys	bcrypt hash (one-way, irreversible)

Encryption keys are managed through your deployment’s key management configuration. In cloud deployments, keys are stored in a cloud KMS (Key Management Service).

Credential Storage

Credentials are treated as the most sensitive data in SignalSmith’s metadata store:

Encrypted at rest — All credentials are encrypted with AES-256 before being written to the database
Never logged — Credentials are excluded from application logs, error messages, and stack traces
Never returned via API — API responses redact credential values. Once set, a credential can be updated but not read back.
Access controlled — Only Admin-role users can view or modify credential configurations
Rotation supported — Credentials can be updated without disrupting existing syncs (the new credential takes effect on the next run)
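The "never logged" and "never returned via API" rules usually come down to one redaction step applied to API responses and log records. The sketch below shows one way to do this. The set of secret field names and the `redact` and `RedactingFilter` names are hypothetical, and SignalSmith’s actual mechanism is not documented here.

```python
import logging
from typing import Any

# Hypothetical field names treated as secrets.
SECRET_FIELDS = {"password", "private_key", "access_token", "refresh_token", "api_key", "client_secret"}
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Recursively replace secret values in dicts, lists, and tuples."""
    if isinstance(value, dict):
        return {k: REDACTED if k in SECRET_FIELDS else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if type(value) is tuple:
        return tuple(redact(v) for v in value)
    return value


class RedactingFilter(logging.Filter):
    """Redact structured log arguments before any handler formats them."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.args = redact(record.args)
        return True
```

Attach the filter with `handler.addFilter(RedactingFilter())`. The filter only sees log arguments, not text already formatted into the message string, so credentials should never be interpolated into messages directly.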

Warehouse Connection Credentials

For each supported warehouse:

Warehouse	Credential Type	Storage
Snowflake	Username/password or key pair	AES-256 encrypted
BigQuery	Service account JSON key	AES-256 encrypted
Databricks	Personal access token	AES-256 encrypted

Destination Credentials

Auth Method	Storage
OAuth 2.0 tokens	AES-256 encrypted, auto-refreshed
API keys	AES-256 encrypted
Basic auth	AES-256 encrypted
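"Auto-refreshed" typically means exchanging the OAuth 2.0 refresh token for a new access token shortly before the current one expires. Here is a minimal sketch of that check. The `OAuthToken` shape, the 60-second skew, and the `refresh` callable are assumptions. A real implementation would also persist the new token encrypted, as the table describes, and this sketch omits storage.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OAuthToken:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp, seconds


def get_valid_token(
    token: OAuthToken,
    refresh: Callable[[str], OAuthToken],  # hypothetical: exchanges a refresh token for a new token
    skew_seconds: float = 60.0,
    now: Optional[float] = None,
) -> OAuthToken:
    """Return a usable token, refreshing it if it expires within `skew_seconds`."""
    current = time.time() if now is None else now
    if token.expires_at - current > skew_seconds:
        return token
    return refresh(token.refresh_token)
```

The skew window means a token is renewed slightly early, so it does not expire partway through a sync run.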

Audit Logging

SignalSmith maintains a comprehensive audit log of all actions performed in the platform. Every audit event records:

Field	Description
`timestamp`	When the action occurred (UTC)
`actor_email`	The user who performed the action
`actor_id`	The user’s unique identifier
`action`	The action performed (e.g., `create`, `update`, `delete`, `trigger`, `login`)
`resource_type`	The type of resource affected (e.g., `audience`, `sync`, `destination`)
`resource_id`	The unique identifier of the affected resource
`details`	Additional context about the action (parameters, changes, error messages)
`source`	How the action was initiated (`ui`, `api`, `ai_agent`, `schedule`)
`workspace_id`	The workspace context
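The field table maps directly onto a record type. The sketch below models one audit event. It is illustrative and not SignalSmith’s schema definition. The class name, the validation of `source` against the four listed values, and the ISO 8601 timestamp format are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

VALID_SOURCES = {"ui", "api", "ai_agent", "schedule"}


@dataclass(frozen=True)
class AuditEvent:
    actor_email: str
    actor_id: str
    action: str
    resource_type: str
    resource_id: str
    source: str
    workspace_id: str
    details: Dict[str, Any] = field(default_factory=dict)
    # Record time in UTC, as the field table specifies.
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

The frozen dataclass reflects the append-only nature of an audit trail: an event cannot be modified after it is created.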

Audited Actions

Category	Actions Logged
Authentication	Login, logout, API key creation, API key rotation
Warehouses	Create, update, delete, test connection
Models	Create, update, delete, preview
Audiences	Create, update, delete, estimate, evaluate
Syncs	Create, update, delete, trigger, run start, run complete, run error
Destinations	Create, update, delete, reconnect
AI operations	Agent messages, tool calls, guardrail triggers, approvals
Govern	Role changes, destination filter changes, access filter changes
Settings	Workspace settings changes, user invitations

Audit Log Destinations

Audit events are written to two locations:

SignalSmith internal store — Queryable via the UI (Settings > Audit Log) and API (`GET /api/v1/audit-log`)
Your warehouse — Written to the `CDP_AUDIT.AUDIT_LOG` table, where you can query them with SQL, join with other data, and apply your own retention policies
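Both locations can be read programmatically. The sketch below builds an authenticated request for the audit log endpoint and a sample warehouse query. `BASE_URL` is a placeholder. The warehouse column names are assumed to mirror the audit event fields, and the API response format is not covered here.

```python
import urllib.request

# Placeholder host: substitute your SignalSmith API base URL.
BASE_URL = "https://api.signalsmith.example"


def audit_log_request(api_key: str) -> urllib.request.Request:
    # Authenticate with the key in the Authorization: Bearer header.
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/audit-log",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Example warehouse query; column names are assumed to match the event fields.
RECENT_DELETES_SQL = """
SELECT timestamp, actor_email, resource_type, resource_id
FROM CDP_AUDIT.AUDIT_LOG
WHERE action = 'delete'
ORDER BY timestamp DESC
LIMIT 100
"""
```

Sending the request is then a matter of `urllib.request.urlopen(audit_log_request(key))`. The SQL runs through whatever client you already use for your warehouse.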

Data Retention

SignalSmith supports configurable retention for operational data:

Data Type	Default Retention	Configurable
Sync run history	90 days	Yes
Audit log (internal)	1 year	Yes
Audit log (warehouse)	Your warehouse retention policy	N/A (you control it)
Audience evaluation snapshots	90 days	Yes
Event data	30 days	Yes
AI conversation history	30 days	Yes

Retention settings are configurable per workspace in Settings > Data Retention. Data beyond the retention period is automatically purged from SignalSmith’s internal store. Data in your warehouse is governed by your own retention policies.
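A retention purge amounts to comparing each record’s age with the configured window for its data type. The defaults below come from the table, with "1 year" taken as 365 days. The data-type keys, the `created_at` field, and the `overrides` parameter (standing in for per-workspace settings) are hypothetical. The warehouse copy of the audit log is omitted because your own policies govern it.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Default retention in days, from the table above (1 year taken as 365 days).
DEFAULT_RETENTION_DAYS: Dict[str, int] = {
    "sync_run_history": 90,
    "audit_log_internal": 365,
    "audience_evaluation_snapshots": 90,
    "event_data": 30,
    "ai_conversation_history": 30,
}


def purge_expired(
    records: List[dict],
    data_type: str,
    now: Optional[datetime] = None,
    overrides: Optional[Dict[str, int]] = None,
) -> Tuple[List[dict], int]:
    """Return (records still inside the retention window, number purged).

    Each record needs a timezone-aware `created_at`; `overrides` stands in
    for per-workspace retention settings.
    """
    retention = {**DEFAULT_RETENTION_DAYS, **(overrides or {})}
    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=retention[data_type])
    kept = [r for r in records if r["created_at"] >= cutoff]
    return kept, len(records) - len(kept)
```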

API Key Security

API keys provide programmatic access to SignalSmith’s REST API and MCP server.

Key Lifecycle

Creation — An Admin generates a key in Settings > API Keys. The full key is displayed once and must be copied immediately.
Storage — The key is hashed with bcrypt and stored. The original key cannot be retrieved.
Usage — Pass the key in the `Authorization: Bearer <key>` header. Each request is validated against the stored hash.
Rotation — Generate a new key, update any active syncs and integrations to use it, then revoke the old key.
Revocation — Revoke a key immediately to block all requests using it.
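The lifecycle above can be sketched with the Python standard library. Python does not ship bcrypt, so this sketch uses salted PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in slow hash. It is a swapped-in technique and not SignalSmith’s implementation. `StoredKey`, `create_key`, `verify`, `revoke`, and the iteration count are hypothetical. Because a salted hash cannot be looked up directly, real systems typically also store a non-secret key ID. This sketch checks a single known record.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor


@dataclass
class StoredKey:
    salt: bytes
    digest: bytes
    revoked: bool = False


def _hash(key: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)


def create_key(live: bool = False) -> Tuple[str, StoredKey]:
    """Return the plaintext key (shown once) and the record to persist."""
    key = ("sk_live_" if live else "sk_test_") + secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    return key, StoredKey(salt=salt, digest=_hash(key, salt))


def verify(authorization: str, stored: StoredKey) -> bool:
    """Check an Authorization header value ("Bearer <key>") against the stored hash."""
    scheme, _, key = authorization.partition(" ")
    if scheme != "Bearer" or not key or stored.revoked:
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_hash(key, stored.salt), stored.digest)


def revoke(stored: StoredKey) -> None:
    stored.revoked = True
```

Only `StoredKey` is persisted. Once `create_key` returns, the plaintext key exists only in the caller’s hands, which is why it can be displayed just once.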

Key Properties

Property	Description
Prefix	Keys start with `sk_live_` (production) or `sk_test_` (development)
Scope	Each key is scoped to a single workspace
Permissions	Keys inherit the permissions of the user who created them
Last used	The dashboard shows when each key was last used
Expiration	Optional expiration date can be set at creation
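After the hash check passes, the remaining properties amount to a few authorization rules: the prefix must match the key’s environment, the workspace must match, and the key must not be expired. The sketch below shows these checks. `KeyMetadata` and `authorize` are hypothetical names, and inherited user permissions are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

PREFIXES = {"sk_live_": "live", "sk_test_": "test"}


@dataclass(frozen=True)
class KeyMetadata:
    workspace_id: str
    environment: str                # "live" or "test"
    expires_at: Optional[datetime]  # None means no expiration


def environment_of(key: str) -> Optional[str]:
    for prefix, env in PREFIXES.items():
        if key.startswith(prefix):
            return env
    return None


def authorize(key: str, meta: KeyMetadata, workspace_id: str,
              now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    if environment_of(key) != meta.environment:
        return False
    if meta.workspace_id != workspace_id:  # each key is scoped to one workspace
        return False
    if meta.expires_at is not None and now >= meta.expires_at:
        return False
    return True
```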

Related Pages

Compliance — GDPR, CCPA, and SOC 2 compliance support
Govern — RBAC, destination filters, and access filters
AI Guardrails — Safety controls for AI operations
API Reference — API authentication details
