
Running Resolution

After configuring your identity graph, you can run identity resolution to link customer records and produce golden records. SignalSmith supports two resolution modes: full resolution (recompute from scratch) and incremental resolution (process only changes since the last run).

Full Resolution

Full resolution reprocesses all records from all entity types, rebuilds the entire identity graph, and recomputes all clusters and golden records from scratch.

When to Use Full Resolution

  • First run — Always run full resolution the first time after creating an identity graph
  • After configuration changes — When you modify identifier families, merge rules, or limit rules
  • After major data changes — When a large data migration or backfill significantly changes the source data
  • Periodic refresh — As a scheduled full recomputation to correct any drift from incremental resolution

How It Works

  1. Extract identifiers — Query all source tables to extract entity keys and identifier values
  2. Normalize — Apply normalization rules to all identifier values (lowercase emails, strip phone formatting, etc.)
  3. Build edges — For each pair of records sharing a normalized identifier value, create a candidate edge
  4. Apply merge rules — Filter edges by merge rule conditions (single identifier match, multi-identifier match)
  5. Enforce limit rules — Drop edges that would cause clusters to exceed configured limits
  6. Find connected components — Run the graph algorithm to identify clusters of linked records
  7. Produce golden records — Apply survivorship rules to each cluster to create unified profiles
  8. Materialize — Write cluster assignments and golden records to the warehouse
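Steps 3 and 6 above can be sketched as a hash join on normalized identifier values followed by a union-find over the resulting edges. This is a minimal illustration of the technique, not SignalSmith's internal implementation; the record IDs, identifier families, and values are made up:

```python
from collections import defaultdict

# Illustrative input: record_id -> {identifier_family: normalized_value}
records = {
    "crm:1": {"email": "ada@example.com"},
    "orders:7": {"email": "ada@example.com", "phone": "15551234567"},
    "web:42": {"phone": "15551234567"},
    "crm:2": {"email": "bob@example.com"},
}

# Step 3: every pair of records sharing a normalized value is an edge candidate
by_value = defaultdict(list)
for rid, idents in records.items():
    for family, value in idents.items():
        by_value[(family, value)].append(rid)
edges = [(ids[i], ids[j]) for ids in by_value.values()
         for i in range(len(ids)) for j in range(i + 1, len(ids))]

# Step 6: union-find yields the connected components (clusters)
parent = {rid: rid for rid in records}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x
for a, b in edges:
    parent[find(a)] = find(b)

clusters = defaultdict(set)
for rid in records:
    clusters[find(rid)].add(rid)
print(sorted(map(sorted, clusters.values())))
# -> [['crm:1', 'orders:7', 'web:42'], ['crm:2']]
```

Note that `crm:1` and `web:42` end up in the same cluster despite sharing no identifier directly: they are linked transitively through `orders:7`, which is exactly what the connected-components step provides.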

Triggering Full Resolution

From the UI:

  1. Navigate to Identity Resolution and select your identity graph
  2. Click Run Resolution
  3. Select Full Resolution
  4. Click Start

From the API:

POST /api/v1/identity-graphs/{id}/resolve
Content-Type: application/json
 
{
  "mode": "full"
}
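From a script, the same call can be assembled with Python's standard library. The base URL, graph ID, and the `build_resolve_request` helper below are illustrative placeholders; authentication is deployment-specific and omitted here:

```python
import json
from urllib.request import Request

# Hypothetical base URL -- substitute your deployment's host.
BASE_URL = "https://app.signalsmith.example/api/v1"

def build_resolve_request(graph_id: str, mode: str) -> Request:
    """Build (but do not send) a resolve call; mode is 'full' or 'incremental'."""
    if mode not in ("full", "incremental"):
        raise ValueError(f"unsupported mode: {mode}")
    body = json.dumps({"mode": mode}).encode()
    return Request(
        f"{BASE_URL}/identity-graphs/{graph_id}/resolve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_resolve_request("ig_123", "full")
# urllib.request.urlopen(req) would send it; add your Authorization header first.
```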

Performance

Full resolution performance depends on:

Factor | Impact
Total record count | More records = more edges to evaluate
Number of identifier families | More families = more edge candidates
Merge rule complexity | Multi-identifier rules require more computation
Warehouse compute | Larger warehouse instances process faster

Typical performance benchmarks:

Record Count | Approximate Duration
100,000 | 2-5 minutes
1,000,000 | 10-30 minutes
10,000,000 | 1-3 hours
100,000,000 | 4-12 hours

These are approximate benchmarks. Actual performance varies based on warehouse type, compute size, data distribution, and merge rule complexity.

Incremental Resolution

Incremental resolution processes only records that have been added, modified, or deleted since the last resolution run. This is significantly faster than full resolution for ongoing operations.

When to Use Incremental Resolution

  • Regular scheduled runs — After the initial full resolution, use incremental for daily/hourly updates
  • Real-time data ingestion — When new records arrive continuously from loaders or event pipelines
  • Cost optimization — Incremental runs consume far less warehouse compute than full runs

How It Works

  1. Detect changes — Identify records added, modified, or deleted since the last run (using change tracking or timestamp comparison)
  2. Extract new identifiers — Query only the changed records for their identifier values
  3. Evaluate edges — Create candidate edges between new/modified records and existing records
  4. Apply merge and limit rules — Same rules as full resolution, applied only to new edges
  5. Update clusters — Merge new records into existing clusters or create new clusters
  6. Update golden records — Recompute golden records for affected clusters only
  7. Materialize — Write updated cluster assignments and golden records to the warehouse
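Step 5 above can be sketched as a union-find seeded with the existing cluster IDs, so that clusters untouched by new edges are never revisited. The record IDs, cluster IDs, and edges below are illustrative:

```python
# Existing assignments from the last run: record_id -> cluster_id (illustrative)
assignments = {"crm:1": "c1", "orders:7": "c1", "crm:2": "c2"}

# New edges between changed records and existing records (illustrative)
new_edges = [("web:42", "orders:7"), ("web:43", "crm:2")]

# Union-find over cluster IDs and brand-new record IDs
parent = {}
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def node(rid):
    # Known records stand in for their existing cluster; new records for themselves
    return assignments.get(rid, rid)

for a, b in new_edges:
    parent[find(node(a))] = find(node(b))

# Reassign only records touched by new edges plus their existing cluster members
affected = set(assignments) | {r for edge in new_edges for r in edge}
updated = {rid: find(node(rid)) for rid in affected}
```

Here `web:42` joins cluster `c1` and `web:43` joins `c2`, while the two existing clusters remain distinct; only golden records for those two clusters would need recomputation.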

Change Detection

SignalSmith detects changes using one of two methods:

Method | Description | Requirements
Timestamp-based | Compares record timestamps to the last run timestamp | Source tables must have a created_at or updated_at column
Change tracking | Uses warehouse-native change tracking | Snowflake: Streams; BigQuery: change history; Databricks: Delta Lake change data feed

If your warehouse supports change tracking, it is preferred because it captures deletes and is more efficient.
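Timestamp-based detection reduces to filtering on the previous run's high-water mark. A minimal sketch, with illustrative column names and rows, showing why it misses hard deletes:

```python
from datetime import datetime, timezone

# High-water mark stored per identity graph after each run (illustrative)
last_run_at = datetime(2024, 6, 1, tzinfo=timezone.utc)

rows = [
    {"id": 1, "updated_at": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 6, 2, tzinfo=timezone.utc)},
]

# Only rows touched since the last run are re-extracted. A row hard-deleted
# from the source simply vanishes from `rows` and is never detected -- which
# is why warehouse-native change tracking is preferred when available.
changed = [r for r in rows if r["updated_at"] > last_run_at]
```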

Triggering Incremental Resolution

From the UI:

  1. Navigate to Identity Resolution and select your identity graph
  2. Click Run Resolution
  3. Select Incremental Resolution
  4. Click Start

From the API:

POST /api/v1/identity-graphs/{id}/resolve
Content-Type: application/json
 
{
  "mode": "incremental"
}

Limitations of Incremental Resolution

  • Drift accumulation — Over many incremental runs, small inaccuracies can accumulate. Schedule periodic full runs to correct this.
  • Deletes may be missed — If using timestamp-based change detection, deleted records are not detected. Use change tracking or periodic full runs.
  • Configuration changes — If you modify merge rules, limit rules, or identifier families, you must run full resolution for the changes to take effect across all records.

Scheduling

Identity resolution can be scheduled to run automatically:

Schedule | Mode | Use Case
Manual | Either | Run on demand after configuration changes
Hourly | Incremental | Near-real-time identity unification
Daily | Incremental | Standard cadence for most use cases
Weekly | Full | Periodic full refresh to correct drift
Custom Cron | Either | Specific timing needs

A common pattern is to combine both modes:

  • Daily at 2 AM: Incremental resolution (process new/changed records)
  • Weekly on Sunday at midnight: Full resolution (recompute everything to correct drift)

This provides daily freshness with weekly accuracy correction.
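Expressed as cron schedules (an illustration of the pattern, not a SignalSmith configuration format):

```
0 2 * * *     incremental   # daily at 2 AM: process new/changed records
0 0 * * SUN   full          # Sunday at midnight: full recompute to correct drift
```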

Monitoring

Resolution Run Status

Each resolution run has a status:

Status | Description
Queued | The run is scheduled but has not started
Running | The resolution is in progress
Completed | The resolution finished successfully
Failed | The resolution encountered an error
Cancelled | The run was manually cancelled

Run Metrics

After a successful run, the following metrics are available:

Metric | Description
Records processed | Total number of source records included in resolution
Clusters formed | Number of distinct identity clusters
Merges performed | Number of records linked to a cluster with more than one record
Singletons | Number of clusters with only one record (unmatched)
Edges created | Total number of edges in the identity graph
Edges dropped (limits) | Number of edges dropped due to limit rules
Duration | Total wall-clock time for the run
Golden records produced | Number of unified profiles created

Cluster Size Distribution

The run results include a cluster size distribution showing how many clusters contain N records:

Cluster Size | Count | Percentage
1 (singleton) | 450,000 | 45%
2 | 200,000 | 20%
3-5 | 150,000 | 15%
6-10 | 80,000 | 8%
11-50 | 50,000 | 5%
51-100 | 10,000 | 1%
100+ | 0 | 0% (capped by limit rules)

A healthy distribution typically shows a long tail: many singletons, a significant number of 2-record clusters, and decreasing counts for larger clusters.
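Given materialized cluster assignments, this distribution can be recomputed with two counters; the assignment map below is illustrative:

```python
from collections import Counter

# assignments: record_id -> cluster_id, as materialized by a resolution run
assignments = {"r1": "c1", "r2": "c1", "r3": "c2",
               "r4": "c3", "r5": "c3", "r6": "c3"}

cluster_sizes = Counter(assignments.values())   # cluster_id -> record count
distribution = Counter(cluster_sizes.values())  # cluster size -> cluster count
total = sum(distribution.values())
for size, count in sorted(distribution.items()):
    print(f"{size} record(s): {count} clusters ({count / total:.0%})")
```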

⚠️

If you see many clusters at or near your max cluster size limit, it may indicate over-merging. Review sample clusters at the limit boundary to check for false merges.

Error Handling

When a resolution run fails, SignalSmith provides:

  • Error message — The specific error that caused the failure
  • Error phase — Which phase failed (extraction, edge building, clustering, golden records, materialization)
  • Partial results — Whether any intermediate results were produced before the failure

Common failure causes:

Cause | Resolution
Warehouse timeout | Increase warehouse timeout settings or scale up compute
Permission denied | Check source connection credentials
Out of memory | Reduce the number of concurrent edges being processed
Schema change | A source table was altered; update identifier family column mappings

Next Steps