Running Resolution
After configuring your identity graph, you can run identity resolution to link customer records and produce golden records. SignalSmith supports two resolution modes: full resolution (recompute from scratch) and incremental resolution (process only changes since the last run).
Full Resolution
Full resolution reprocesses all records from all entity types, rebuilds the entire identity graph, and recomputes all clusters and golden records from scratch.
When to Use Full Resolution
- First run — Always run full resolution the first time after creating an identity graph
- After configuration changes — When you modify identifier families, merge rules, or limit rules
- After major data changes — When a large data migration or backfill significantly changes the source data
- Periodic refresh — As a scheduled full recomputation to correct any drift from incremental resolution
How It Works
- Extract identifiers — Query all source tables to extract entity keys and identifier values
- Normalize — Apply normalization rules to all identifier values (lowercase emails, strip phone formatting, etc.)
- Build edges — For each pair of records sharing a normalized identifier value, create a candidate edge
- Apply merge rules — Filter edges by merge rule conditions (single identifier match, multi-identifier match)
- Enforce limit rules — Drop edges that would cause clusters to exceed configured limits
- Find connected components — Run a connected-components graph algorithm over the surviving edges to identify clusters of linked records
- Produce golden records — Apply survivorship rules to each cluster to create unified profiles
- Materialize — Write cluster assignments and golden records to the warehouse
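The pipeline above can be sketched end to end for the simplest case, a single-identifier merge rule. This is a minimal illustration, not SignalSmith's internals; the record shape, the normalization rules, and the union-find clustering are assumptions made for the example:

```python
from collections import defaultdict

def resolve(records):
    """Minimal full-resolution sketch: normalize identifiers, build edges
    between records sharing a normalized value, then find connected
    components with union-find. A record is assumed to be
    (record_id, {"email": ..., "phone": ...})."""
    def normalize(family, value):
        # Lowercase emails, strip phone formatting (illustrative rules).
        if family == "email":
            return value.strip().lower()
        if family == "phone":
            return "".join(ch for ch in value if ch.isdigit())
        return value

    # Index: normalized identifier value -> record ids sharing it.
    index = defaultdict(list)
    for rid, identifiers in records:
        for family, value in identifiers.items():
            if value:
                index[(family, normalize(family, value))].append(rid)

    # Union-find over candidate edges (single-identifier merge rule).
    parent = {rid: rid for rid, _ in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for rids in index.values():
        for other in rids[1:]:
            parent[find(rids[0])] = find(other)

    # Cluster assignments: record id -> cluster representative.
    return {rid: find(rid) for rid, _ in records}
```

Golden-record survivorship and limit-rule enforcement would run on these cluster assignments afterward.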
Triggering Full Resolution
From the UI:
- Navigate to Identity Resolution and select your identity graph
- Click Run Resolution
- Select Full Resolution
- Click Start
From the API:
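For scripting, one way to build this call in Python using only the standard library (the Bearer `Authorization` header is an assumption for illustration; the docs do not specify an auth scheme):

```python
import json
import urllib.request

def build_resolve_request(base_url, graph_id, token, mode="full"):
    """Build the resolve request. Endpoint and body follow the docs;
    the Bearer token header is an assumption."""
    if mode not in ("full", "incremental"):
        raise ValueError(f"unknown mode: {mode}")
    return urllib.request.Request(
        f"{base_url}/api/v1/identity-graphs/{graph_id}/resolve",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(...)`; the raw request it issues is shown below.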
```
POST /api/v1/identity-graphs/{id}/resolve
Content-Type: application/json

{
  "mode": "full"
}
```
Performance
Full resolution performance depends on:
| Factor | Impact |
|---|---|
| Total record count | More records = more edges to evaluate |
| Number of identifier families | More families = more edge candidates |
| Merge rule complexity | Multi-identifier rules require more computation |
| Warehouse compute | Larger warehouse instances process faster |
Typical performance benchmarks:
| Record Count | Approximate Duration |
|---|---|
| 100,000 | 2-5 minutes |
| 1,000,000 | 10-30 minutes |
| 10,000,000 | 1-3 hours |
| 100,000,000 | 4-12 hours |
These are approximate benchmarks. Actual performance varies based on warehouse type, compute size, data distribution, and merge rule complexity.
Incremental Resolution
Incremental resolution processes only records that have been added, modified, or deleted since the last resolution run. This is significantly faster than full resolution for ongoing operations.
When to Use Incremental Resolution
- Regular scheduled runs — After the initial full resolution, use incremental for daily/hourly updates
- Real-time data ingestion — When new records arrive continuously from loaders or event pipelines
- Cost optimization — Incremental runs consume far less warehouse compute than full runs
How It Works
- Detect changes — Identify records added, modified, or deleted since the last run (using change tracking or timestamp comparison)
- Extract new identifiers — Query only the changed records for their identifier values
- Evaluate edges — Create candidate edges between new/modified records and existing records
- Apply merge and limit rules — Same rules as full resolution, applied only to new edges
- Update clusters — Merge new records into existing clusters or create new clusters
- Update golden records — Recompute golden records for affected clusters only
- Materialize — Write updated cluster assignments and golden records to the warehouse
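The cluster-update step can be sketched in the same union-find terms: fold edges from new or changed records into the existing assignments, and report which clusters were touched, since only those need their golden records recomputed. This is an illustration of the idea, not SignalSmith's implementation:

```python
def incremental_update(assignments, new_edges):
    """Merge new edges into existing cluster assignments.
    assignments maps record id -> cluster id; records not yet seen
    start as their own singleton cluster. Returns the updated
    assignments and the set of affected cluster roots."""
    parent = dict(assignments)
    # Ensure every existing cluster id is its own root.
    for cid in list(parent.values()):
        parent.setdefault(cid, cid)

    def find(x):
        parent.setdefault(x, x)  # brand-new records start as singletons
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    affected = set()
    for a, b in new_edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
        affected.update((ra, rb))

    return {rid: find(rid) for rid in list(parent)}, affected
```

Untouched clusters keep their assignments and golden records as-is, which is where the cost savings over a full run come from.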
Change Detection
SignalSmith detects changes using one of two methods:
| Method | Description | Requirements |
|---|---|---|
| Timestamp-based | Compares record timestamps to the last run timestamp | Source tables must have a created_at or updated_at column |
| Change tracking | Uses warehouse-native change tracking | Snowflake: Stream; BigQuery: Change history; Databricks: Delta Lake change data feed |
Change tracking is preferred when your warehouse supports it, because it captures deletes and is more efficient than timestamp comparison.
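Timestamp-based detection amounts to a simple filter against the last run time. A sketch (the record shape and column fallback are assumptions; note that deleted records never appear in the result, which is exactly the weakness described above):

```python
def changed_since(records, last_run_at):
    """Keep records whose updated_at (falling back to created_at) is
    newer than the last resolution run. Deletes are invisible to this
    method, so they are never returned."""
    changed = []
    for rec in records:
        ts = rec.get("updated_at") or rec.get("created_at")
        if ts is not None and ts > last_run_at:
            changed.append(rec)
    return changed
```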
Triggering Incremental Resolution
From the UI:
- Navigate to Identity Resolution and select your identity graph
- Click Run Resolution
- Select Incremental Resolution
- Click Start
From the API:
```
POST /api/v1/identity-graphs/{id}/resolve
Content-Type: application/json

{
  "mode": "incremental"
}
```
Limitations of Incremental Resolution
- Drift accumulation — Over many incremental runs, small inaccuracies can accumulate. Schedule periodic full runs to correct this.
- Deletes may be missed — If using timestamp-based change detection, deleted records are not detected. Use change tracking or periodic full runs.
- Configuration changes — If you modify merge rules, limit rules, or identifier families, you must run full resolution for the changes to take effect across all records.
Scheduling
Identity resolution can be scheduled to run automatically:
| Schedule | Mode | Use Case |
|---|---|---|
| Manual | Either | Run on demand after configuration changes |
| Hourly | Incremental | Near-real-time identity unification |
| Daily | Incremental | Standard cadence for most use cases |
| Weekly | Full | Periodic full refresh to correct drift |
| Custom Cron | Either | Specific timing needs |
Recommended Schedule Pattern
A common pattern is to combine both modes:
- Daily at 2 AM: Incremental resolution (process new/changed records)
- Weekly on Sunday at midnight: Full resolution (recompute everything to correct drift)
This provides daily freshness with weekly accuracy correction.
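In standard five-field cron syntax (minute, hour, day of month, month, day of week), that pattern would look like:

```
0 2 * * *   # incremental resolution, daily at 2:00 AM
0 0 * * 0   # full resolution, weekly on Sunday at midnight
```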
Monitoring
Resolution Run Status
Each resolution run has a status:
| Status | Description |
|---|---|
| Queued | The run is scheduled but has not started |
| Running | The resolution is in progress |
| Completed | The resolution finished successfully |
| Failed | The resolution encountered an error |
| Cancelled | The run was manually cancelled |
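When driving resolution from a script, the statuses above are what a polling loop watches for. A sketch; the status-fetching call is injected rather than hard-coded, because the docs do not specify a run-status endpoint:

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Cancelled"}

def wait_for_run(fetch_status, poll_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll until a resolution run reaches a terminal status.
    fetch_status is any callable returning the current status string,
    e.g. a GET against your run-status endpoint."""
    waited = 0
    while waited <= timeout_seconds:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("resolution run did not reach a terminal status in time")
```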
Run Metrics
After a successful run, the following metrics are available:
| Metric | Description |
|---|---|
| Records processed | Total number of source records included in resolution |
| Clusters formed | Number of distinct identity clusters |
| Merges performed | Number of records linked to a cluster with more than one record |
| Singletons | Number of clusters with only one record (unmatched) |
| Edges created | Total number of edges in the identity graph |
| Edges dropped (limits) | Number of edges dropped due to limit rules |
| Duration | Total wall-clock time for the run |
| Golden records produced | Number of unified profiles created |
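Several of these metrics can be derived directly from the cluster assignments. A sketch of that derivation (assignments as a record-id-to-cluster-id mapping is an assumption; this is not how SignalSmith computes them internally):

```python
from collections import Counter

def run_metrics(assignments):
    """Derive record count, cluster count, singletons, and merges
    from cluster assignments (record id -> cluster id)."""
    sizes = Counter(assignments.values())
    singletons = sum(1 for n in sizes.values() if n == 1)
    # Records linked into clusters containing more than one record.
    merges = sum(n for n in sizes.values() if n > 1)
    return {
        "records_processed": len(assignments),
        "clusters_formed": len(sizes),
        "singletons": singletons,
        "merges_performed": merges,
    }
```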
Cluster Size Distribution
The run results include a cluster size distribution showing how many clusters contain N records:
| Cluster Size | Count | Percentage |
|---|---|---|
| 1 (singleton) | 450,000 | 45% |
| 2 | 200,000 | 20% |
| 3-5 | 150,000 | 15% |
| 6-10 | 80,000 | 8% |
| 11-50 | 50,000 | 5% |
| 51-100 | 10,000 | 1% |
| >100 | 0 | 0% (capped by limit rules) |
A healthy distribution typically shows a long tail: many singletons, a significant number of 2-record clusters, and decreasing counts for larger clusters.
If you see many clusters at or near your max cluster size limit, it may indicate over-merging. Review sample clusters at the limit boundary to check for false merges.
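Bucketing the distribution and flagging clusters at the limit can be done from the same assignments. A sketch with illustrative bucket edges matching the table above:

```python
from collections import Counter

def size_distribution(assignments, max_cluster_size=100):
    """Bucket cluster sizes and flag clusters at or above the
    configured limit, a possible sign of over-merging."""
    sizes = Counter(assignments.values())
    buckets = Counter()
    at_limit = []
    for cluster, n in sizes.items():
        if n == 1:
            buckets["1 (singleton)"] += 1
        elif n == 2:
            buckets["2"] += 1
        elif n <= 5:
            buckets["3-5"] += 1
        elif n <= 10:
            buckets["6-10"] += 1
        elif n <= 50:
            buckets["11-50"] += 1
        elif n <= max_cluster_size:
            buckets[f"51-{max_cluster_size}"] += 1
        else:
            buckets[f">{max_cluster_size}"] += 1
        if n >= max_cluster_size:
            at_limit.append(cluster)  # candidates for false-merge review
    return buckets, at_limit
```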
Error Handling
When a resolution run fails, SignalSmith provides:
- Error message — The specific error that caused the failure
- Error phase — Which phase failed (extraction, edge building, clustering, golden records, materialization)
- Partial results — Whether any intermediate results were produced before the failure
Common failure causes:
| Cause | Resolution |
|---|---|
| Warehouse timeout | Increase warehouse timeout settings or scale up compute |
| Permission denied | Check source connection credentials |
| Out of memory | Reduce the number of concurrent edges being processed |
| Schema change | A source table was altered. Update identifier family column mappings. |
Next Steps
- Golden Records — Configure survivorship rules for unified profiles
- Profile Explorer — Search and view resolved profiles
- Creating an Identity Graph — Configure or reconfigure your identity graph