Golden Records
Golden records are the unified customer profiles produced by identity resolution. For each cluster of linked records, SignalSmith creates one golden record that contains the “best” value for each attribute, determined by configurable survivorship strategies.
What Is a Golden Record?
When identity resolution groups multiple source records into a cluster, those records often contain conflicting attribute values — and those records may come from different data models. For example:
| Record | Model | Name | Email | City |
|---|---|---|---|---|
| A | Website Events | Alice Smith | alice@gmail.com | New York |
| B | CRM Contacts | Alice M. Smith | alice@company.com | New York |
| C | App Users | alice_s | alice@gmail.com | San Francisco |
All three records represent the same person, but they come from different models and disagree on name, email, and city. The golden record resolves these conflicts using survivorship strategies to produce a single, definitive profile:
| Field | Golden Record Value | Strategy Used |
|---|---|---|
| Name | Alice M. Smith | Source priority (CRM ranked highest) |
| Email | alice@gmail.com | Most recent |
| City | San Francisco | Most recent |
Multi-Model Architecture
Golden records in SignalSmith support multi-model configurations. A single golden record config is attached to an identity graph and can draw attributes from columns across any of the models participating in that graph.
Each golden record attribute defines:
- Attribute name — the output column name in the golden record table
- Survivorship strategy — how conflicting values are resolved
- Sources — one or more model-column mappings that feed this attribute
This means you can combine columns from different models into a single unified profile. For example, you might pull email from your CRM model, last_login from your App model, and lifetime_value from your Transactions model — all into one golden record.
Each identity graph can have at most one golden record configuration.
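The relationship between a configuration, its attributes, and their sources can be pictured as a small data structure. The following is a minimal sketch only: the class and field names are hypothetical and chosen for illustration, not taken from SignalSmith's API.

```python
from dataclasses import dataclass, field
from typing import Optional


# All names below are hypothetical; they mirror the concepts in this section.
@dataclass(frozen=True)
class AttributeSource:
    model: str                      # a model participating in the identity graph
    column: str                     # the column in that model feeding the attribute
    priority: Optional[int] = None  # only used by the source priority strategy


@dataclass
class GoldenRecordAttribute:
    name: str                       # output column name in the golden record table
    strategy: str                   # survivorship strategy, e.g. "most_recent"
    sources: list[AttributeSource] = field(default_factory=list)


@dataclass
class GoldenRecordConfig:
    identity_graph: str             # a graph has at most one such configuration
    attributes: list[GoldenRecordAttribute] = field(default_factory=list)


# One profile combining columns from three different models.
config = GoldenRecordConfig(
    identity_graph="customers",
    attributes=[
        GoldenRecordAttribute("email", "most_recent", [AttributeSource("CRM", "email")]),
        GoldenRecordAttribute("last_login", "most_recent", [AttributeSource("App", "last_login")]),
        GoldenRecordAttribute("lifetime_value", "max", [AttributeSource("Transactions", "lifetime_value")]),
    ],
)
```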
Survivorship Strategies
Survivorship strategies determine how conflicting values are resolved for each attribute. Every attribute must have an explicitly assigned strategy.
Most Recent
Uses the value from the record with the most recent timestamp. This assumes that newer data is more accurate than older data.
Each source model can specify a timestamp column used for recency ranking. If a model has no timestamp column, its records are ranked after those with timestamps.
Best for: Attributes that change over time, like address, city, phone number, or subscription status.
Example:
| Record | Model | Updated At | City |
|---|---|---|---|
| A | CRM | 2024-01-15 | New York |
| B | App | 2024-06-20 | San Francisco |
| C | Website | 2024-03-10 | New York |
Result: San Francisco (Record B has the most recent timestamp)
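As a rough illustration of the ranking described above, the sketch below picks the value from the newest record and ranks records without a timestamp last. The record layout (`updated_at` plus attribute keys) and the choice to skip null values are assumptions made for the example, not documented behavior.

```python
from datetime import date
from typing import Any, Optional


def most_recent(records: list[dict[str, Any]], column: str) -> Optional[Any]:
    """Return `column` from the record with the latest `updated_at`."""
    # Assumption: records whose value is null are not candidates.
    candidates = [r for r in records if r.get(column) is not None]
    if not candidates:
        return None
    # Records that have a timestamp outrank those that do not.
    best = max(
        candidates,
        key=lambda r: (r.get("updated_at") is not None, r.get("updated_at") or date.min),
    )
    return best[column]


records = [
    {"model": "CRM", "updated_at": date(2024, 1, 15), "city": "New York"},
    {"model": "App", "updated_at": date(2024, 6, 20), "city": "San Francisco"},
    {"model": "Website", "updated_at": date(2024, 3, 10), "city": "New York"},
]
print(most_recent(records, "city"))  # San Francisco
```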
Most Frequent
Uses the value that appears most often across all source records in the cluster, regardless of which model they came from. This applies a “majority vote” approach.
Best for: Attributes where the most common value is likely correct, like gender, country, or language preference.
Example:
| Record | Model | Country |
|---|---|---|
| A | CRM | US |
| B | App | US |
| C | Website | UK |
Result: US (appears 2 out of 3 times)
In case of a tie, the most recent value among the tied values is used.
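The majority vote and its tie-break can be sketched as follows. The `updated_at` field and its values are hypothetical additions to the example table, included only so the tie-break has something to compare.

```python
from collections import Counter
from datetime import date
from typing import Any, Optional


def most_frequent(records: list[dict[str, Any]], column: str) -> Optional[Any]:
    """Return the most common non-null value; break ties by recency."""
    present = [r for r in records if r.get(column) is not None]
    if not present:
        return None
    counts = Counter(r[column] for r in present)
    top_count = max(counts.values())
    tied = {value for value, n in counts.items() if n == top_count}
    # Among the tied values, take the one carried by the newest record.
    latest = max((r for r in present if r[column] in tied), key=lambda r: r["updated_at"])
    return latest[column]


records = [
    {"model": "CRM", "updated_at": date(2024, 1, 15), "country": "US"},
    {"model": "App", "updated_at": date(2024, 6, 20), "country": "US"},
    {"model": "Website", "updated_at": date(2024, 3, 10), "country": "UK"},
]
print(most_frequent(records, "country"))  # US
```

If the App record were removed, US and UK would each appear once, and the function would return UK because the Website record is newer than the CRM record.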
Source Priority
Uses the value from the source model that you designate as the most authoritative. You assign a numeric priority to each source (lower number = higher priority), and the value from the highest-priority source that has a non-null value is used.
Configuration:
- Priority — Each source mapping includes a priority number. Lower numbers indicate higher priority.
Best for: Attributes where certain systems are known to be more reliable. For example, a CRM might have more accurate customer names than a website registration form.
Example:
Source priorities: CRM (priority 1) > App (priority 2) > Website (priority 3)
| Record | Model | Priority | Name |
|---|---|---|---|
| A | Website | 3 | Alice Smith |
| B | CRM | 1 | Alice M. Smith |
| C | App | 2 | alice_s |
Result: Alice M. Smith (CRM has the highest priority and a non-null value)
When using source_priority, each source must have a distinct priority value.
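A minimal sketch of this selection rule, assuming priorities are supplied as a mapping from model name to number (a hypothetical shape chosen for the example):

```python
from typing import Any, Optional


def source_priority(
    records: list[dict[str, Any]], column: str, priorities: dict[str, int]
) -> Optional[Any]:
    """Return the non-null value from the highest-priority source."""
    # Each source must have a distinct priority value.
    if len(set(priorities.values())) != len(priorities):
        raise ValueError("each source must have a distinct priority")
    # Lower number = higher priority, so an ascending sort visits the best source first.
    for record in sorted(records, key=lambda r: priorities[r["model"]]):
        if record.get(column) is not None:
            return record[column]
    return None


priorities = {"CRM": 1, "App": 2, "Website": 3}
records = [
    {"model": "Website", "name": "Alice Smith"},
    {"model": "CRM", "name": "Alice M. Smith"},
    {"model": "App", "name": "alice_s"},
]
print(source_priority(records, "name", priorities))  # Alice M. Smith
```

If the CRM record had a null name, the function would fall through to the App value, `alice_s`.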
First Non-Null
Uses the first non-null value encountered across all source records. Simple and deterministic.
Best for: Immutable attributes that should be captured once, like original signup date or first referral source.
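In sketch form, the strategy is a single scan. The order in which source records are visited is not specified here, so this example simply assumes the order of the input list.

```python
from typing import Any, Iterable, Optional


def first_non_null(values: Iterable[Any]) -> Optional[Any]:
    """Return the first value that is not None, scanning in input order (an assumption)."""
    return next((v for v in values if v is not None), None)


print(first_non_null([None, "newsletter", "partner_ad"]))  # newsletter
```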
Collect All
Aggregates all distinct non-null values across the cluster into a single comma-separated string. This is useful when you want to preserve all values rather than picking a winner.
Best for: Tags, categories, or multi-valued attributes where all values are meaningful. For example, collecting all product categories a customer has interacted with.
Example:
| Record | Model | Interest |
|---|---|---|
| A | CRM | Sports |
| B | App | Music |
| C | Website | Sports |
Result: Music,Sports (distinct values, comma-separated)
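The aggregation can be sketched as below. Sorting the distinct values alphabetically is an assumption that happens to match the example result; the actual ordering is not stated in this document.

```python
from typing import Any, Iterable


def collect_all(values: Iterable[Any]) -> str:
    """Join the distinct non-null values into one comma-separated string."""
    distinct = {str(v) for v in values if v is not None}
    # Sorted so the output is stable from run to run (assumed ordering).
    return ",".join(sorted(distinct))


print(collect_all(["Sports", "Music", "Sports"]))  # Music,Sports
```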
Min
Selects the minimum value across all source records.
Best for: Attributes where the earliest or smallest value is desired, like first_seen_at, created_at, or min_purchase_amount.
Max
Selects the maximum value across all source records.
Best for: Attributes where the latest or largest value is desired, like last_seen_at, lifetime_value, or max_order_value.
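Min and Max are mirror images of each other. The sketch below assumes that null values are ignored and that the remaining values are mutually comparable (all dates or all numbers, for example).

```python
from datetime import date
from typing import Any, Iterable, Optional


def min_value(values: Iterable[Any]) -> Optional[Any]:
    """Smallest non-null value, or None if there is none."""
    return min((v for v in values if v is not None), default=None)


def max_value(values: Iterable[Any]) -> Optional[Any]:
    """Largest non-null value, or None if there is none."""
    return max((v for v in values if v is not None), default=None)


signup_dates = [date(2024, 3, 10), None, date(2023, 11, 2)]
print(min_value(signup_dates))         # 2023-11-02
print(max_value([120.0, None, 87.5]))  # 120.0
```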
Configuring Golden Records
Creating a Configuration
- Navigate to Identity Resolution and select your identity graph
- Click Golden Record Settings
- Define your output attributes. For each attribute:
  - Enter an attribute name (the output column name, must be a valid identifier)
  - Select a survivorship strategy
  - Add one or more sources; each source maps a model and column to this attribute
  - Set source priorities if using the source priority strategy
- Click Save
Attribute Sources
Each attribute requires at least one source. A source maps a specific column from a specific model to the golden record attribute:
Attribute: "email"
Strategy: most_recent
Sources:
- Model: CRM Contacts → Column: email_address
- Model: App Users → Column: user_email
- Model: Website → Column: contact_email
This tells SignalSmith: “To produce the golden record email column, look at email_address from CRM, user_email from App, and contact_email from Website, then pick the most recent one.”
Example Configuration
| Attribute | Strategy | Sources |
|---|---|---|
| email | Most Recent | CRM → email_address, App → user_email |
| name | Source Priority | CRM → full_name (priority 1), App → display_name (priority 2) |
| country | Most Frequent | CRM → country, Website → geo_country |
| created_at | Min | CRM → created_at, App → signup_date |
| lifetime_value | Max | Transactions → total_ltv |
| referral_source | First Non-Null | App → referral_code, Website → utm_source |
| interests | Collect All | App → interest_category, Website → content_category |
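The rules stated on this page (a valid identifier as the attribute name, at least one source per attribute, distinct priorities under source priority) can be checked mechanically. The sketch below is illustrative only: the dictionary layout is hypothetical, Python's `str.isidentifier` stands in for whatever identifier rule SignalSmith applies, and only `most_recent` and `source_priority` appear as strategy identifiers in this document, so the other identifier spellings are assumed.

```python
# Assumed identifier spellings for the seven strategies described above.
STRATEGIES = {
    "most_recent", "most_frequent", "source_priority",
    "first_non_null", "collect_all", "min", "max",
}


def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config passes."""
    errors = []
    for name, attr in config.items():
        strategy = attr.get("strategy")
        sources = attr.get("sources", [])
        if not name.isidentifier():
            errors.append(f"{name!r}: not a valid identifier")
        if strategy not in STRATEGIES:
            errors.append(f"{name!r}: unknown strategy {strategy!r}")
        if not sources:
            errors.append(f"{name!r}: needs at least one source")
        if strategy == "source_priority":
            priorities = [s.get("priority") for s in sources]
            if None in priorities or len(set(priorities)) != len(priorities):
                errors.append(f"{name!r}: sources need distinct priorities")
    return errors


config = {
    "email": {"strategy": "most_recent", "sources": [
        {"model": "CRM", "column": "email_address"},
        {"model": "App", "column": "user_email"},
    ]},
    "name": {"strategy": "source_priority", "sources": [
        {"model": "CRM", "column": "full_name", "priority": 1},
        {"model": "App", "column": "display_name", "priority": 2},
    ]},
    "lifetime_value": {"strategy": "max", "sources": [
        {"model": "Transactions", "column": "total_ltv"},
    ]},
}
print(validate(config))  # []
```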
Golden Record Schema
The golden record is materialized as a _GOLDEN_RECORD table in your warehouse schema. It contains:
| Column | Description |
|---|---|
| ss_id | The cluster identifier from the identity graph; uniquely identifies the resolved entity |
| Attribute columns | One column per configured attribute, populated by the assigned survivorship strategy |
| _winning_model_id | The model ID of the record with the most recent timestamp in the cluster |
| _winning_pk | The primary key of the most recent record (from the winning model) |
The golden record table is created using CREATE OR REPLACE TABLE, so it is fully rebuilt on each identity resolution run. You can query the _GOLDEN_RECORD table directly from your BI tools or data pipelines.
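To give a feel for querying the table, the sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The column types, the `email` and `city` attribute columns, and the row values are all assumptions for illustration. SQLite has no CREATE OR REPLACE TABLE, so the rebuild is imitated with a drop followed by a create.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; the real table lives in your warehouse schema.
conn = sqlite3.connect(":memory:")

# Imitate the full rebuild that happens on each identity resolution run.
conn.execute("DROP TABLE IF EXISTS _GOLDEN_RECORD")
conn.execute(
    """
    CREATE TABLE _GOLDEN_RECORD (
        ss_id TEXT PRIMARY KEY,   -- cluster identifier
        email TEXT,               -- configured attribute (assumed)
        city TEXT,                -- configured attribute (assumed)
        _winning_model_id TEXT,
        _winning_pk TEXT
    )
    """
)
conn.execute(
    "INSERT INTO _GOLDEN_RECORD VALUES (?, ?, ?, ?, ?)",
    ("cluster-1", "alice@gmail.com", "San Francisco", "app_users", "C"),
)

# One row per resolved entity, so no deduplication is needed in the query.
rows = conn.execute(
    "SELECT ss_id, email FROM _GOLDEN_RECORD WHERE city = ?",
    ("San Francisco",),
).fetchall()
print(rows)  # [('cluster-1', 'alice@gmail.com')]
```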
Golden Records and the Platform
Golden records integrate with other SignalSmith features:
Traits
Traits can be computed on golden records. When you select the resolved entity type as the basis for a trait, the trait query joins against the golden record table rather than individual source tables.
Audiences
Audiences can segment golden records. Filter conditions reference the golden record attributes and traits computed on golden records. This means your audiences operate on unified profiles, not fragmented source records.
When a golden record exists for an identity graph, audience compilation queries the _GOLDEN_RECORD table directly — the data is already deduplicated, so no additional deduplication logic is needed.
Audience Syncs
Audience syncs can send golden record identifiers and attributes to destinations. For example, you might sync the golden record email (the “best” email from the survivorship strategy) to an ad platform.
Viewing Golden Records
You can view individual golden records in the Profile Explorer. Each golden record page shows:
- The unified attribute values and which source/strategy produced them
- All source records in the cluster
- The identifiers that linked the records together
- The merge history (which edges connected which records)
Updating Golden Records
Golden records are automatically recomputed when identity resolution runs:
- Full resolution — All golden records are recomputed from scratch
- Incremental resolution — Only golden records in clusters that changed are recomputed
If you change golden record settings (add/remove attributes, change strategies, or modify sources), you must run full resolution for the new settings to apply to all golden records.
Next Steps
- Profile Explorer — Search and view golden records
- Running Resolution — Execute resolution to produce golden records
- Traits — Compute attributes on golden records