Golden Records

Golden records are the unified customer profiles produced by identity resolution. For each cluster of linked records, SignalSmith creates one golden record that contains the “best” value for each attribute, determined by configurable survivorship strategies.

What Is a Golden Record?

When identity resolution groups multiple source records into a cluster, those records often contain conflicting attribute values, and they may come from different data models. For example:

Record	Model	Name	Email	City
A	Website Events	Alice Smith	alice@gmail.com	New York
B	CRM Contacts	Alice M. Smith	alice@company.com	New York
C	App Users	alice_s	alice@gmail.com	San Francisco

All three records represent the same person, but they come from different models and disagree on name and city. The golden record resolves these conflicts using survivorship strategies to produce a single, definitive profile (in this example, Record C is the most recently updated record):

Field	Golden Record Value	Strategy Used
Name	Alice M. Smith	Source priority (CRM ranked highest)
Email	alice@gmail.com	Most recent
City	San Francisco	Most recent

Multi-Model Architecture

Golden records in SignalSmith support multi-model configurations. A single golden record config is attached to an identity graph and can draw attributes from columns across any of the models participating in that graph.

Each golden record attribute defines:

Attribute name — the output column name in the golden record table
Survivorship strategy — how conflicting values are resolved
Sources — one or more model-column mappings that feed this attribute

This means you can combine columns from different models into a single unified profile. For example, you might pull email from your CRM model, last_login from your App model, and lifetime_value from your Transactions model — all into one golden record.

Each identity graph can have at most one golden record configuration.
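The pieces above can be modeled as plain data. The following is a minimal Python sketch, not SignalSmith's API: the class names (`Source`, `GoldenAttribute`, `GoldenRecordConfig`), the `register` helper, and the column names are hypothetical, and the `max` strategy on `last_login` and `lifetime_value` is chosen only for illustration, since the example above does not name those strategies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    """One model-column mapping that feeds a golden record attribute."""
    model: str
    column: str
    priority: Optional[int] = None  # only meaningful for source priority


@dataclass
class GoldenAttribute:
    """An output column plus the strategy and sources that produce it."""
    name: str
    strategy: str
    sources: List[Source] = field(default_factory=list)


@dataclass
class GoldenRecordConfig:
    """A hypothetical golden record config attached to one identity graph."""
    identity_graph: str
    attributes: List[GoldenAttribute] = field(default_factory=list)

    def models_used(self) -> List[str]:
        """Distinct models referenced by any attribute, in first-seen order."""
        seen: List[str] = []
        for attr in self.attributes:
            for src in attr.sources:
                if src.model not in seen:
                    seen.append(src.model)
        return seen


def register(registry: Dict[str, GoldenRecordConfig], config: GoldenRecordConfig) -> None:
    """Store a config, enforcing at most one config per identity graph."""
    if config.identity_graph in registry:
        raise ValueError(f"graph {config.identity_graph!r} already has a golden record config")
    registry[config.identity_graph] = config


# One profile drawing columns from three different models.
config = GoldenRecordConfig(
    identity_graph="customers",
    attributes=[
        GoldenAttribute("email", "most_recent", [Source("CRM", "email")]),
        GoldenAttribute("last_login", "max", [Source("App", "last_login")]),
        GoldenAttribute("lifetime_value", "max", [Source("Transactions", "lifetime_value")]),
    ],
)
print(config.models_used())  # ['CRM', 'App', 'Transactions']
```

Keying the registry by identity graph is one simple way to enforce the one-config-per-graph rule.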

Survivorship Strategies

Survivorship strategies determine how conflicting values are resolved for each attribute. Every attribute must have an explicitly assigned strategy.

Most Recent

Uses the value from the record with the most recent timestamp. This assumes that newer data is more accurate than older data.

Each source model can specify a timestamp column used for recency ranking. If a model has no timestamp column, its records are ranked after those with timestamps.

Best for: Attributes that change over time, like address, city, phone number, or subscription status.

Example:

Record	Model	Updated At	City
A	CRM	2024-01-15	New York
B	App	2024-06-20	San Francisco
C	Website	2024-03-10	New York

Result: San Francisco (Record B has the most recent timestamp)
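As a rough illustration, the ranking rule can be sketched in Python. The `Candidate` type is hypothetical, and skipping null values is an assumption this page does not state for Most Recent.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional


@dataclass
class Candidate:
    model: str
    value: Any
    updated_at: Optional[date]  # None when the model has no timestamp column


def most_recent(candidates: List[Candidate]) -> Any:
    """Return the value from the candidate with the latest timestamp.

    Timestamped candidates always outrank untimestamped ones. Null values are
    skipped, which is an assumption made for this sketch.
    """
    usable = [c for c in candidates if c.value is not None]
    if not usable:
        return None
    # (has_timestamp, timestamp): True sorts above False, then later dates win.
    best = max(usable, key=lambda c: (c.updated_at is not None, c.updated_at or date.min))
    return best.value


cluster = [
    Candidate("CRM", "New York", date(2024, 1, 15)),
    Candidate("App", "San Francisco", date(2024, 6, 20)),
    Candidate("Website", "New York", date(2024, 3, 10)),
]
print(most_recent(cluster))  # San Francisco
```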

Most Frequent

Uses the value that appears most often across all source records in the cluster, regardless of which model they came from. This applies a “majority vote” approach.

Best for: Attributes where the most common value is likely correct, like gender, country, or language preference.

Example:

Record	Model	Country
A	CRM	US
B	App	US
C	Website	UK

Result: US (appears 2 out of 3 times)

In case of a tie, the most recent value among the tied values is used.
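A sketch of majority voting with the recency tie-break follows. The `Candidate` type is hypothetical, null values are ignored by assumption, and records without a timestamp rank last within a tie, mirroring the Most Recent rule.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional


@dataclass
class Candidate:
    model: str
    value: Any
    updated_at: Optional[date] = None


def most_frequent(candidates: List[Candidate]) -> Any:
    """Majority vote across the cluster, breaking ties by recency."""
    values = [c.value for c in candidates if c.value is not None]
    if not values:
        return None
    counts = Counter(values)
    top = max(counts.values())
    tied = {v for v, n in counts.items() if n == top}
    if len(tied) == 1:
        return tied.pop()
    # Tie: keep the most recent record whose value is one of the tied values.
    contenders = [c for c in candidates if c.value in tied]
    best = max(contenders, key=lambda c: (c.updated_at is not None, c.updated_at or date.min))
    return best.value


votes = [Candidate("CRM", "US"), Candidate("App", "US"), Candidate("Website", "UK")]
print(most_frequent(votes))  # US
```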

Source Priority

Uses the value from the source model that you designate as the most authoritative. You assign a numeric priority to each source (lower number = higher priority), and the value from the highest-priority source that has a non-null value is used.

Configuration:

Priority — Each source mapping includes a priority number. Lower numbers indicate higher priority.

Best for: Attributes where certain systems are known to be more reliable. For example, a CRM might have more accurate customer names than a website registration form.

Example:

Source priorities: CRM (priority 1) > App (priority 2) > Website (priority 3)

Record	Model	Priority	Name
A	Website	3	Alice Smith
B	CRM	1	Alice M. Smith
C	App	2	alice_s

Result: Alice M. Smith (CRM has the highest priority and a non-null value)

⚠️ When using `source_priority`, each source must have a distinct priority value.
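A minimal sketch of source priority, including the distinct-priority check, might look like this. Keying priorities by a source name string is a simplification made here, and the behavior when one source contributes several non-null records (this sketch keeps input order) is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Candidate:
    source: str  # which source mapping produced the value
    value: Any


def validate_priorities(priorities: Dict[str, int]) -> None:
    """Reject configs where two sources share a priority number."""
    if len(set(priorities.values())) != len(priorities):
        raise ValueError("each source must have a distinct priority value")


def source_priority(candidates: List[Candidate], priorities: Dict[str, int]) -> Any:
    """Return the non-null value from the highest-priority (lowest-numbered) source."""
    validate_priorities(priorities)
    # sorted() is stable, so several records from one source keep their input order.
    for cand in sorted(candidates, key=lambda c: priorities[c.source]):
        if cand.value is not None:
            return cand.value
    return None


priorities = {"CRM": 1, "App": 2, "Website": 3}
cluster = [
    Candidate("Website", "Alice Smith"),
    Candidate("CRM", "Alice M. Smith"),
    Candidate("App", "alice_s"),
]
print(source_priority(cluster, priorities))  # Alice M. Smith
```

If the CRM record's name were null, the same call would fall through to the App value, `alice_s`.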

First Non-Null

Uses the first non-null value encountered across all source records. Simple and deterministic.

Best for: Immutable attributes that should be captured once, like original signup date or first referral source.
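First Non-Null reduces to a one-liner. The sketch below assumes the records are already in some fixed order; this page does not say which ordering SignalSmith uses, and the result is only deterministic if that order is stable. The sample values are invented.

```python
from typing import Any, Iterable, Optional


def first_non_null(values: Iterable[Any]) -> Optional[Any]:
    """Return the first value that is not None, or None if every value is null.

    The result depends on iteration order, so the caller must supply the
    records in a stable order for the output to be deterministic.
    """
    return next((v for v in values if v is not None), None)


# e.g. referral_source fed by App.referral_code, then Website.utm_source
print(first_non_null([None, "spring_promo", "google"]))  # spring_promo
```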

Collect All

Aggregates all distinct non-null values across the cluster into a single comma-separated string. This is useful when you want to preserve all values rather than picking a winner.

Best for: Tags, categories, or multi-valued attributes where all values are meaningful. For example, collecting all product categories a customer has interacted with.

Example:

Record	Model	Interest
A	CRM	Sports
B	App	Music
C	Website	Sports

Result: Music,Sports (distinct values, comma-separated)
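A sketch of Collect All in Python follows. Sorting the distinct values alphabetically is an assumption that happens to reproduce the example output; returning `None` for an all-null cluster is likewise assumed.

```python
from typing import Any, Iterable, Optional


def collect_all(values: Iterable[Any]) -> Optional[str]:
    """Join the distinct non-null values into one comma-separated string.

    Values are sorted alphabetically (an assumption that matches the example
    above); an all-null cluster yields None rather than an empty string.
    """
    distinct = sorted({str(v) for v in values if v is not None})
    return ",".join(distinct) if distinct else None


print(collect_all(["Sports", "Music", "Sports"]))  # Music,Sports
```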

Min

Selects the minimum value across all source records.

Best for: Attributes where the earliest or smallest value is desired, like first_seen_at, created_at, or min_purchase_amount.

Max

Selects the maximum value across all source records.

Best for: Attributes where the latest or largest value is desired, like last_seen_at, lifetime_value, or max_order_value.
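Min and Max can both be sketched as null-skipping aggregates, the same way SQL `MIN` and `MAX` ignore `NULL`. Whether SignalSmith skips nulls is not stated on this page, so treat that as an assumption; the sample values below are invented.

```python
from datetime import date
from typing import Any, Iterable, Optional


def min_value(values: Iterable[Any]) -> Optional[Any]:
    """Smallest non-null value, or None if all values are null."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def max_value(values: Iterable[Any]) -> Optional[Any]:
    """Largest non-null value, or None if all values are null."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# created_at from CRM.created_at and App.signup_date
print(min_value([date(2023, 3, 1), None, date(2022, 11, 5)]))  # 2022-11-05
# lifetime_value from Transactions.total_ltv
print(max_value([120.0, None, 340.5]))  # 340.5
```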

Configuring Golden Records

Creating a Configuration

1. Navigate to Identity Resolution and select your identity graph
2. Click Golden Record Settings
3. Define your output attributes. For each attribute:
   - Enter an attribute name (the output column name; must be a valid identifier)
   - Select a survivorship strategy
   - Add one or more sources (each source maps a model and column to this attribute)
   - Set source priorities if using the source priority strategy
4. Click Save
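The per-attribute rules in these steps can be expressed as a validation function. This is a hypothetical sketch: the identifier pattern is an assumed definition of "valid identifier", and only `most_recent` and `source_priority` are strategy spellings that appear on this page; the others are guesses.

```python
import re
from typing import Dict, List

# Strategy identifiers: most_recent and source_priority appear in these docs;
# the other spellings are hypothetical.
STRATEGIES = {
    "most_recent", "most_frequent", "source_priority",
    "first_non_null", "collect_all", "min", "max",
}
# Assumed rule for a "valid identifier": a letter or underscore, followed by
# letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_attribute(name: str, strategy: str, sources: List[Dict]) -> List[str]:
    """Return a list of problems with one attribute definition (empty if valid)."""
    errors = []
    if not IDENTIFIER.fullmatch(name):
        errors.append(f"invalid attribute name: {name!r}")
    if strategy not in STRATEGIES:
        errors.append(f"unknown strategy: {strategy!r}")
    if not sources:
        errors.append("at least one source is required")
    if strategy == "source_priority":
        priorities = [s.get("priority") for s in sources]
        if any(p is None for p in priorities):
            errors.append("every source needs a priority")
        elif len(set(priorities)) != len(priorities):
            errors.append("source priorities must be distinct")
    return errors


print(validate_attribute("email", "most_recent",
                         [{"model": "CRM Contacts", "column": "email_address"}]))  # []
print(validate_attribute("first name", "min", []))
```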

Attribute Sources

Each attribute requires at least one source. A source maps a specific column from a specific model to the golden record attribute:

```
Attribute: "email"
  Strategy: most_recent
  Sources:
    - Model: CRM Contacts → Column: email_address
    - Model: App Users    → Column: user_email
    - Model: Website      → Column: contact_email
```

This tells SignalSmith: “To produce the golden record email column, look at email_address from CRM, user_email from App, and contact_email from Website, then pick the most recent one.”
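That instruction can be sketched for the `email` example. The cluster rows, the extra `Orders` model (which is not mapped to `email` and is therefore ignored), and the `updated_at` field are all hypothetical; in SignalSmith, the timestamp column is configured per source model.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

# Source mapping for the "email" attribute: model -> column.
EMAIL_SOURCES: Dict[str, str] = {
    "CRM Contacts": "email_address",
    "App Users": "user_email",
    "Website": "contact_email",
}


def resolve_most_recent(sources: Dict[str, str],
                        cluster: List[Tuple[str, Dict[str, Any]]]) -> Optional[Any]:
    """Read each record's mapped column and keep the most recent non-null value.

    Each row is assumed to carry a hypothetical "updated_at" timestamp; rows
    without one rank after timestamped rows.
    """
    best_value: Optional[Any] = None
    best_ts: Optional[date] = None
    for model, row in cluster:
        column = sources.get(model)
        if column is None:
            continue  # this model does not feed the attribute
        value, ts = row.get(column), row.get("updated_at")
        if value is None:
            continue
        newer = ts is not None and (best_ts is None or ts > best_ts)
        if best_value is None or newer:
            best_value, best_ts = value, ts
    return best_value


cluster = [
    ("CRM Contacts", {"email_address": "alice@company.com", "updated_at": date(2024, 1, 15)}),
    ("App Users", {"user_email": "alice@gmail.com", "updated_at": date(2024, 6, 20)}),
    ("Website", {"contact_email": None, "updated_at": date(2024, 7, 1)}),
    ("Orders", {"email": "old@example.com", "updated_at": date(2024, 8, 1)}),
]
print(resolve_most_recent(EMAIL_SOURCES, cluster))  # alice@gmail.com
```

The Website row is newest but its mapped column is null, and the Orders row is newer still but unmapped, so the App value wins.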

Example Configuration

Attribute	Strategy	Sources
`email`	Most Recent	CRM → `email_address`, App → `user_email`
`name`	Source Priority	CRM → `full_name` (priority 1), App → `display_name` (priority 2)
`country`	Most Frequent	CRM → `country`, Website → `geo_country`
`created_at`	Min	CRM → `created_at`, App → `signup_date`
`lifetime_value`	Max	Transactions → `total_ltv`
`referral_source`	First Non-Null	App → `referral_code`, Website → `utm_source`
`interests`	Collect All	App → `interest_category`, Website → `content_category`

Golden Record Schema

The golden record is materialized as a _GOLDEN_RECORD table in your warehouse schema. It contains:

Column	Description
`ss_id`	The cluster identifier from the identity graph — uniquely identifies the resolved entity
Attribute columns	One column per configured attribute, populated by the assigned survivorship strategy
`_winning_model_id`	The model ID of the record with the most recent timestamp in the cluster
`_winning_pk`	The primary key of the most recent record (from the winning model)

The golden record table is created using CREATE OR REPLACE TABLE, so it is fully rebuilt on each identity resolution run. You can query the _GOLDEN_RECORD table directly from your BI tools or data pipelines.
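To show how the schema fits together, here is an in-memory Python sketch that rebuilds every row from scratch, analogous to the full rebuild the table undergoes. The cluster ID, model IDs, primary keys, and timestamps are invented, and `SourceRecord`, `newest_non_null`, and `build_golden_records` are hypothetical names; only the output columns (`ss_id`, attribute columns, `_winning_model_id`, `_winning_pk`) come from the table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SourceRecord:
    ss_id: str                   # cluster id from the identity graph
    model_id: str
    pk: str
    updated_at: Optional[date]
    values: Dict[str, Any]       # source columns already mapped to attribute names


def recency_key(r: SourceRecord):
    # Timestamped records outrank untimestamped ones; later dates win.
    return (r.updated_at is not None, r.updated_at or date.min)


def newest_non_null(values: List[Any]) -> Any:
    # Values arrive newest first, so the first non-null one is the most recent.
    return next((v for v in values if v is not None), None)


def build_golden_records(records: List[SourceRecord],
                         strategies: Dict[str, Callable[[List[Any]], Any]]) -> List[Dict[str, Any]]:
    """Rebuild every golden record row from scratch, one row per ss_id."""
    clusters: Dict[str, List[SourceRecord]] = {}
    for r in records:
        clusters.setdefault(r.ss_id, []).append(r)

    rows = []
    for ss_id, members in clusters.items():
        ordered = sorted(members, key=recency_key, reverse=True)  # newest first
        row: Dict[str, Any] = {"ss_id": ss_id}
        for attr, pick in strategies.items():
            row[attr] = pick([m.values.get(attr) for m in ordered])
        # The winning record is the most recent one in the cluster.
        row["_winning_model_id"] = ordered[0].model_id
        row["_winning_pk"] = ordered[0].pk
        rows.append(row)
    return rows


records = [
    SourceRecord("c1", "website_events", "w1", date(2024, 3, 10),
                 {"email": "alice@gmail.com", "city": "New York"}),
    SourceRecord("c1", "crm_contacts", "c7", date(2024, 1, 15),
                 {"email": "alice@company.com", "city": "New York"}),
    SourceRecord("c1", "app_users", "a42", date(2024, 6, 20),
                 {"email": "alice@gmail.com", "city": "San Francisco"}),
]
print(build_golden_records(records, {"email": newest_non_null, "city": newest_non_null}))
```

Because each row is derived only from the current cluster membership, there is no stale state to reconcile between runs.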

Golden Records and the Platform

Golden records integrate with other SignalSmith features:

Traits

Traits can be computed on golden records. When you select the resolved entity type as the basis for a trait, the trait query joins against the golden record table rather than individual source tables.

Audiences

Audiences can segment golden records. Filter conditions reference the golden record attributes and traits computed on golden records. This means your audiences operate on unified profiles, not fragmented source records.

When a golden record exists for an identity graph, audience compilation queries the _GOLDEN_RECORD table directly — the data is already deduplicated, so no additional deduplication logic is needed.

Audience Syncs

Audience syncs can send golden record identifiers and attributes to destinations. For example, you might sync the golden record email (the “best” email from the survivorship strategy) to an ad platform.

Viewing Golden Records

You can view individual golden records in the Profile Explorer. Each golden record page shows:

The unified attribute values and which source/strategy produced them
All source records in the cluster
The identifiers that linked the records together
The merge history (which edges connected which records)

Updating Golden Records

Golden records are automatically recomputed when identity resolution runs:

Full resolution — All golden records are recomputed from scratch
Incremental resolution — Only golden records in clusters that changed are recomputed

If you change golden record settings (add/remove attributes, change strategies, or modify sources), you must run full resolution for the new settings to apply to all golden records.
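The difference between the two modes can be sketched as choosing which clusters to recompute. This is a hypothetical helper, not SignalSmith's implementation: it assumes you already know which records were added, updated, or reassigned, uses made-up mode strings, and refuses an incremental run after a settings change, matching the rule above.

```python
from typing import Dict, Set


def clusters_to_recompute(mode: str,
                          changed_records: Set[str],
                          before: Dict[str, str],
                          after: Dict[str, str],
                          settings_changed: bool = False) -> Set[str]:
    """Pick which golden records (by ss_id) need recomputing after a run.

    before/after map a record key to its ss_id on the previous and current run.
    changed_records is assumed to contain every record that was added, updated,
    or moved to a different cluster.
    """
    current = set(after.values())
    if settings_changed and mode != "full":
        raise ValueError("golden record settings changed: run full resolution")
    if mode == "full":
        return current  # every golden record is rebuilt from scratch
    affected: Set[str] = set()
    for key in changed_records:
        if key in after:
            affected.add(after[key])  # cluster the record now belongs to
        old = before.get(key)
        if old is not None and old in current:
            affected.add(old)  # cluster the record may have left
    return affected


before = {"a": "c1", "b": "c1", "c": "c2"}
after = {"a": "c1", "b": "c3", "c": "c2", "d": "c3"}
print(sorted(clusters_to_recompute("incremental", {"b", "d"}, before, after)))  # ['c1', 'c3']
```

Cluster `c2` is untouched by the changed records, so an incremental run skips it, while a full run recomputes all three.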

Next Steps

Profile Explorer — Search and view golden records
Running Resolution — Execute resolution to produce golden records
Traits — Compute attributes on golden records
