Identifier Families

Identifier families are the categories of identifier used to link customer records during identity resolution. Each family can have multiple variants that distinguish subtypes within it.

Family vs. Variant

An identifier family is a broad category of identifier. A variant is a specific subtype within that family.

Family	Variants	Description
Email	Personal, Work	Email addresses, distinguished by context
Phone	Mobile, Home, Work	Phone numbers, distinguished by type
Device	Mobile Advertising ID, Cookie, IDFV	Digital device identifiers
Customer ID	CRM ID, Loyalty ID, Support ID	Internal system identifiers

Why Variants Matter

Variants serve two purposes:

Merge precision — You can write merge rules that match on specific variants. For example, a rule that matches only on “Work Email” is more selective than a rule that matches on any email.
Golden record survivorship — When producing golden records, you can set survivorship rules per variant. For example, prefer “Personal Email” over “Work Email” as the canonical email address.

If you don’t need this level of granularity, you can define a family with a single variant (e.g., “Email” with one variant called “Email”).
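The relationship between families, variants, and per-variant survivorship can be sketched in a few lines of Python. This is only an illustration: the `IdentifierFamily` class and the `pick_canonical` helper are hypothetical names, not part of SignalSmith.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentifierFamily:
    """A family (e.g. "Email") and the variants defined within it."""
    name: str
    variants: tuple[str, ...]


def pick_canonical(values_by_variant: dict[str, str],
                   preference: list[str]) -> Optional[str]:
    """Return the value of the first preferred variant that has a value."""
    for variant in preference:
        value = values_by_variant.get(variant)
        if value:
            return value
    return None


email = IdentifierFamily(name="Email", variants=("Personal", "Work"))

# Hypothetical survivorship rule: prefer Personal Email over Work Email.
canonical = pick_canonical(
    {"Work": "alice@company.com", "Personal": "alice@gmail.com"},
    preference=["Personal", "Work"],
)
print(canonical)  # prints alice@gmail.com
```

If the preferred variant is missing from a record, the helper falls back to the next variant in the preference list.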

Common Identifier Families

Email

The most common identifier, and generally a high-confidence one. An email address is globally unique and is rarely shared between people, although one person often has several (for example, a personal and a work address).

Variant	Description	Example
Personal	Consumer email addresses	alice@gmail.com
Work	Business email addresses	alice@company.com
Other	Catch-all for unclassified emails	alice@university.edu

Configuration notes:

Set case sensitive to false — email addresses should be compared case-insensitively
Enable normalization to trim whitespace and lowercase
Consider a minimum length of 5 to filter out garbage values

⚠️ Shared or role-based email addresses (info@company.com, support@company.com) can cause false merges. Consider excluding these with a deny list or using email as part of a multi-identifier merge rule rather than a standalone rule.
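As a rough sketch of the configuration notes and the deny-list suggestion, the function below trims and lowercases an email, applies the minimum length of 5, and rejects role-based addresses. The `ROLE_LOCAL_PARTS` set and the `clean_email` name are assumptions made for illustration, and the single-`@` check is an extra sanity test not described above.

```python
from typing import Optional

# Hypothetical deny list of role-based local parts; extend it for your data.
ROLE_LOCAL_PARTS = {"info", "support", "sales", "admin", "noreply"}
MIN_LENGTH = 5


def clean_email(raw: Optional[str]) -> Optional[str]:
    """Normalize an email, or return None if it should not be used for matching."""
    if raw is None:
        return None
    value = raw.strip().lower()  # trim whitespace, compare case-insensitively
    if len(value) < MIN_LENGTH or value.count("@") != 1:
        return None  # too short or malformed (assumed extra check)
    local_part, domain = value.split("@")
    if not local_part or not domain:
        return None
    if local_part in ROLE_LOCAL_PARTS:
        return None  # shared mailbox, likely to cause false merges
    return value
```

Returning `None` for rejected values lets the caller treat them the same way as missing identifiers.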

Phone

Phone numbers are strong identifiers but can be recycled by carriers, shared within households, or formatted inconsistently.

Variant	Description	Example
Mobile	Cell phone numbers	+1-555-0100
Home	Landline numbers	+1-555-0200
Work	Office phone numbers	+1-555-0300

Configuration notes:

Enable normalization to standardize phone number formats (strip spaces, dashes, and parentheses, then add the country code)
Consider a minimum length of 7 to filter out short or invalid numbers
Phone numbers can be shared within households — pair with another identifier in merge rules for higher confidence
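The normalization described above might look like the following sketch. It assumes that a number without a leading `+` is a national number and that `1` is the default country code; both are illustrative assumptions, and `normalize_phone` is a hypothetical helper rather than SignalSmith's implementation.

```python
import re
from typing import Optional

DEFAULT_COUNTRY_CODE = "1"  # assumption: numbers without "+" belong to this country
MIN_DIGITS = 7


def normalize_phone(raw: Optional[str],
                    country_code: str = DEFAULT_COUNTRY_CODE) -> Optional[str]:
    """Return the number as "+<digits>", or None if it is too short to trust."""
    if raw is None:
        return None
    text = raw.strip()
    has_country_code = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, parentheses, etc.
    if len(digits) < MIN_DIGITS:
        return None
    if not has_country_code:
        digits = country_code + digits
    return "+" + digits


print(normalize_phone("(555) 010-0100"))   # prints +15550100100
print(normalize_phone("+1 555 010 0100"))  # prints +15550100100
```

A national number that already starts with the country code but has no `+` would be prefixed twice by this sketch, so real data usually needs a stricter parser.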

Device Identifiers

Digital identifiers tied to devices rather than people. These link cross-session and cross-platform activity but can be reset, shared (household devices), or blocked.

Variant	Description	Example
Mobile Advertising ID (MAID)	iOS IDFA or Android GAID	`6D92078A-8246-4BA4-AE5B-76104861E7DC`
Cookie	Browser cookie ID	`sess_abc123def456`
IDFV	iOS Identifier for Vendor	`A1B2C3D4-E5F6-7890-ABCD-EF1234567890`
Fingerprint	Browser or device fingerprint	`fp_9a8b7c6d5e4f`

Configuration notes:

Device identifiers are weaker than email or phone for person-level resolution — a device may be shared by multiple family members
Cookie IDs are ephemeral and can change when cookies are cleared
MAID/IDFA can be reset by the user or limited by OS privacy controls
Consider using device identifiers as part of a multi-identifier merge rule rather than standalone
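To illustrate the last note, here is a minimal sketch of a multi-identifier check in which a shared device ID counts only when a second identifier also matches. It is not SignalSmith's merge rule syntax; the record shape (a mapping from variant name to normalized value) and the function names are assumptions.

```python
from typing import Mapping

Record = Mapping[str, str]


def shares(a: Record, b: Record, identifier: str) -> bool:
    """True when both records carry the same non-empty value for `identifier`."""
    value = a.get(identifier)
    return bool(value) and value == b.get(identifier)


def should_merge(a: Record, b: Record) -> bool:
    # A shared personal email is treated as sufficient on its own.
    if shares(a, b, "Personal Email"):
        return True
    # A shared MAID only counts when the mobile phone also matches.
    return shares(a, b, "MAID") and shares(a, b, "Mobile Phone")
```

With this rule, two family members using the same tablet are not merged unless they also share a phone number.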

Customer IDs

Internal identifiers from your systems. These are deterministic (they definitively identify a record within a system) but are only useful for resolution when the same ID appears across multiple sources.

Variant	Description	Example
CRM ID	Salesforce, HubSpot, etc.	`003000000123ABC`
Loyalty ID	Loyalty or rewards program	`LY-9876-5432`
Support ID	Customer support system	`TICKET-USER-12345`
SSO ID	Single sign-on identifier	`auth0

Configuration notes:

Customer IDs are the strongest identifiers because they are explicitly assigned by a system
Only useful for cross-system resolution when the same ID is used (or mapped) across sources
Set case sensitive appropriately for the system that generates the ID

Custom Identifiers

For domain-specific identifiers that don’t fit the standard categories:

Example	Description
Account Number	Financial services account identifier
Patient ID	Healthcare patient identifier
Student ID	Education student identifier
Member Number	Association or membership identifier

Create a custom family with the appropriate name and variants for your domain.

Column Mapping

Each identifier variant must be mapped to specific columns in your entity type source tables. This tells SignalSmith where to find each identifier type in your data.

Mapping Configuration

For each combination of identifier variant and entity type:

Setting	Description
Column	The column in the source table containing this identifier
Enabled	Whether to use this mapping (toggle on/off without removing)

Example Mapping

For three entity types (Website Users, App Users, CRM Contacts):

Identifier	Website Users	App Users	CRM Contacts
Personal Email	`email`	`user_email`	`contact_email`
Work Email	—	—	`business_email`
Mobile Phone	`phone`	`mobile`	`mobile_phone`
MAID	—	`advertising_id`	—
Cookie	`session_cookie`	—	—
CRM ID	—	—	`contact_id`
Loyalty ID	`loyalty_number`	`loyalty_id`	`loyalty_ref`

A dash (—) means the entity type doesn’t have that identifier. Leave the mapping empty for those cells.
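The example mapping can be pictured as a nested lookup from identifier variant to entity type to column. The sketch below copies the table into a Python dictionary and reads the identifiers out of one source row; the dictionary layout and the `extract_identifiers` helper are illustrative assumptions, not SignalSmith's configuration format.

```python
# Identifier variant -> entity type -> source column (cells with a dash are omitted).
COLUMN_MAPPING = {
    "Personal Email": {"Website Users": "email", "App Users": "user_email",
                       "CRM Contacts": "contact_email"},
    "Work Email": {"CRM Contacts": "business_email"},
    "Mobile Phone": {"Website Users": "phone", "App Users": "mobile",
                     "CRM Contacts": "mobile_phone"},
    "MAID": {"App Users": "advertising_id"},
    "Cookie": {"Website Users": "session_cookie"},
    "CRM ID": {"CRM Contacts": "contact_id"},
    "Loyalty ID": {"Website Users": "loyalty_number", "App Users": "loyalty_id",
                   "CRM Contacts": "loyalty_ref"},
}


def extract_identifiers(entity_type: str, row: dict) -> dict:
    """Collect the non-empty identifier values that are mapped for this entity type."""
    found = {}
    for identifier, columns in COLUMN_MAPPING.items():
        column = columns.get(entity_type)
        if column is None:
            continue  # this entity type has no such identifier
        value = row.get(column)
        if value:
            found[identifier] = value
    return found


row = {"user_email": "alice@gmail.com", "advertising_id": "6D92078A", "mobile": None}
print(extract_identifiers("App Users", row))
# prints {'Personal Email': 'alice@gmail.com', 'MAID': '6D92078A'}
```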

Identifier Normalization

Normalization ensures consistent comparison of identifier values across sources. Different systems may store the same identifier in different formats.

Default Normalization Rules

Family	Normalization Applied
Email	Lowercase, trim whitespace, optionally strip `+` aliases
Phone	Strip non-numeric characters, add country code prefix, trim whitespace
Device	Lowercase (for MAIDs/IDFAs), trim whitespace
Customer ID	Trim whitespace only (case handling depends on the system)
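The optional `+` alias stripping for the Email family can be sketched as follows. The `strip_plus_alias` function is a hypothetical helper, and it assumes that everything after the first `+` in the local part is an alias, which holds for many providers but not all.

```python
def strip_plus_alias(email: str) -> str:
    """Remove a "+alias" suffix from the local part, e.g. alice+news@ -> alice@."""
    local_part, separator, domain = email.partition("@")
    if not separator:
        return email  # not an email address; leave it unchanged
    return local_part.split("+", 1)[0] + "@" + domain


print(strip_plus_alias("alice+news@gmail.com"))  # prints alice@gmail.com
```

Because some providers treat `+` as an ordinary character, this step is best kept optional, as the table indicates.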

Custom Normalization

For custom identifier families, you can specify:

Trim whitespace — Remove leading and trailing spaces
Lowercase — Convert to lowercase for case-insensitive matching
Strip characters — Remove specific characters (e.g., dashes from phone numbers)
Regex replace — Apply a regex substitution for complex normalization
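The four options above compose naturally into a small pipeline. The sketch below applies them in the order listed; the `NormalizationConfig` fields and the "Account Number" settings are hypothetical, and the order in which SignalSmith applies these steps is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NormalizationConfig:
    trim_whitespace: bool = True
    lowercase: bool = False
    strip_characters: str = ""           # each character listed here is removed
    regex_pattern: Optional[str] = None  # pattern for the regex replace step
    regex_replacement: str = ""


def normalize(value: str, config: NormalizationConfig) -> str:
    """Apply the enabled normalization steps in a fixed order."""
    if config.trim_whitespace:
        value = value.strip()
    if config.lowercase:
        value = value.lower()
    if config.strip_characters:
        value = value.translate(str.maketrans("", "", config.strip_characters))
    if config.regex_pattern is not None:
        value = re.sub(config.regex_pattern, config.regex_replacement, value)
    return value


# Hypothetical "Account Number" family: ignore case, dashes, spaces, leading zeros.
account_config = NormalizationConfig(
    lowercase=True, strip_characters="- ", regex_pattern=r"^0+"
)
print(normalize("  00AB-12 34 ", account_config))  # prints ab1234
```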

Best Practices

Start with high-confidence identifiers — Begin with email and customer ID, which are the most reliable. Add weaker identifiers (device, cookie) later.
Use variants when you have the data — If your source tables distinguish personal vs. work email, create separate variants. If not, a single variant is fine.
Enable normalization — Inconsistent formatting is the most common cause of missed matches
Set minimum lengths — Short identifier values (1-3 characters) are often garbage data that can cause false merges
Audit identifier quality — Before running resolution, check identifier columns for null rates, duplicate rates, and obvious data quality issues
Document custom families — When creating custom identifier families, add a clear description so team members understand what the identifier represents
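The audit step can be approximated with a few counts per identifier column. This sketch works on a plain list of values; the `audit_column` helper, its `min_length` default, and the definition of duplicate rate (share of rows whose value occurs more than once) are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def audit_column(values: Sequence[Optional[str]], min_length: int = 4) -> dict[str, float]:
    """Report null, duplicate, and too-short rates as fractions of all rows."""
    total = len(values)
    if total == 0:
        return {"null_rate": 0.0, "duplicate_rate": 0.0, "short_rate": 0.0}
    # Treat None and blank strings as missing.
    present = [v.strip() for v in values if v is not None and v.strip()]
    counts = Counter(present)
    duplicated_rows = sum(c for c in counts.values() if c > 1)
    short_rows = sum(1 for v in present if len(v) < min_length)
    return {
        "null_rate": (total - len(present)) / total,
        "duplicate_rate": duplicated_rows / total,
        "short_rate": short_rows / total,
    }


report = audit_column(["a@x.com", "a@x.com", None, "", "ab", "b@y.com"])
print(report)
```

A high duplicate rate is not a problem in itself, since repeated values are what link records, but a single value shared by a large share of rows is often a placeholder or a shared identifier.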

Next Steps

Merge Rules — Define how identifier matches link records
Limit Rules — Prevent over-merging from shared identifiers
Creating an Identity Graph — Full setup wizard
