Identifier Families

Identifier families are the categories of identifiers used to link customer records during identity resolution. Each family can have multiple variants that distinguish subtypes within it.

Family vs. Variant

An identifier family is a broad category of identifier. A variant is a specific subtype within that family.

| Family | Variants | Description |
| --- | --- | --- |
| Email | Personal, Work | Email addresses, distinguished by context |
| Phone | Mobile, Home, Work | Phone numbers, distinguished by type |
| Device | Mobile Advertising ID, Cookie, IDFV | Digital device identifiers |
| Customer ID | CRM ID, Loyalty ID, Support ID | Internal system identifiers |

Why Variants Matter

Variants serve two purposes:

  1. Merge precision: You can write merge rules that match on specific variants. For example, a rule that merges only on “Work Email” matches is more selective than one that merges on any email match.

  2. Golden record survivorship: When producing golden records, you can set survivorship rules per variant. For example, you can prefer “Personal Email” over “Work Email” as the canonical email address.

If you don’t need this level of granularity, you can define a family with a single variant (e.g., “Email” with one variant called “Email”).
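
The family/variant relationship and per-variant survivorship can be sketched in a few lines of Python. This is an illustrative model only: `IdentifierFamily`, `EMAIL_SURVIVORSHIP`, and `pick_canonical` are hypothetical names, not part of the SignalSmith API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IdentifierFamily:
    """A broad identifier category and its variants (hypothetical model)."""
    name: str
    variants: Tuple[str, ...]


# A family with two variants, and the single-variant form described above.
EMAIL = IdentifierFamily("Email", ("Personal", "Work"))
SIMPLE_EMAIL = IdentifierFamily("Email", ("Email",))

# Hypothetical survivorship order: earlier variants are preferred.
EMAIL_SURVIVORSHIP = ("Personal", "Work")


def pick_canonical(values: Dict[str, str], preference: Tuple[str, ...]) -> Optional[str]:
    """Return the value of the most-preferred variant that has a non-empty value."""
    for variant in preference:
        if values.get(variant):
            return values[variant]
    return None
```

With both variants present, `pick_canonical` returns the personal address; if only a work address exists, it falls back to that one.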

Common Identifier Families

Email

The most common identifier, and generally a high-confidence one. Each email address is globally unique and is rarely shared between people.

| Variant | Description | Example |
| --- | --- | --- |
| Personal | Consumer email addresses | alice@gmail.com |
| Work | Business email addresses | alice@company.com |
| Other | Catch-all for unclassified emails | alice@university.edu |

Configuration notes:

  • Set case sensitive to false — email addresses should be compared case-insensitively
  • Enable normalization to trim whitespace and lowercase
  • Consider a minimum length of 5 to filter out garbage values

⚠️ Shared or role-based email addresses (info@company.com, support@company.com) can cause false merges. Consider excluding these with a deny list or using email as part of a multi-identifier merge rule rather than a standalone rule.
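
The configuration notes and the deny-list suggestion above might look like the following when applied outside the product, for example in a pre-processing step. This is a minimal sketch: the `ROLE_LOCAL_PARTS` set, the single-`@` check, and the function name are assumptions for illustration, not SignalSmith settings.

```python
from typing import Optional

# Hypothetical deny list of role-based local parts; extend it for your data.
ROLE_LOCAL_PARTS = {"info", "support", "admin", "sales", "noreply"}
MIN_LENGTH = 5  # minimum length suggested in the configuration notes


def normalize_email(raw: Optional[str]) -> Optional[str]:
    """Trim and lowercase an email; return None if it should not be used for matching."""
    if raw is None:
        return None
    value = raw.strip().lower()
    # Assumption: a usable value is long enough and contains exactly one "@".
    if len(value) < MIN_LENGTH or value.count("@") != 1:
        return None
    local, domain = value.split("@")
    if not local or not domain:
        return None
    # Skip shared or role-based mailboxes to avoid false merges.
    if local in ROLE_LOCAL_PARTS:
        return None
    return value
```

Returning `None` rather than raising keeps rejected values out of matching without stopping a batch job.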

Phone

Phone numbers are strong identifiers but can be recycled by carriers, shared within households, or formatted inconsistently.

| Variant | Description | Example |
| --- | --- | --- |
| Mobile | Cell phone numbers | +1-555-0100 |
| Home | Landline numbers | +1-555-0200 |
| Work | Office phone numbers | +1-555-0300 |

Configuration notes:

  • Enable normalization to standardize phone number formats (strip spaces, dashes, parentheses, and add country code)
  • Consider a minimum length of 7 to filter out short or invalid numbers
  • Phone numbers can be shared within households — pair with another identifier in merge rules for higher confidence
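
As a rough illustration of the phone normalization described above, the sketch below strips formatting characters, applies the minimum length, and adds a country code. It assumes that a number without a leading `+` is a national number and that the default country code is `1`; both are assumptions you would adjust for your data, and `normalize_phone` is a hypothetical helper.

```python
import re
from typing import Optional

DEFAULT_COUNTRY_CODE = "1"  # assumption: numbers without "+" are North American
MIN_DIGITS = 7              # minimum length suggested in the configuration notes


def normalize_phone(raw: Optional[str], country_code: str = DEFAULT_COUNTRY_CODE) -> Optional[str]:
    """Return a digits-only number with a leading "+", or None if it is too short."""
    if raw is None:
        return None
    text = raw.strip()
    has_country_code = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, parentheses, etc.
    if len(digits) < MIN_DIGITS:
        return None
    if has_country_code:
        return "+" + digits
    # Assumption: no "+" means the country code is missing, so prepend the default.
    return "+" + country_code + digits
```

Under these assumptions, `+1 (415) 555-0100` and `415-555-0100` normalize to the same value, so they compare as equal.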

Device Identifiers

Digital identifiers tied to devices rather than people. These link cross-session and cross-platform activity but can be reset, shared (household devices), or blocked.

| Variant | Description | Example |
| --- | --- | --- |
| Mobile Advertising ID (MAID) | iOS IDFA or Android GAID | 6D92078A-8246-4BA4-AE5B-76104861E7DC |
| Cookie | Browser cookie ID | sess_abc123def456 |
| IDFV | iOS Identifier for Vendor | A1B2C3D4-E5F6-7890-ABCD-EF1234567890 |
| Fingerprint | Browser or device fingerprint | fp_9a8b7c6d5e4f |

Configuration notes:

  • Device identifiers are weaker than email or phone for person-level resolution — a device may be shared by multiple family members
  • Cookie IDs are ephemeral and can change when cookies are cleared
  • MAID/IDFA can be reset by the user or limited by OS privacy controls
  • Consider using device identifiers as part of a multi-identifier merge rule rather than standalone
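
One way to picture a multi-identifier merge rule is as a check that two records agree on every identifier the rule requires. The sketch below is a simplified illustration under that assumption; `rule_matches` and the record layout are hypothetical and do not describe how SignalSmith evaluates rules internally.

```python
from typing import Dict, Tuple

Record = Dict[str, str]  # variant name -> normalized identifier value


def rule_matches(a: Record, b: Record, required: Tuple[str, ...]) -> bool:
    """True only if both records share an equal, non-empty value for every required variant."""
    return bool(required) and all(
        bool(a.get(variant)) and a.get(variant) == b.get(variant)
        for variant in required
    )


# Two people sharing one household tablet: same MAID, different phones.
parent = {"MAID": "6d92078a", "Mobile": "+14155550100"}
child = {"MAID": "6d92078a", "Mobile": "+14155550199"}

standalone = rule_matches(parent, child, ("MAID",))             # would merge them
combined = rule_matches(parent, child, ("MAID", "Mobile"))      # keeps them apart
```

The standalone device rule merges the two records, while the combined rule does not, which is the reason for pairing device identifiers with a stronger one.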

Customer IDs

Internal identifiers from your systems. These are deterministic (they definitively identify a record within a system) but only useful when the same ID appears across multiple sources.

| Variant | Description | Example |
| --- | --- | --- |
| CRM ID | Salesforce, HubSpot, etc. | 003000000123ABC |
| Loyalty ID | Loyalty or rewards program | LY-9876-5432 |
| Support ID | Customer support system | TICKET-USER-12345 |
| SSO ID | Single sign-on identifier | `auth0 |

Configuration notes:

  • Customer IDs are the strongest identifiers within the system that assigns them, because that system assigns each one explicitly
  • Only useful for cross-system resolution when the same ID is used (or mapped) across sources
  • Set case sensitive appropriately for the system that generates the ID

Custom Identifiers

For domain-specific identifiers that don’t fit the standard categories:

| Example | Description |
| --- | --- |
| Account Number | Financial services account identifier |
| Patient ID | Healthcare patient identifier |
| Student ID | Education student identifier |
| Member Number | Association or membership identifier |

Create a custom family with the appropriate name and variants for your domain.

Column Mapping

Each identifier variant must be mapped to specific columns in the source tables for your entity types. This tells SignalSmith where to find each identifier in your data.

Mapping Configuration

For each combination of identifier variant and entity type:

| Setting | Description |
| --- | --- |
| Column | The column in the source table containing this identifier |
| Enabled | Whether to use this mapping (toggle on/off without removing) |

Example Mapping

For three entity types (Website Users, App Users, CRM Contacts):

| Identifier | Website Users | App Users | CRM Contacts |
| --- | --- | --- | --- |
| Personal Email | email | user_email | contact_email |
| Work Email | - | - | business_email |
| Mobile Phone | phone | mobile | mobile_phone |
| MAID | - | advertising_id | - |
| Cookie | session_cookie | - | - |
| CRM ID | - | - | contact_id |
| Loyalty ID | loyalty_number | loyalty_id | loyalty_ref |

A dash (-) means the entity type doesn’t have that identifier. Leave the mapping empty for those cells.
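
Conceptually, a column mapping is a lookup from (identifier variant, entity type) to a column name plus an enabled flag. The sketch below models that with a plain dictionary, using the Personal Email and Mobile Phone columns from the example. The `MAPPINGS` structure and `extract_identifiers` function are hypothetical and only illustrate the idea; the disabled App Users mapping is an invented example of the Enabled toggle.

```python
from typing import Dict, Optional, Tuple

# (variant, entity type) -> (column, enabled)
ColumnMapping = Dict[Tuple[str, str], Tuple[str, bool]]

MAPPINGS: ColumnMapping = {
    ("Personal Email", "Website Users"): ("email", True),
    ("Personal Email", "App Users"): ("user_email", True),
    ("Personal Email", "CRM Contacts"): ("contact_email", True),
    ("Mobile Phone", "Website Users"): ("phone", True),
    # Illustrative only: a mapping switched off without being removed.
    ("Mobile Phone", "App Users"): ("mobile", False),
}


def extract_identifiers(
    entity_type: str,
    row: Dict[str, Optional[str]],
    mappings: ColumnMapping = MAPPINGS,
) -> Dict[str, str]:
    """Collect the non-empty identifier values for one source row."""
    found: Dict[str, str] = {}
    for (variant, mapped_type), (column, enabled) in mappings.items():
        if mapped_type != entity_type or not enabled:
            continue
        value = row.get(column)
        if value:
            found[variant] = value
    return found
```

A combination with no entry in the dictionary corresponds to an empty cell in the mapping table.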

Identifier Normalization

Normalization ensures consistent comparison of identifier values across sources. Different systems may store the same identifier in different formats.

Default Normalization Rules

| Family | Normalization Applied |
| --- | --- |
| Email | Lowercase, trim whitespace, optionally strip + aliases |
| Phone | Strip non-numeric characters, add country code prefix, trim whitespace |
| Device | Lowercase (for MAIDs/IDFAs), trim whitespace |
| Customer ID | Trim whitespace only (case handling depends on the system) |
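
The optional "strip + aliases" step for email may be less familiar than trimming or lowercasing. A minimal sketch of the idea follows, assuming that everything from the first `+` in the local part up to the `@` is an alias tag; `strip_plus_alias` is a hypothetical helper, not a SignalSmith function.

```python
def strip_plus_alias(email: str) -> str:
    """Remove a "+tag" suffix from the local part of an email address."""
    local, separator, domain = email.partition("@")
    if not separator:
        # Not a recognizable address; leave it unchanged.
        return email
    base_local = local.split("+", 1)[0]
    return base_local + "@" + domain
```

Not every mail provider treats `+` tags as aliases, which is a plausible reason for this step being optional.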

Custom Normalization

For custom identifier families, you can specify:

  • Trim whitespace — Remove leading and trailing spaces
  • Lowercase — Convert to lowercase for case-insensitive matching
  • Strip characters — Remove specific characters (e.g., dashes from phone numbers)
  • Regex replace — Apply a regex substitution for complex normalization
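
The four options above can be thought of as steps in a small pipeline. The sketch below applies them in the order listed, which is an assumption; `NormalizationConfig` and `normalize` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalizationConfig:
    """Hypothetical settings mirroring the four custom normalization options."""
    trim: bool = True
    lowercase: bool = False
    strip_chars: str = ""                             # characters to remove
    regex_replace: Optional[Tuple[str, str]] = None   # (pattern, replacement)


def normalize(value: str, config: NormalizationConfig) -> str:
    """Apply trim, lowercase, strip characters, then regex replace (assumed order)."""
    if config.trim:
        value = value.strip()
    if config.lowercase:
        value = value.lower()
    if config.strip_chars:
        value = value.translate(str.maketrans("", "", config.strip_chars))
    if config.regex_replace is not None:
        pattern, replacement = config.regex_replace
        value = re.sub(pattern, replacement, value)
    return value


# Example: an account number stored with a prefix, dashes, and spaces.
account_config = NormalizationConfig(
    lowercase=True, strip_chars="- ", regex_replace=(r"^acct", "")
)
```

Because the steps run in sequence, a regex in this sketch sees the value after trimming, lowercasing, and character stripping.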

Best Practices

  • Start with high-confidence identifiers — Begin with email and customer ID, which are the most reliable. Add weaker identifiers (device, cookie) later.
  • Use variants when you have the data — If your source tables distinguish personal vs. work email, create separate variants. If not, a single variant is fine.
  • Enable normalization — Inconsistent formatting is the most common cause of missed matches
  • Set minimum lengths — Short identifier values (1-3 characters) are often garbage data that can cause false merges
  • Audit identifier quality — Before running resolution, check identifier columns for null rates, duplicate rates, and obvious data quality issues
  • Document custom families — When creating custom identifier families, add a clear description so team members understand what the identifier represents
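
The audit suggested above can start with a few simple ratios per identifier column. The sketch below computes a null rate, a duplicate rate, and a short-value rate over a list of raw values. The metric definitions and the `audit_column` name are assumptions chosen for illustration; in particular, empty strings are counted as nulls and the duplicate rate is the share of non-null values that occur more than once.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def audit_column(values: Iterable[Optional[str]], min_length: int = 4) -> Dict[str, float]:
    """Return simple quality ratios for one identifier column."""
    all_values = list(values)
    total = len(all_values)
    if total == 0:
        return {"null_rate": 0.0, "duplicate_rate": 0.0, "short_rate": 0.0}

    # Treat None and blank strings as missing.
    present = [v.strip() for v in all_values if v is not None and v.strip()]
    counts = Counter(present)
    duplicated = sum(count for count in counts.values() if count > 1)
    short = sum(1 for v in present if len(v) < min_length)

    return {
        "null_rate": (total - len(present)) / total,
        "duplicate_rate": duplicated / len(present) if present else 0.0,
        "short_rate": short / len(present) if present else 0.0,
    }
```

A high duplicate rate is not always a problem, since the same customer can appear in many rows, but a single value repeated across a large share of rows usually points to a placeholder or shared identifier.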
