Identifier Families
Identifier families are the types of identifiers used to link customer records during identity resolution. Each family represents a category of identifier, and each family can have multiple variants that distinguish subtypes within the family.
Family vs. Variant
An identifier family is a broad category of identifier. A variant is a specific subtype within that family.
| Family | Variants | Description |
|---|---|---|
| Email | Personal, Work | Email addresses, distinguished by context |
| Phone | Mobile, Home, Work | Phone numbers, distinguished by type |
| Device | Mobile Advertising ID, Cookie, IDFV | Digital device identifiers |
| Customer ID | CRM ID, Loyalty ID, Support ID | Internal system identifiers |
Why Variants Matter
Variants serve two purposes:
- Merge precision: You can write merge rules that match on specific variants. For example, a rule that merges only on “Work Email” matches can be more selective than one that matches on any email.
- Golden record survivorship: When producing golden records, you can set survivorship rules per variant. For example, prefer “Personal Email” over “Work Email” as the canonical email address.
If you don’t need this level of granularity, you can define a family with a single variant (e.g., “Email” with one variant called “Email”).
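As a rough illustration of variant-level survivorship, the sketch below picks a canonical email by variant priority. The `Identifier` class, the `pick_canonical` function, and the priority list are hypothetical names chosen for this example; they are not part of SignalSmith's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    family: str   # e.g. "Email"
    variant: str  # e.g. "Personal" or "Work"
    value: str


def pick_canonical(identifiers: list[Identifier], family: str,
                   variant_priority: list[str]) -> Optional[Identifier]:
    """Return the identifier in `family` whose variant ranks first in `variant_priority`."""
    candidates = [i for i in identifiers
                  if i.family == family and i.variant in variant_priority]
    if not candidates:
        return None
    # A lower index in the priority list means a more preferred variant.
    return min(candidates, key=lambda i: variant_priority.index(i.variant))


profile = [
    Identifier("Email", "Work", "alice@company.com"),
    Identifier("Email", "Personal", "alice@gmail.com"),
    Identifier("Phone", "Mobile", "+1-555-0100"),
]
canonical = pick_canonical(profile, "Email", ["Personal", "Work"])
print(canonical.value)  # alice@gmail.com
```

With a single-variant family the priority list has one entry, so the rule reduces to "take any value in the family".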
Common Identifier Families
Email
Email is the most common and generally the highest-confidence identifier for person-level matching. A given address is globally unique, whichever variant it belongs to, and is rarely shared between people.
| Variant | Description | Example |
|---|---|---|
| Personal | Consumer email addresses | alice@gmail.com |
| Work | Business email addresses | alice@company.com |
| Other | Catch-all for unclassified emails | alice@university.edu |
Configuration notes:
- Set case sensitive to `false`: email addresses should be compared case-insensitively
- Enable normalization to trim whitespace and lowercase
- Consider a minimum length of 5 to filter out garbage values
Shared or role-based email addresses (info@company.com, support@company.com) can cause false merges. Consider excluding these with a deny list or using email as part of a multi-identifier merge rule rather than a standalone rule.
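The notes above can be sketched as a small pre-processing step. This is a minimal illustration, not SignalSmith's implementation: the `clean_email` function, the `ROLE_LOCAL_PARTS` deny list, and the single-`@` sanity check are assumptions made for the example.

```python
from typing import Optional

# Hypothetical deny list of role-based local parts; extend it for your own data.
ROLE_LOCAL_PARTS = {"info", "support", "sales", "admin", "noreply"}
MIN_LENGTH = 5  # matches the suggested minimum length


def clean_email(raw: Optional[str]) -> Optional[str]:
    """Return a normalized email, or None if the value should not be used for matching."""
    if raw is None:
        return None
    email = raw.strip().lower()  # trim whitespace, compare case-insensitively
    if len(email) < MIN_LENGTH or email.count("@") != 1:
        return None  # too short or not shaped like an address
    local, domain = email.split("@")
    if not local or not domain:
        return None
    if local in ROLE_LOCAL_PARTS:
        return None  # shared mailbox: likely to cause false merges
    return email


print(clean_email("  Alice@Gmail.com "))  # alice@gmail.com
print(clean_email("info@company.com"))    # None
```

Returning `None` for denied values keeps them out of standalone matching; a multi-identifier rule could instead keep the value and require a second identifier to agree.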
Phone
Phone numbers are strong identifiers but can be recycled by carriers, shared within households, or formatted inconsistently.
| Variant | Description | Example |
|---|---|---|
| Mobile | Cell phone numbers | +1-555-0100 |
| Home | Landline numbers | +1-555-0200 |
| Work | Office phone numbers | +1-555-0300 |
Configuration notes:
- Enable normalization to standardize phone number formats (strip spaces, dashes, parentheses, and add country code)
- Consider a minimum length of 7 to filter out short or invalid numbers
- Phone numbers can be shared within households — pair with another identifier in merge rules for higher confidence
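A minimal sketch of the phone normalization described above, assuming that any number without a leading `+` is a national number that needs the default country code. The `normalize_phone` function and the default of `"1"` are illustrative choices, not SignalSmith settings; real-world data usually calls for a dedicated phone-parsing library.

```python
import re
from typing import Optional

DEFAULT_COUNTRY_CODE = "1"  # assumption: numbers without a country code are US/Canada
MIN_DIGITS = 7              # matches the suggested minimum length


def normalize_phone(raw: Optional[str],
                    country_code: str = DEFAULT_COUNTRY_CODE) -> Optional[str]:
    """Strip formatting and return '+<country code><number>', or None if too short."""
    if raw is None:
        return None
    value = raw.strip()
    has_plus = value.startswith("+")   # a leading "+" means the country code is present
    digits = re.sub(r"\D", "", value)  # drop spaces, dashes, parentheses, etc.
    if len(digits) < MIN_DIGITS:
        return None
    if has_plus:
        return "+" + digits
    return "+" + country_code + digits


print(normalize_phone("(555) 010-0100"))   # +15550100100
print(normalize_phone("+1-555-010-0100"))  # +15550100100
```

Both inputs normalize to the same string, which is what lets records from differently formatted sources match.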
Device Identifiers
Digital identifiers tied to devices rather than people. These link cross-session and cross-platform activity but can be reset, shared (household devices), or blocked.
| Variant | Description | Example |
|---|---|---|
| Mobile Advertising ID (MAID) | iOS IDFA or Android GAID | 6D92078A-8246-4BA4-AE5B-76104861E7DC |
| Cookie | Browser cookie ID | sess_abc123def456 |
| IDFV | iOS Identifier for Vendor | A1B2C3D4-E5F6-7890-ABCD-EF1234567890 |
| Fingerprint | Browser or device fingerprint | fp_9a8b7c6d5e4f |
Configuration notes:
- Device identifiers are weaker than email or phone for person-level resolution — a device may be shared by multiple family members
- Cookie IDs are ephemeral and can change when cookies are cleared
- MAID/IDFA can be reset by the user or limited by OS privacy controls
- Consider using device identifiers as part of a multi-identifier merge rule rather than standalone
Customer IDs
Internal identifiers from your systems. These are deterministic (they definitively identify a record within a system) but only useful when the same ID appears across multiple sources.
| Variant | Description | Example |
|---|---|---|
| CRM ID | Salesforce, HubSpot, etc. | 003000000123ABC |
| Loyalty ID | Loyalty or rewards program | LY-9876-5432 |
| Support ID | Customer support system | TICKET-USER-12345 |
| SSO ID | Single sign-on identifier | `auth0\|…` |
Configuration notes:
- Customer IDs are the strongest identifiers within the system that issues them, because that system assigns each one explicitly
- Only useful for cross-system resolution when the same ID is used (or mapped) across sources
- Set case sensitive appropriately for the system that generates the ID
Custom Identifiers
For domain-specific identifiers that don’t fit the standard categories:
| Example | Description |
|---|---|
| Account Number | Financial services account identifier |
| Patient ID | Healthcare patient identifier |
| Student ID | Education student identifier |
| Member Number | Association or membership identifier |
Create a custom family with the appropriate name and variants for your domain.
Column Mapping
Each identifier variant must be mapped to specific columns in your entity type source tables. This tells SignalSmith where to find each identifier type in your data.
Mapping Configuration
For each combination of identifier variant and entity type:
| Setting | Description |
|---|---|
| Column | The column in the source table containing this identifier |
| Enabled | Whether to use this mapping (toggle on/off without removing) |
Example Mapping
For three entity types (Website Users, App Users, CRM Contacts):
| Identifier | Website Users | App Users | CRM Contacts |
|---|---|---|---|
| Personal Email | email | user_email | contact_email |
| Work Email | — | — | business_email |
| Mobile Phone | phone | mobile | mobile_phone |
| MAID | — | advertising_id | — |
| Cookie | session_cookie | — | — |
| CRM ID | — | — | contact_id |
| Loyalty ID | loyalty_number | loyalty_id | loyalty_ref |
A dash (—) means the entity type doesn’t have that identifier. Leave the mapping empty for those cells.
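To make the mapping concrete, here is a sketch of how part of the table above could be represented and applied to a source row. The `COLUMN_MAPPINGS` structure and the `extract_identifiers` function are hypothetical, and the Cookie mapping is switched off only to illustrate the Enabled setting.

```python
# (identifier variant, entity type) -> column settings, mirroring part of the example table.
COLUMN_MAPPINGS = {
    ("Personal Email", "Website Users"): {"column": "email", "enabled": True},
    ("Personal Email", "App Users"): {"column": "user_email", "enabled": True},
    ("Personal Email", "CRM Contacts"): {"column": "contact_email", "enabled": True},
    ("Work Email", "CRM Contacts"): {"column": "business_email", "enabled": True},
    ("MAID", "App Users"): {"column": "advertising_id", "enabled": True},
    # Disabled for illustration: the mapping is kept but not used.
    ("Cookie", "Website Users"): {"column": "session_cookie", "enabled": False},
}


def extract_identifiers(entity_type: str, row: dict) -> dict:
    """Collect identifier values from one source row using the enabled mappings."""
    found = {}
    for (variant, mapped_type), setting in COLUMN_MAPPINGS.items():
        if mapped_type != entity_type or not setting["enabled"]:
            continue
        value = row.get(setting["column"])
        if value:  # skip missing, None, and empty values
            found[variant] = value
    return found


row = {"email": "alice@gmail.com", "session_cookie": "sess_abc123def456"}
print(extract_identifiers("Website Users", row))  # {'Personal Email': 'alice@gmail.com'}
```

Combinations with a dash in the table simply have no entry in the dictionary, so they never produce a value.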
Identifier Normalization
Normalization ensures consistent comparison of identifier values across sources. Different systems may store the same identifier in different formats.
Default Normalization Rules
| Family | Normalization Applied |
|---|---|
| Email | Lowercase, trim whitespace, optionally strip + aliases |
| Phone | Strip non-numeric characters, add country code prefix, trim whitespace |
| Device | Lowercase (for MAIDs/IDFAs), trim whitespace |
| Customer ID | Trim whitespace only (case handling depends on the system) |
Custom Normalization
For custom identifier families, you can specify:
- Trim whitespace — Remove leading and trailing spaces
- Lowercase — Convert to lowercase for case-insensitive matching
- Strip characters — Remove specific characters (e.g., dashes from phone numbers)
- Regex replace — Apply a regex substitution for complex normalization
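The four options above can be pictured as steps applied in order. The sketch below is one possible arrangement: the `NormalizationConfig` fields, the step order, and the "Account Number" format are assumptions for illustration and do not describe SignalSmith's actual configuration schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NormalizationConfig:
    trim_whitespace: bool = True
    lowercase: bool = False
    strip_characters: str = ""           # each character listed here is removed
    regex_pattern: Optional[str] = None  # applied last, if set
    regex_replacement: str = ""


def normalize(value: str, config: NormalizationConfig) -> str:
    """Apply the configured steps in a fixed order: trim, lowercase, strip, regex."""
    if config.trim_whitespace:
        value = value.strip()
    if config.lowercase:
        value = value.lower()
    if config.strip_characters:
        value = value.translate(str.maketrans("", "", config.strip_characters))
    if config.regex_pattern is not None:
        value = re.sub(config.regex_pattern, config.regex_replacement, value)
    return value


# Example: a hypothetical "Account Number" family stored as "ACC-0012-3456".
account_config = NormalizationConfig(
    lowercase=True,
    strip_characters="-",
    regex_pattern=r"^acc0*",  # drop the prefix and any leading zeros
)
print(normalize("  ACC-0012-3456 ", account_config))  # 123456
```

Because the regex runs after lowercasing and character stripping, the pattern only has to handle the already-simplified form of the value.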
Best Practices
- Start with high-confidence identifiers — Begin with email and customer ID, which are the most reliable. Add weaker identifiers (device, cookie) later.
- Use variants when you have the data — If your source tables distinguish personal vs. work email, create separate variants. If not, a single variant is fine.
- Enable normalization — Inconsistent formatting is the most common cause of missed matches
- Set minimum lengths — Short identifier values (1-3 characters) are often garbage data that can cause false merges
- Audit identifier quality — Before running resolution, check identifier columns for null rates, duplicate rates, and obvious data quality issues
- Document custom families — When creating custom identifier families, add a clear description so team members understand what the identifier represents
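For the audit step, a quick profile of each identifier column can be computed before resolution runs. The sketch below is a plain-Python illustration with a hypothetical `audit_column` function; in practice the same checks would more likely run as queries against the source tables.

```python
from collections import Counter
from typing import Optional


def audit_column(values: list[Optional[str]], min_length: int = 4) -> dict:
    """Summarize null rate, duplicate rate, and short-value rate for one identifier column."""
    total = len(values)
    if total == 0:
        return {"null_rate": 0.0, "duplicate_rate": 0.0, "short_rate": 0.0}
    # Treat None and blank strings as missing.
    present = [v.strip() for v in values if v is not None and v.strip()]
    counts = Counter(present)
    # A row counts as duplicated if its value appears on more than one row.
    duplicated_rows = sum(n for n in counts.values() if n > 1)
    # Values of 1-3 characters are flagged as likely garbage.
    short_rows = sum(1 for v in present if len(v) < min_length)
    non_null = len(present)
    return {
        "null_rate": (total - non_null) / total,
        "duplicate_rate": duplicated_rows / non_null if non_null else 0.0,
        "short_rate": short_rows / non_null if non_null else 0.0,
    }


emails = ["alice@gmail.com", "alice@gmail.com", None, "", "x", "bob@company.com"]
report = audit_column(emails)
print(report)
```

A high duplicate rate is not necessarily a problem for an identifier meant to link records, but an unexpectedly high one can point to placeholder or shared values.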
Next Steps
- Merge Rules — Define how identifier matches link records
- Limit Rules — Prevent over-merging from shared identifiers
- Creating an Identity Graph — Full setup wizard