
Master Data Management Strategy: Where Golden Records Break

Why single source of truth becomes multiple sources of confusion

Why master data management strategies fail under scale, how golden records diverge, and what happens when merge logic cannot keep up.


Master data management strategy sounds reasonable. Maintain one authoritative source for customer data, product catalogs, or organizational hierarchies. All systems read from this golden record. Updates flow through the MDM system to ensure consistency.

This works in architecture diagrams. Production tells a different story.

Systems that were supposed to use the MDM system still maintain local copies because they cannot tolerate the latency. The golden record exists, but so do dozens of divergent copies. Changes propagate at different speeds. Conflicts emerge that the merge logic was not designed to handle.

The strategy assumes the MDM system can win every fight. It cannot.

The Golden Record Problem

A golden record is supposed to be the single source of truth. Customer data exists in the CRM, billing system, support ticketing, and analytics warehouse. The MDM system merges these into one authoritative record.

The merge logic must resolve conflicts. The CRM says the customer email is old@example.com. The billing system has new@example.com. Which one wins?

Most MDM strategies use precedence rules. Billing data takes priority because payment requires accurate contact information. This works until the CRM receives a direct customer update after the billing system updated but before the MDM sync ran.

Now you have:

  • CRM: new@example.com (customer update at 10:02)
  • Billing: new@example.com (system update at 10:00)
  • MDM: old@example.com (last sync at 9:55)

The next MDM sync sees the CRM update at 10:02 and billing update at 10:00. Billing has precedence, so it takes new@example.com from billing and overwrites the CRM with the same value. Everything looks consistent.

Then a support ticket arrives via urgent@example.com, which does not exist in any system. The customer cannot be identified. The ticket sits unrouted.

The golden record was consistent but wrong. The actual current email was never in any source system.
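The sequence above can be sketched as a pure precedence merge. All names and record shapes here are hypothetical; the point is that the rule consults only the source ranking, so no precedence order can surface a value like urgent@example.com that never entered any source system.

```python
# Hypothetical precedence-based conflict resolution: billing outranks
# the CRM, and timestamps across sources are never compared.

PRECEDENCE = ["billing", "crm"]  # highest priority first

def resolve_email(source_values):
    """source_values: {source_name: (email, updated_at)} -> winning email."""
    for source in PRECEDENCE:
        if source in source_values:
            return source_values[source][0]
    return None

# State after the 10:02 CRM update:
values = {
    "crm": ("new@example.com", "10:02"),
    "billing": ("new@example.com", "10:00"),
}
resolve_email(values)  # "new@example.com" -- looks consistent

# The customer's actual address went only to support, so no source
# contains it and no precedence rule can ever pick it.
```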

When Sync Delays Exceed Business Tolerance

MDM systems sync on intervals. Every 5 minutes, every hour, nightly. The interval determines how stale the golden record can be.

Real-time sync sounds better but scales poorly. Every source system update triggers an MDM update, which triggers writes back to other source systems. A single customer change can generate dozens of database writes.

Under load, the sync queue grows. The 5-minute interval becomes 20 minutes, then an hour. Systems that depend on current data start querying source systems directly instead of the MDM system.

Now you have two data paths:

  • The MDM path with merge logic and governance but high latency
  • The direct path with low latency but no consistency guarantees

A customer updates their address in the CRM. The support agent queries the MDM system and sees the old address. They update it manually in the ticketing system. The MDM sync runs and overwrites the ticketing system with the CRM data. Now the ticketing system has the new address from the CRM but lost the agent’s notes that were attached to their manual update.

Data was not lost. Context was lost. The merge preserved records but destroyed meaning.

Match and Merge Logic at Scale

MDM systems must identify when different records refer to the same entity. This is the match problem. Then they must combine those records into one golden record. This is the merge problem.

Matching seems straightforward. Compare email addresses, phone numbers, customer IDs. If two records share an email, they are the same customer.

This breaks immediately:

def match_customers(record_a, record_b):
    """Naive matching logic. Guards against missing fields so two
    records with no phone on file do not match on None == None."""
    if record_a.get('email') and record_a.get('email') == record_b.get('email'):
        return True
    if record_a.get('phone') and record_a.get('phone') == record_b.get('phone'):
        return True
    return False

# Breaks on:
# - Shared family email addresses
# - Shared business phone numbers
# - Typos that changed one character
# - Format differences (+1-555-1234 vs 5551234)

You add fuzzy matching. Now records that are 80% similar get matched. This catches typos but creates new problems. Different customers with similar names get merged incorrectly. The golden record represents two people, not one.
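The tradeoff can be sketched with the standard library's difflib; the 0.8 threshold is an illustrative assumption, not a recommendation.

```python
# Hypothetical fuzzy matcher: similarity above a threshold is treated
# as "same entity". Catches typos, but also merges distinct people.
from difflib import SequenceMatcher

def fuzzy_match(a, b, threshold=0.8):
    """Return True if two strings clear a similarity threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Catches the one-character typo:
fuzzy_match("alice@example.com", "alcie@example.com")  # True

# But also merges two different people with similar names:
fuzzy_match("Jon Smith", "John Smith")  # True
```

Raising the threshold trades one failure mode for the other: fewer false merges, more missed typos.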

Merge logic must decide which fields from which records to keep:

def merge_records(records, precedence_order):
    """Merge multiple records into golden record."""
    golden = {}

    for field in ['email', 'phone', 'address', 'name']:
        for source in precedence_order:
            source_records = [r for r in records if r['source'] == source]
            if source_records and source_records[0].get(field):
                golden[field] = source_records[0][field]
                break

    return golden

# This loses minority values
# If 4 systems have old address, 1 has new address,
# and old address has higher precedence, the new address is lost

The merge creates a synthetic record that never existed in any source system. Queries against the golden record return data combinations that would fail validation in the original systems.

When Source Systems Refuse to Sync Back

The MDM system creates a golden record. Now it must propagate this back to source systems so they stay consistent.

Source systems often reject these updates.

The billing system has a foreign key constraint requiring every customer to have a payment method on file. The MDM golden record does not include payment methods because those are considered transactional data, not master data. The sync fails.

The CRM has custom fields that the MDM system does not track. When the MDM writes back to the CRM, it only updates the fields it manages. The CRM sees this as a partial update and rejects it for violating the application’s update policy.

You can make the MDM system aware of every source system’s validation rules and custom fields. Now the MDM schema includes the union of all source system schemas. It has billing-specific fields that only matter to billing and CRM fields that only matter to the CRM.

The golden record is no longer a simplified authoritative view. It is a denormalized union of every system’s data model. Changes to any source system’s schema require MDM schema changes.

Duplicate Detection After the Fact

MDM strategies assume duplicates are caught before they enter the system. They are not.

Two sales reps create customer records for the same company within minutes of each other. The MDM system has not synced yet. Both records enter source systems. Both get synced to MDM. Now the MDM system must detect that these are duplicates and merge them retroactively.

This requires:

  • Identifying which existing golden record each new record should match
  • Merging the golden records
  • Propagating the merge back to source systems
  • Updating all references to the old golden record IDs

The last step breaks most systems. The billing system has invoices referencing customer ID 1234. The MDM system determines 1234 is a duplicate of 5678 and merges them. Now the billing system should update all invoices from customer 1234 to reference customer 5678.

Most billing systems do not support retroactive customer ID changes. The foreign key is immutable after invoice creation. The MDM system cannot propagate the merge.

You can maintain a mapping table in the MDM system that says 1234 and 5678 are the same customer. Queries must join against this mapping to resolve the true golden record ID. This works until the chain grows: 1234 maps to 5678, which later maps to 9012, which maps to 3456.

Now a single customer lookup requires recursive joins through a mapping table that grows without bound.
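The mapping-table workaround can be sketched as chain resolution; the table and function names are hypothetical.

```python
# Hypothetical merge-mapping table: each merge adds an edge
# old_id -> surviving_id, and lookups must walk the chain to the
# terminal golden record ID.

merge_map = {1234: 5678, 5678: 9012, 9012: 3456}

def resolve_golden_id(customer_id, merge_map):
    """Follow merge mappings to the current golden record ID."""
    seen = set()
    while customer_id in merge_map:
        if customer_id in seen:
            raise ValueError(f"cycle in merge map at {customer_id}")
        seen.add(customer_id)
        customer_id = merge_map[customer_id]
    return customer_id

resolve_golden_id(1234, merge_map)  # 3456, after walking three hops
```

One mitigation is union-find-style path compression: after resolving, rewrite 1234 to point directly at 3456 so the next lookup is one hop. That bounds read cost but turns every reader into a writer of the mapping table.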

Data Lineage When the Golden Record Lies

The golden record says the customer’s address is “123 Main St”. The support agent needs to know where this came from. Was it the CRM? Billing? A manual update?

Data lineage tracking in MDM systems records which source contributed which field. In practice, this breaks down quickly.

The address came from the CRM, which got it from a web form, which got it from the customer. But the customer typo’d their zip code. The billing system corrected the zip code based on address verification. The MDM merge took the street address from the CRM and the zip code from billing.

The golden record lineage says the address came from CRM and billing. It does not say which parts came from which system. The support agent cannot tell if the zip code is verified or user-entered.

You can track field-level lineage. Now the lineage table stores one row per field per record. The customer record has 30 fields. There are 10 million customers. The lineage table has 300 million rows. Queries against lineage become slower than queries against the actual data.

Most MDM systems compromise by tracking lineage at the record level, not the field level. The lineage says the customer record came from multiple sources but not which fields came from where. This is accurate but not useful.
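The row-explosion arithmetic above can be made concrete with a minimal sketch; the schema is hypothetical.

```python
# Hypothetical field-level lineage row: one row per field per customer,
# which is exactly why the table explodes.
from dataclasses import dataclass

@dataclass
class FieldLineage:
    customer_id: int
    field: str       # e.g. "zip_code"
    source: str      # e.g. "billing"
    verified: bool   # did the source run verification on this value?

FIELDS_PER_CUSTOMER = 30
CUSTOMERS = 10_000_000
rows = FIELDS_PER_CUSTOMER * CUSTOMERS  # 300,000,000 lineage rows
```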

When Business Rules Conflict Across Domains

MDM systems enforce business rules. Customer emails must be unique. Products must have SKUs. Organizations must have tax IDs.

These rules make sense within a domain. They break across domains.

The rule says customer emails must be unique. A customer uses shared@family.com for personal purchases and business purchases. The MDM system rejects the business customer creation because the email already exists.

The sales team escalates. They need a business customer record with this email. The MDM admin adds an exception: emails can be duplicated if the customer type differs.

Now you have:

  • One personal customer with shared@family.com
  • One business customer with shared@family.com
  • Queries that filter by email return two customers

The marketing system sends two emails to the same address. The customer receives duplicate communications and complains. The marketing team adds logic to deduplicate by email before sending. This works until a different customer legitimately uses the same email as their spouse.

The business rule was correct for the narrow case. It failed when applied globally.
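The dedupe workaround and its failure mode can be sketched as follows; the recipient data is hypothetical.

```python
# Hypothetical dedupe-by-email pass before a marketing send. It fixes
# the duplicate-communication complaint, but silently drops one of two
# legitimate customers who share an address.

def dedupe_recipients(recipients):
    """Keep one recipient per email address (last one wins)."""
    by_email = {}
    for r in recipients:
        by_email[r["email"]] = r
    return list(by_email.values())

recipients = [
    {"name": "Pat", "email": "shared@family.com", "type": "personal"},
    {"name": "Pat", "email": "shared@family.com", "type": "business"},
    {"name": "Sam", "email": "shared@family.com", "type": "personal"},  # spouse
]
dedupe_recipients(recipients)  # only one of the three survives
```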

MDM Strategies That Survive Production

Effective master data management strategy assumes the golden record will diverge from source systems and plans for it.

Track divergence instead of preventing it:

def measure_mdm_consistency():
    """Compare MDM golden record vs source systems."""
    inconsistencies = []

    for customer_id in get_all_customer_ids():
        golden = mdm.get_customer(customer_id) or {}
        crm_record = crm.get_customer(customer_id) or {}
        billing_record = billing.get_customer(customer_id) or {}

        for field in ['email', 'phone', 'address']:
            values = {
                'mdm': golden.get(field),
                'crm': crm_record.get(field),
                'billing': billing_record.get(field)
            }

            unique_values = set(v for v in values.values() if v)

            if len(unique_values) > 1:
                inconsistencies.append({
                    'customer_id': customer_id,
                    'field': field,
                    'values': values
                })

    return inconsistencies

When the MDM system and source systems disagree, you need visibility into which fields differ and by how much. This makes divergence measurable rather than invisible.

Accept that some source systems will never fully adopt the MDM system. Build read-through caching that queries the MDM system first and falls back to source systems:

import time

class MDMUnavailable(Exception):
    """Raised by the MDM client when the service cannot be reached."""

class CustomerDataAccess:
    def get_customer(self, customer_id):
        """Try MDM first, fall back to source systems."""
        try:
            customer = self.mdm_client.get_customer(customer_id)
            if self._is_fresh(customer):
                return customer
        except MDMUnavailable:
            pass

        # MDM failed or data too stale, query sources directly
        return self._build_customer_from_sources(customer_id)

    def _is_fresh(self, customer):
        """Check if MDM data is recent enough."""
        last_update = customer.get('_mdm_sync_time')
        if not last_update:
            return False

        age_seconds = time.time() - last_update
        return age_seconds < 300  # 5 minute threshold

This does not enforce consistency. It provides availability when the MDM system cannot meet latency requirements.

Make merge conflicts explicit rather than hiding them in precedence rules:

class ConflictAwareGoldenRecord:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.fields = {}
        self.conflicts = {}

    def add_field(self, field_name, value, source, timestamp):
        """Track all values from all sources."""
        if field_name not in self.fields:
            self.fields[field_name] = []

        self.fields[field_name].append({
            'value': value,
            'source': source,
            'timestamp': timestamp
        })

    def get_field(self, field_name):
        """Return most recent value, but expose conflicts."""
        values = self.fields.get(field_name, [])
        if not values:
            return None

        # Sort by timestamp, most recent first
        sorted_values = sorted(values, key=lambda x: x['timestamp'], reverse=True)

        # Check if multiple recent values exist
        most_recent = sorted_values[0]
        recent_window = most_recent['timestamp'] - 3600  # 1 hour window

        recent_values = [
            v for v in sorted_values
            if v['timestamp'] >= recent_window
        ]

        if len(recent_values) > 1:
            unique_values = set(v['value'] for v in recent_values)
            if len(unique_values) > 1:
                self.conflicts[field_name] = recent_values

        return most_recent['value']

This exposes when the merge logic is making arbitrary choices between equally valid recent values. Applications can handle conflicts explicitly rather than receiving silently chosen data.

The Limits of Master Data Management

Master data management strategy fails when it assumes perfect synchronization is achievable. Systems designed for perfect synchronization break under partial synchronization.

Real MDM deployments have source systems that refuse to sync, golden records that diverge, and merge logic that cannot handle the conflict patterns that actually occur.

Effective strategies acknowledge this. They optimize for visibility into divergence, graceful degradation when sync fails, and explicit handling of merge conflicts.

This does not mean abandoning the golden record concept. It means building systems that continue functioning when the golden record is stale, incomplete, or wrong, which happens constantly in production.

The goal is not perfect consistency. The goal is knowing when consistency breaks and having systems that tolerate it.