CRM Data Cleanup: A 4-Step Recovery Guide for HubSpot and Salesforce
Duplicates everywhere and reports that don't match reality. RevBlack's 4-step guide covers how to clean HubSpot and Salesforce data and keep it clean.
CRM Data Cleanup: How to Recover When Your CRM Becomes the Liar in the Room
Nobody searches for "CRM data cleanup" because things are going well. The CRM that was supposed to be the single source of truth has become the least trusted system in the building. Reports do not match reality. Marketing campaigns hit the wrong people and come back empty. Sales reps are chasing hot leads that are actually dead records with a fresh coat of paint. And in every pipeline meeting, there is that sinking feeling: we cannot trust our own system.
RevBlack has run CRM data cleanup engagements across dozens of HubSpot and Salesforce instances. Bad data does not announce itself - it creeps in. A skipped field here. An integration glitch there. Someone bulk-imported 5,000 contacts "just to get them in the system" and now there are duplicates compounding across every report. This guide is for the teams already in that situation - not prevention theory, but a four-step recovery sequence that stops the bleeding and closes the holes that caused it.
How Do You Know When You Need a Full CRM Data Cleanup?
Not every data problem requires emergency cleanup - knowing the difference between a blip and a crisis determines whether to stop everything or just fix one thing and move on.
It is a blip if: one campaign underperforms or a rep has to dig for a missing phone number. That is Tuesday.
It is a cleanup when:
- Pipeline reports and actual deal flow do not reflect the same reality
- Duplicate records appear in every report, owned by different people
- Sales cannot work leads because key details are missing across most records
- Automations are breaking - lead routing is not firing, scoring is skipping, workflows are behaving unpredictably
One of these issues is an annoyance. Two means it is time to pay attention. Three or more means you stop what you are doing and enter cleanup mode. To build the prevention infrastructure that keeps you out of cleanup mode after this engagement ends, see the 6 pillars of CRM data quality guide.
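The one/two/three-or-more rule above can be written down as a small triage function so the whole team applies it the same way. This is a minimal sketch: the symptom names are hypothetical labels for the four checklist items, and the thresholds simply restate the rule.

```python
# Hypothetical labels for the four cleanup symptoms listed above.
SYMPTOMS = frozenset({
    "pipeline_reports_mismatch",
    "duplicates_in_every_report",
    "key_fields_missing_on_most_records",
    "automations_breaking",
})


def triage(observed: set) -> str:
    """Translate how many checklist symptoms are present into a response level."""
    unknown = set(observed) - SYMPTOMS
    if unknown:
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    count = len(observed)
    if count == 0:
        return "no cleanup needed"  # isolated blips: fix and move on
    if count == 1:
        return "annoying"
    if count == 2:
        return "pay attention"
    return "cleanup mode"  # three or more: stop and start the cleanup


print(triage({"duplicates_in_every_report", "automations_breaking"}))  # pay attention
```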
Step 1: How Do You Freeze the Problem Before Cleaning It?
The first move in any CRM data cleanup is not fixing - it is stopping the bad data from continuing to flow in while the cleanup is running.
Freeze the inputs.
In HubSpot: pause imports, workflows, and any integrations that could create or overwrite records during the cleanup window.
In Salesforce: check scheduled jobs, Apex triggers, and connected app syncs. Anything that writes to records needs to be paused or closely monitored.
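Before the cleanup window opens, it helps to keep an explicit inventory of every input that writes to records and confirm that each one is paused or monitored. The sketch below assumes you maintain that inventory by hand; `RecordWriter` and `unfrozen_writers` are hypothetical names, and nothing here calls the HubSpot or Salesforce APIs.

```python
from dataclasses import dataclass


@dataclass
class RecordWriter:
    """Any input that can create or overwrite CRM records (hypothetical inventory entry)."""
    name: str
    system: str  # "HubSpot" or "Salesforce"
    paused: bool
    monitored: bool = False


def unfrozen_writers(writers: list[RecordWriter]) -> list[str]:
    """Return writers that are neither paused nor under close monitoring."""
    return [f"{w.system}: {w.name}" for w in writers if not (w.paused or w.monitored)]


writers = [
    RecordWriter("Contact imports", "HubSpot", paused=True),
    RecordWriter("Lead nurture workflow", "HubSpot", paused=False),
    RecordWriter("Nightly enrichment sync", "Salesforce", paused=False, monitored=True),
    RecordWriter("Account Apex trigger", "Salesforce", paused=False),
]
print(unfrozen_writers(writers))
# ['HubSpot: Lead nurture workflow', 'Salesforce: Account Apex trigger']
```

If the returned list is not empty, the inputs are not actually frozen yet.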
Back everything up - including the junk.
Export contacts, companies (HubSpot) and accounts (Salesforce), deals (HubSpot) and opportunities (Salesforce), and any custom objects before touching anything. When a mistake happens mid-cleanup - and it will - the backup is the only path to recovery. Do not skip this step because the data feels too messy to be worth saving.
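A minimal sketch of the backup step, assuming the records have already been exported from each system and loaded into Python dictionaries. `back_up_objects` is a hypothetical helper: it writes each object to its own CSV in a timestamped folder and keeps every row, including the messy ones.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def back_up_objects(objects: dict[str, list[dict]], root: Path) -> Path:
    """Write every record of every object to its own CSV, junk included."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    folder = root / f"crm-backup-{stamp}"
    folder.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an earlier backup
    for object_name, records in objects.items():
        # Union of all keys, so records with extra or missing fields are kept whole.
        fieldnames = sorted({key for record in records for key in record})
        with open(folder / f"{object_name}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)  # missing keys are written as empty cells
    return folder
```

Usage: `back_up_objects({"contacts": contacts, "deals": deals}, Path("backups"))`, where `contacts` and `deals` are the exported rows.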
Map the mess before fixing it.
Do not say "it is all bad" and start randomly merging records. Break the problem down into categories: duplicates, missing fields, bad segmentation, broken routing, attribution gaps. Then rank by GTM impact. A misnamed campaign UTM can wait. A broken lead routing rule that is dropping inbound leads into a dead queue cannot.
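Mapping the mess can be as simple as listing each issue with an impact score and sorting. The 1-5 `gtm_impact` scale and the tie-break on affected record counts are assumptions for illustration, not a standard scoring model.

```python
from dataclasses import dataclass


@dataclass
class DataIssue:
    category: str  # e.g. "duplicates", "broken routing", "attribution gaps"
    description: str
    gtm_impact: int  # hypothetical 1-5 score agreed with sales and marketing
    records_affected: int


def rank_issues(issues: list[DataIssue]) -> list[DataIssue]:
    """Highest GTM impact first; ties broken by how many records are affected."""
    return sorted(issues, key=lambda i: (-i.gtm_impact, -i.records_affected))


issues = [
    DataIssue("attribution gaps", "Misnamed campaign UTM", 1, 300),
    DataIssue("broken routing", "Inbound leads routed to a dead queue", 5, 120),
    DataIssue("duplicates", "Contacts duplicated by a bulk import", 4, 5000),
]
for issue in rank_issues(issues):
    print(issue.gtm_impact, issue.category, "-", issue.description)
# broken routing prints first, the misnamed UTM last
```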
For how sync errors and integration issues compound data quality problems during a cleanup, see the HubSpot Salesforce sync errors diagnostic playbook.
Step 2: How Do You Clean HubSpot and Salesforce Data Without Making It Worse?
Randomly fixing records across both systems simultaneously is the fastest way to create new problems. RevBlack cleans one system at a time with defined merge logic before touching the second.
Data cleanup in HubSpot:
Start with Manage Duplicates for contacts and companies. Decide the master record logic before merging anything: oldest record, most complete record, or the record tied to Salesforce. Before merging, transfer related contacts, deals, and attachments so historical context is not lost. Bulk-fill critical fields - lifecycle stage, lead source, owner - using lists and workflows rather than manual record-by-record editing. Then clean campaign history: remove outdated UTMs, fix naming conventions, and align to the current attribution model so future reporting reflects reality.
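The three master record strategies can be made explicit before anyone clicks merge. This sketch assumes duplicates have already been grouped and exported as dictionaries with hypothetical `created` and `salesforce_id` keys. `merged_preview` only shows which empty fields the other records could fill, for human review; it does not reproduce how HubSpot's own merge decides property values.

```python
from datetime import date

BOOKKEEPING = {"id", "created", "salesforce_id"}


def completeness(record: dict) -> int:
    """Count populated business fields, ignoring bookkeeping keys."""
    return sum(1 for k, v in record.items() if k not in BOOKKEEPING and v not in (None, ""))


def pick_master(duplicates: list[dict], strategy: str) -> dict:
    """Choose the surviving record using one agreed strategy."""
    if strategy == "oldest":
        return min(duplicates, key=lambda r: r["created"])
    if strategy == "most_complete":
        return max(duplicates, key=completeness)
    if strategy == "salesforce_linked":
        linked = [r for r in duplicates if r.get("salesforce_id")]
        if len(linked) != 1:
            raise ValueError("Expected exactly one Salesforce-linked record; review by hand")
        return linked[0]
    raise ValueError(f"Unknown strategy: {strategy}")


def merged_preview(duplicates: list[dict], master: dict) -> dict:
    """Keep the master's values and show which empty fields the others could fill."""
    preview = dict(master)
    for record in duplicates:
        if record is master:
            continue
        for key, value in record.items():
            if key in BOOKKEEPING:
                continue  # never borrow another record's identity fields
            if preview.get(key) in (None, "") and value not in (None, ""):
                preview[key] = value
    return preview


dupes = [
    {"id": "A", "created": date(2021, 3, 1), "email": "jo@acme.com",
     "lead_source": "", "owner": "", "salesforce_id": ""},
    {"id": "B", "created": date(2023, 6, 9), "email": "jo@acme.com",
     "lead_source": "Webinar", "owner": "Sam", "salesforce_id": "SF-123"},
]
master = pick_master(dupes, "oldest")  # record A
print(merged_preview(dupes, master)["lead_source"])  # Webinar
```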
Data cleanup in Salesforce:
Run Duplicate Record Reports for leads, contacts, and accounts. Merge slowly - check related opportunities, tasks, and custom objects before committing to any merge. After merging, put validation rules in place immediately so bad data cannot re-enter through the same gaps. Lock down picklists. Make key fields mandatory. Nobody gets to skip Lead Source to get a record in faster.
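Merging slowly means checking what hangs off the record that is about to disappear. A minimal sketch, assuming related opportunities, tasks, and custom object rows have been exported with a hypothetical `parent_id` field; `Renewal__c` stands in for any custom object.

```python
from collections import Counter


def pre_merge_report(losing_id: str, related: list[dict]) -> Counter:
    """Count related records pointing at the record that will be merged away, by object type."""
    return Counter(r["object"] for r in related if r["parent_id"] == losing_id)


def safe_to_merge(report: Counter, reviewed: set[str]) -> bool:
    """Merge only after every object type with related records has been reviewed."""
    return set(report) <= reviewed


related = [
    {"object": "Opportunity", "parent_id": "ACC-2"},
    {"object": "Task", "parent_id": "ACC-2"},
    {"object": "Task", "parent_id": "ACC-2"},
    {"object": "Renewal__c", "parent_id": "ACC-2"},  # hypothetical custom object
    {"object": "Task", "parent_id": "ACC-1"},
]
report = pre_merge_report("ACC-2", related)
print(dict(report))  # {'Opportunity': 1, 'Task': 2, 'Renewal__c': 1}
print(safe_to_merge(report, {"Opportunity", "Task"}))  # False: custom object not reviewed
```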
For the full deduplication sequence including Insycle configuration, merge logic, edge cases, and the phased rollout that prevents irreversible mistakes, see the CRM deduplication playbook.
Step 3: How Do You Configure the CRM to Prevent Bad Data From Returning?
Cleaning the existing data solves the backlog. Closing the holes that created the backlog is what prevents the cleanup from needing to happen again in six months.
Turn on deduplication rules in HubSpot, Salesforce, and every connected tool that creates or updates records. A deduplication rule that only exists in HubSpot does nothing when a list import through a connected tool creates duplicates directly in Salesforce.
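A deduplication rule only works across tools if every tool matches on the same key. The sketch below assumes a deliberately simple rule, normalized email, applied to records pulled from several sources; the matching rules you configure in HubSpot or Salesforce may use more fields.

```python
from collections import defaultdict
from typing import Optional


def match_key(record: dict) -> Optional[str]:
    """Shared matching rule: trimmed, lower-cased email; None if there is no email."""
    email = (record.get("email") or "").strip().lower()
    return email or None


def duplicate_groups(records: list[dict]) -> dict[str, list[str]]:
    """Group records from every source by match key and keep groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record)
        if key is not None:
            groups[key].append(f"{record['source']}:{record['id']}")
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"source": "HubSpot", "id": "101", "email": "Jo@Acme.com "},
    {"source": "Salesforce", "id": "L-1", "email": "jo@acme.com"},
    {"source": "ListTool", "id": "x9", "email": "pat@beta.io"},  # hypothetical connected tool
    {"source": "Salesforce", "id": "L-2", "email": ""},  # no email: cannot be matched
]
print(duplicate_groups(records))  # {'jo@acme.com': ['HubSpot:101', 'Salesforce:L-1']}
```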
Make critical fields required. Lifecycle stage, lead source, and record owner should be required fields - not recommended fields that reps fill in when they remember. A record that enters the system without these fields is immediately unreportable and unroutable.
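Inside each CRM, required fields and locked picklists are enforced through that platform's own settings and validation rules. The same check can also run in an import script or integration before a record is written. In this sketch, the allowed `LEAD_SOURCE_VALUES` are placeholders, not a recommended taxonomy.

```python
# Placeholder values; replace with the picklist your teams agree on.
REQUIRED_FIELDS = ("lifecycle_stage", "lead_source", "owner")
LEAD_SOURCE_VALUES = {"Inbound", "Outbound", "Partner", "Event", "Paid Search"}


def validation_errors(record: dict) -> list[str]:
    """Return every reason the record should be rejected; an empty list means it may be written."""
    errors = [f"{field} is required" for field in REQUIRED_FIELDS
              if not str(record.get(field) or "").strip()]
    source = str(record.get("lead_source") or "").strip()
    if source and source not in LEAD_SOURCE_VALUES:
        errors.append(f"lead_source {source!r} is not an allowed picklist value")
    return errors


print(validation_errors({"lifecycle_stage": "Lead", "lead_source": "linkedin", "owner": ""}))
# ['owner is required', "lead_source 'linkedin' is not an allowed picklist value"]
```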
Build alerts for import anomalies. Flag any import that creates more duplicates than a defined threshold or skips mandatory fields. Catching a bad import before it propagates is far easier than cleaning it up after the fact.
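One way to implement an import anomaly alert is to score each batch before it is loaded. This sketch assumes rows arrive as dictionaries and matches duplicates on normalized email, both within the batch and against existing records. The 2% `max_duplicate_rate` is an arbitrary example threshold, and any row missing a required field flags the batch.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("lifecycle_stage", "lead_source", "owner")


@dataclass
class ImportCheck:
    rows: int
    duplicates: int
    missing_required: int
    flagged: bool


def check_import(rows: list[dict], existing_emails: list[str],
                 max_duplicate_rate: float = 0.02) -> ImportCheck:
    """Score an import batch before loading it (hypothetical 2% duplicate threshold)."""
    seen = {e.strip().lower() for e in existing_emails}
    duplicates = missing = 0
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if email and email in seen:
            duplicates += 1  # matches an existing record or an earlier row in this batch
        elif email:
            seen.add(email)
        if any(not str(row.get(f) or "").strip() for f in REQUIRED_FIELDS):
            missing += 1
    rate = duplicates / len(rows) if rows else 0.0
    return ImportCheck(len(rows), duplicates, missing,
                       flagged=rate > max_duplicate_rate or missing > 0)


batch = [
    {"email": "jo@acme.com", "lifecycle_stage": "Lead", "lead_source": "Event", "owner": "Sam"},
    {"email": "JO@acme.com", "lifecycle_stage": "Lead", "lead_source": "Event", "owner": "Sam"},
    {"email": "new@beta.io", "lifecycle_stage": "Lead", "lead_source": "", "owner": "Sam"},
]
print(check_import(batch, existing_emails=["pat@gamma.co"]))
# ImportCheck(rows=3, duplicates=1, missing_required=1, flagged=True)
```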
Schedule recurring data quality reviews. Weekly duplicate reports and stale record reviews need to be on the RevOps calendar - and actually reviewed, not just generated. To see how these recurring reviews fit into a broader RevOps governance structure, see the data governance guide.
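For the stale record half of the weekly review, a simple cutoff on last activity is often enough to build the list. The 180-day window and the `last_activity` field name are assumptions; records with no recorded activity are treated as stale.

```python
from datetime import date, timedelta


def stale_records(records: list[dict], today: date, max_age_days: int = 180) -> list[str]:
    """IDs of records with no activity inside the window (or no activity at all)."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records
            if r.get("last_activity") is None or r["last_activity"] < cutoff]


records = [
    {"id": "C-1", "last_activity": date(2024, 6, 1)},
    {"id": "C-2", "last_activity": date(2023, 11, 1)},
    {"id": "C-3", "last_activity": None},
]
print(stale_records(records, today=date(2024, 6, 30)))  # ['C-2', 'C-3']
```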
Step 4: How Do You Reset the Culture Around CRM Data Quality?
Data does not rot on its own - people break it, usually unintentionally. A technical cleanup without a cultural reset produces the same problem within a quarter.
After the cleanup is complete, RevBlack recommends a team meeting that connects the data problem to its business impact - not as a lecture, but as a concrete demonstration. "Here is how duplicate records wasted ad spend in Q2." "Here is how missing fields extended this deal cycle by three weeks." When the connection between data quality and revenue is visible, the behavior that creates bad data changes.
Then assign ownership. Whether it is a dedicated data steward or a rotating RevOps responsibility, someone needs to have data health in their job description with the authority to enforce it. Without a named owner, data quality maintenance is everyone's responsibility in theory and nobody's responsibility in practice.
For teams using HubSpot and Salesforce together, inconsistent field mapping between the two systems is the most common cultural failure point. When reps in Salesforce use different picklist values than the ones mapped from HubSpot, every sync cycle introduces inconsistency. Aligning both teams on a single standard - and enforcing it at the system level - is what makes the cleanup permanent. The lifecycle stage and lead management guide covers how to keep lifecycle stage definitions consistent across both systems to prevent this drift.
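Picklist drift is easy to detect once the agreed standard is written down as a mapping. In this sketch, `LEAD_SOURCE_MAP` is a hypothetical standard from HubSpot internal values to Salesforce picklist values; the check reports HubSpot values with no mapping and Salesforce values that fall outside the standard.

```python
# Hypothetical single standard agreed by both teams:
# HubSpot internal value -> Salesforce picklist value.
LEAD_SOURCE_MAP = {
    "organic_search": "Organic Search",
    "paid_search": "Paid Search",
    "event": "Event",
    "partner": "Partner",
}
SALESFORCE_STANDARD = set(LEAD_SOURCE_MAP.values())


def mapping_gaps(hubspot_values: list[str], salesforce_values: list[str]) -> dict[str, list[str]]:
    """Report values in use on either side that break the agreed standard."""
    return {
        "unmapped_hubspot": sorted(set(hubspot_values) - set(LEAD_SOURCE_MAP)),
        "off_standard_salesforce": sorted(set(salesforce_values) - SALESFORCE_STANDARD),
    }


print(mapping_gaps(["event", "webinar", "partner"], ["Event", "Trade Show", "Partner", "Partners"]))
# {'unmapped_hubspot': ['webinar'], 'off_standard_salesforce': ['Partners', 'Trade Show']}
```

Running a check like this on every sync cycle turns "enforce it at the system level" into a report someone can act on.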




