Audience
- This playbook is designed for beginner CRM administrators.
Playbook objective
The objectives of this playbook are to:
- Determine what your company defines as a duplicate
- Identify duplicate records
- Deduplicate your data
Notes
- Read our data management strategy playbook for help with creating a data management strategy.
- The DemandTools modules referenced in this playbook are for versions 5.X.X. Please make sure your software is up to date.
- We recommend setting up a sandbox environment to test data manipulations before implementing them in a production environment, so you don’t make changes that adversely affect your CRM data quality.
- If you are trying to solve a specific data problem, review the product training documentation in the Validity Help Center or get answers to your questions from Validity’s data experts during office hours.
- For technical issues with your software, please contact Validity support.
A clear data management strategy helps improve your CRM data quality and supports your desired business outcomes. With a clean CRM database, many businesses achieve better:
- Accuracy in sales forecasts and reporting
- Data privacy compliance
- Targeting for marketing
- Operational efficiency
What’s in this playbook
- Define a duplicate record for your business
- Preventing duplicate records
- Deduplication methodology
- Deduplication assessment
- Merge duplicate records
- Automate deduplication jobs
- What to do next
Key terms
- Data governance: Developing and implementing data policies and procedures to support business goals.
- Data quality: Accurate, complete, reliable, and actionable data.
- Data standardization: Applying a common and consistent data format.
- Data hygiene: The process of cleaning data to reduce errors and improve data quality.
Define a duplicate record for your business
Duplicate records are a common data problem and may be defined differently based on your business model and operations. Prior to merging duplicate records, we recommend talking with your data governance team to ensure you agree on what your business defines as a duplicate record for each object. Merging records that are not duplicates can lead to missed sales and dissatisfied customers.
For example, a company based in Europe may have a subsidiary in the United States that shares a similar name but is run independently. Each business requires separate invoicing and purchases different products. Some businesses may decide that these two companies are not duplicates due to internal policies and accounting methods. Other businesses may decide they are duplicates.
Preventing duplicate records
Duplicate data entry can be mitigated if you deploy the right tools and processes.
- Align with your data governance team to determine how best to prevent duplicate records based on your definition.
- Prevent duplicates from entering your database with DupeBlocker. DupeBlocker can block, auto-merge, auto-convert, and report duplicates as records are created and updated by end users and integrations (a conceptual sketch of this kind of check follows this list).
- Use the Import module for duplicate prevention when importing lists or batches of data into your CRM database.
- Identify sources of duplicate records during regular data assessments and share the results with your data governance team. Many duplicates can be prevented by improving staff training.
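To make the idea concrete, here is a minimal Python sketch of a pre-entry duplicate check. It is illustrative only: the record shape and function names are hypothetical, and DupeBlocker performs this kind of matching natively inside your CRM with far richer rules.

```python
# Conceptual sketch only: the record shape and function names here are
# hypothetical. DupeBlocker performs this kind of check natively inside
# your CRM, with far richer matching rules.

def normalize_email(email: str) -> str:
    """Lowercase and trim an email address so comparisons are consistent."""
    return email.strip().lower()

def is_duplicate(new_record: dict, existing_records: list) -> bool:
    """Flag an incoming record whose email already exists in the database."""
    new_email = normalize_email(new_record.get("email", ""))
    return bool(new_email) and any(
        normalize_email(r.get("email", "")) == new_email
        for r in existing_records
    )

existing = [{"email": "jane@samplecompany.com"}]
incoming = {"email": " Jane@SampleCompany.com"}
if is_duplicate(incoming, existing):
    print("Blocked: a record with this email already exists.")
```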
Deduplication methodology
Before you start deduplicating your data, ensure you have a plan in place to manage deduplication activities. A plan helps you split the work into manageable pieces and ensures you identify and merge all the duplicate records.
Deduplication frequency
Decide how frequently you need to deduplicate your data. The right frequency depends on the rate at which new records enter your organization and on the overall size of your database. Whatever frequency you require, you can automate deduplication activities using DemandTools.
- Daily: If you have hundreds of new records coming in daily from different channels, you probably need to run deduplication jobs daily.
- Weekly or monthly: If you have records that trickle in each week or come in less frequently, then you probably need to run deduplication jobs weekly or monthly.
- Quarterly: Each quarter, schedule a comprehensive data deduplication review and cleanse to ensure all records are deduplicated as expected.
- Deduplicate upon import: DemandTools allows you to deduplicate data during import. Be sure to train staff responsible for importing on deduplication requirements and procedures.
Data may be input or changed manually in your CRM by staff in your sales, marketing, customer success, and finance teams, so you may need to adjust deduplication frequencies even if your data intake rate is low.
Deduplication hierarchy
Focus on deduplicating Account records first, because it makes it easier to deduplicate the related leads, contacts, and opportunities later. Then deduplicate other objects based on business priority, followed by cases and tasks. Talk with your data governance team to help determine priority if needed.
- Accounts
- Leads
- Contacts
- Opportunities
- Other objects
- Cases
- Tasks
Deduplication assessment
Assess your duplicate record problem to understand the scope of work and the potential sources of duplicate data. We recommend running a full assessment on your data using the Assess module because it breaks out duplicate results by object. Once the assessment is complete, review the duplicate results by object using the Tune or Export modules and prioritize work based on severity. Share any insights on sources of duplicate data with your data governance team, as there may be opportunities to improve the data entry process.
If you already ran a full data assessment using the Assess module, log in to my.validity.com to review the results.
Finding duplicate records
Records can enter a CRM in a variety of ways, and there won’t always be an exact match on an account name or contact name to tell you a record is a duplicate. Start by looking for exact matches between records, because this catches the obvious duplicates. For example, the company names below match exactly, which gives you a high level of confidence that they are duplicates.
- Account name 1: Company ABC
- Account name 2: Company ABC
After looking for exact matches, look for records that are likely matches based on similar account names, addresses, contact names, phone numbers, or email addresses. For example, the two records below are not exact matches but are likely duplicates because the second account name is missing only “, Inc” (see the sketch after these examples).
- Account name 1: Sample Company, Inc
- Account name 2: Sample Company
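As a minimal illustration of this two-step approach, the Python sketch below compares names exactly first, then compares normalized names that strip common corporate suffixes. The normalization rule is an assumption for illustration; DemandTools applies its own matching logic.

```python
# Illustrative only: a two-step comparison in Python. Exact matching
# catches obvious duplicates; a normalized comparison that strips common
# corporate suffixes catches likely duplicates such as the pair above.
import re

def normalize_name(name: str) -> str:
    """Lowercase, trim, and strip suffixes such as ', Inc' or ', LLC'."""
    name = name.strip().lower()
    return re.sub(r",?\s*\b(inc|llc|ltd|corp)\b\.?$", "", name).strip()

a, b = "Sample Company, Inc", "Sample Company"
print(a == b)                                  # False: no exact match
print(normalize_name(a) == normalize_name(b))  # True: likely duplicates
```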
As you can see, finding duplicate records by manually reviewing every record and object is time-consuming, and you will likely miss many duplicates.
To help identify duplicates, DemandTools provides prebuilt scenarios with different levels of matching logic that follow the method described above: start with Rigid criteria to find the obvious duplicates, then work your way down the matching hierarchy using less rigid criteria to find likely duplicates. The matching definitions are below, followed by a sketch of how such a hierarchy can work:
- Rigid: Records match exactly on several fields. Because fields match exactly, you have high confidence they are duplicates.
- Semi-rigid: Records match exactly on a few fields. Fewer fields match than with the rigid criteria, but you still have confidence they are duplicates. Some manual review may be required to confirm.
- Loose: Records may match exactly on only one field and on a few others using programmatic matching techniques. You have some confidence that the records are duplicates, but they likely require manual review to confirm.
- Very Loose: Records may match only through programmatic matching techniques. You have low confidence that the records are duplicates, and they require manual review to confirm.
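As a rough illustration of how such a hierarchy can be implemented, the Python sketch below classifies a candidate pair of records into a tier. The fields and thresholds are assumptions; DemandTools’ actual scenario logic is more sophisticated.

```python
# A rough sketch of a tiered matching hierarchy in Python. The fields and
# thresholds are assumptions for illustration; DemandTools' scenario logic
# is more sophisticated than this.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_tier(rec1: dict, rec2: dict) -> str:
    """Classify a candidate pair from most confident to least confident."""
    fields = ("name", "phone", "email")
    exact = sum(rec1.get(f) == rec2.get(f) for f in fields)
    if exact == len(fields):
        return "rigid"       # every field matches exactly
    if exact >= 2:
        return "semi-rigid"  # most fields match exactly
    if exact >= 1 and similarity(rec1["name"], rec2["name"]) > 0.8:
        return "loose"       # one exact field plus a strong fuzzy name match
    if similarity(rec1["name"], rec2["name"]) > 0.6:
        return "very loose"  # fuzzy match only; manual review required
    return "no match"

pair = ({"name": "Sample Company, Inc", "phone": "555-0100", "email": "a@x.com"},
        {"name": "Sample Company", "phone": "555-0100", "email": ""})
print(match_tier(*pair))  # loose
```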
Merge duplicate records
Start merging duplicate records with the rigid prebuilt scenarios listed below.
- Accounts (Billing) - Rigid Criteria – Recommended First Pass
- Accounts (Shipping) - Rigid Criteria – Recommended First Pass
- Leads – Rigid Criteria – Unconverted
- Contacts Between Accounts – Rigid Criteria
- Contacts Within an Account – Rigid Criteria
- Within an Account – Opportunities by Name and Amount
- Between Accounts – Opportunities by Name and Amount
When you finish deduplicating records with rigid criteria, use DemandTools’ other prebuilt scenarios with less rigid matching criteria to identify and merge duplicate records the rigid criteria may have missed. A simplified sketch of what a merge does at the field level follows the list.
- Leads – Semi Rigid Criteria – Unconverted
- Leads – Loose Criteria – Unconverted
- Leads – Loose Criteria – Unconverted (first initial)
- Leads – Very Loose Criteria – Unconverted
- Accounts (Billing) – Semi Rigid Criteria
- Accounts (Billing) – Loose Criteria
- Accounts (Billing) – Very Loose Criteria
- Accounts (Shipping) – Semi Rigid Criteria
- Accounts (Shipping) – Loose Criteria
- Accounts (Shipping) – Very Loose Criteria
- Contacts Between Accounts – Semi Rigid Criteria
- Contacts Between Accounts – Loose Criteria
- Contacts Between Accounts – Very Loose Criteria (first initial)
- Contacts Within An Account – Semi Rigid Criteria (match blank – phone)
- Contacts Within An Account – Semi Rigid Criteria (match blank – email)
- Contacts Within An Account – Semi Rigid Criteria
- Contacts Within An Account – Loose Criteria (no match on email)
- Contacts Within An Account – Loose Criteria (no match on phone)
- Contacts Within An Account – Very Loose Criteria (first initial)
- Open Cases by Subject and Supplied Email
- Open Tasks Within Account – Rigid Criteria
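Before moving on to automation, it helps to understand what a merge does at the field level. The Python sketch below is a simplified illustration: one record survives as the master, and its blank fields are filled from the losing duplicate. DemandTools lets you control the surviving record and per-field rules in each scenario; the field names here are illustrative.

```python
# Simplified illustration of a field-level merge: one record survives as
# the master and its blank fields are filled from the losing duplicate.
# Field names are illustrative; DemandTools lets you control the surviving
# record and per-field rules in each scenario.

def merge_records(master: dict, duplicate: dict) -> dict:
    """Keep the master's values and fill any blank fields from the duplicate."""
    merged = dict(master)
    for field, value in duplicate.items():
        if not merged.get(field):  # fill only blank or missing fields
            merged[field] = value
    return merged

master = {"name": "Sample Company, Inc", "phone": "", "owner": "A. Rep"}
dupe = {"name": "Sample Company", "phone": "555-0100", "owner": "B. Rep"}
print(merge_records(master, dupe))
# {'name': 'Sample Company, Inc', 'phone': '555-0100', 'owner': 'A. Rep'}
```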
Automate deduplication jobs
Schedule your deduplication jobs to run automatically at a frequency you define. We recommend testing scenario automations in a sandbox prior to implementing the automations in a production environment.
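DemandTools handles scheduling natively, so the standard-library Python sketch below is only a conceptual illustration of the cadence logic from the Deduplication frequency section: a daily rigid pass on leads, a weekly account pass, and a monthly cross-account pass. The scenario names come from the lists above, but the scheduling code itself is hypothetical.

```python
# DemandTools schedules jobs for you, so this standard-library sketch is
# only a conceptual illustration of the cadence logic described under
# Deduplication frequency. The scheduling code itself is hypothetical.
from datetime import date

def scenarios_due(today: date) -> list:
    """Return the deduplication scenarios due to run on a given day."""
    due = ["Leads – Rigid Criteria – Unconverted"]  # daily pass
    if today.weekday() == 0:                        # Mondays: weekly pass
        due.append("Accounts (Billing) - Rigid Criteria – Recommended First Pass")
    if today.day == 1:                              # first of the month
        due.append("Contacts Between Accounts – Rigid Criteria")
    return due

print(scenarios_due(date.today()))
```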
Data governance team alignment
As you work through deduplicating your data, talk to your data governance team about sources of duplicate data and recommend improvements such as employee training or system enhancements to proactively reduce duplicates.
What to do next
- Continue cleaning your data with our other data playbooks.