Please test your deduplication scenarios thoroughly before deduping a large number of records, as there is NO automatic "undo"/"rollback" option; the restore file described below supports only a partial, manual recovery. If you have a Dynamics sandbox, we recommend that you test in this environment first before deduping your production data.
It is also highly recommended that a full backup of the database be taken before performing a large-scale deduplication.
A DedupeMergeData file is created in the DemandToolsMSDData\Restore folder. This file can be used to "partially" restore from a bad merge, as it contains all the field data for records prior to the merge. This file ONLY contains field data; NO file is created to track which sub-entities were moved during a merge, so any restoration of sub-entities (reparenting them back to the non-master records) is still a 100% manual process.
1. Dedupe in the following order:
- Account; Contact; Lead; Case
Just like when merging directly in the Dynamics user interface, all related entities are merged onto the selected master record. Therefore, parent entities should be merged before child (related) entities so that, when merging a child entity, the parent ID can be used as part of the matching criteria (e.g., when looking for duplicate Contacts, include the Account ID in the matching rules to avoid incorrectly matching records when only matching on a few fields).
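Conceptually, including the parent ID in a child-entity matching rule works like the minimal sketch below (plain Python for illustration only, not a DemandTools feature; the attribute names accountid, firstname, and lastname are assumed Dynamics-style names):

```python
# Illustrative sketch only: group Contact records into duplicate candidates,
# keyed on the parent Account ID plus two name fields. The field names are
# assumptions made for this example.
from collections import defaultdict

def group_contact_duplicates(contacts):
    """Group contacts that share the same parent account and name."""
    groups = defaultdict(list)
    for c in contacts:
        # Including the parent ID keeps two "John Smith" contacts that belong
        # to different accounts from being treated as duplicates.
        key = (c.get("accountid"),
               (c.get("firstname") or "").strip().lower(),
               (c.get("lastname") or "").strip().lower())
        groups[key].append(c)
    # Only keys with more than one record represent actual duplicate groups.
    return {k: v for k, v in groups.items() if len(v) > 1}
```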
2. Use a multi-pass strategy for identifying duplicates
- Start with very rigid criteria
- Rigid criteria means matching on at least 4 or 5 fields and using strict mapping types
- Loosen criteria by matching on fewer fields and choosing less strict mapping types
- Loosen criteria with each pass
- This strategy will help clean up the duplicates in a quicker, more manageable manner (see the conceptual sketch after this list)
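As a rough illustration of the multi-pass idea (plain Python, not DemandTools scenario syntax; field names such as name, city, state, postalcode, and phone are assumptions), each pass builds a matching key from progressively fewer fields with progressively looser normalization:

```python
# Illustrative multi-pass matching sketch: early passes match on many fields
# with strict normalization; later passes use fewer fields and looser rules.
from collections import defaultdict

def exact(value):
    # Strict "mapping type": trim and lowercase only.
    return (value or "").strip().lower()

def loose(value):
    # Looser "mapping type": keep alphanumerics only, so "A.B.C. Corp" and
    # "ABC Corp" produce the same key.
    return "".join(ch for ch in (value or "").lower() if ch.isalnum())

PASSES = [
    # Pass 1: rigid - five fields, strict normalization.
    lambda r: (exact(r["name"]), exact(r["city"]), exact(r["state"]),
               exact(r["postalcode"]), exact(r["phone"])),
    # Pass 2: fewer fields, still strict.
    lambda r: (exact(r["name"]), exact(r["postalcode"])),
    # Pass 3: loosest - one field, loose normalization.
    lambda r: (loose(r["name"]),),
]

def find_duplicate_groups(records, key_fn):
    groups = defaultdict(list)
    for r in records:
        groups[key_fn(r)].append(r)
    return [g for g in groups.values() if len(g) > 1]
```

Because records merged in an early, rigid pass no longer appear as duplicates in later passes, each loosened pass returns a smaller, more manageable set of groups.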
3. It is recommended that name, address, and phone fields be standardized prior to searching for duplicates
- This is not a requirement, as DemandTools has many advanced mapping techniques to find matches when these fields are not standardized. However, the cleaner and more complete the data, the easier it will be to find matches.
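For example, standardizing phone and name values before matching might look like the following sketch (plain Python for illustration, not a DemandTools function); reducing "+1 (603) 555-0142" and "603.555.0142" to the same digit string makes an exact phone match possible:

```python
import re

def standardize_phone(raw, default_country_code="1"):
    # Keep digits only, then strip a leading country code so local and
    # international formats compare equal.
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith(default_country_code):
        digits = digits[1:]
    return digits

def standardize_name(raw):
    # Collapse whitespace and case so "ACME  Corp" and "Acme Corp" match.
    return " ".join((raw or "").split()).lower()
```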
4. Working with large tables
DemandTools can be used with very large tables to identify duplicate groups; however, multiple passes may be needed to ultimately merge all the duplicates.
There are no hard and fast limits on how many records can be evaluated for duplicates, but unexpected errors can occur when merging very large numbers of duplicate groups. Users working with large tables should note the following:
- To avoid memory issues when finding duplicates or applying a Master Rule, have a minimum of 2 GB of RAM installed on the PC.
- Applying a Master Rule to all records can take a long time
- How long depends on how many records the rule is being applied to, how many conditions are in the rule, whether a "Score Delta" is being used, etc.
- Ensuring that any fields that will be used in a Master Rule are selected on Screen 1 as fields to show on the results grid will speed up the application of the rule
- This avoids the need to go back to all the records in the grid and download additional fields
- SAVED scenarios with a Master Rule selected will automatically download all fields used in the Master Rule
- However, those fields are ONLY shown in the grid if they were selected on Screen 1
Since DemandTools CAN evaluate thousands of records to identify duplicates, once the initial round of clean-up is done, maintenance deduplication can typically be completed in a few passes, as fewer groups will likely be returned.
5. Do not attempt to merge too many duplicate groups at once
Exactly what the upper threshold is for the total number of groups that can be merged is unknown and will vary by organization. Please keep in mind that merges cannot be undone (except manually); therefore, a phased approach to merging is still highly recommended!
If merging results in unexpected errors, then it is recommended that conditions be added in Step 1 to look at smaller subsets of records.
Suggested ways to subset include (see the sketch after this list):
- By state, assuming the state field is populated for most records
- Run a pass with all records after doing any subsets to find any accounts that were missed because the state field was empty
- Account/Company Name "starts with" a, b, c, d, e, etc.
- Run a pass with all records after doing any subsets to find ones that may have been missed by the name approach due to common prefixes (e.g., "The ABC Company" vs. "ABC")
- Last Name "starts with" a,b,c,d,e etc.
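A minimal sketch of the prefix-based subsetting idea is shown below (plain Python, not DemandTools conditions; the name field and batch size are assumptions). The same pattern applies to subsetting by state or by Last Name, and it ends with a catch-all pass for records the prefix passes missed:

```python
import string

def first_letter(record, field="name"):
    # Hypothetical field name "name"; adjust for the entity being subset.
    value = (record.get(field) or "").strip().lower()
    return value[:1]

def name_prefix_batches(records, letters_per_batch=2):
    """Yield (label, subset) pairs such as ("a-b", [...]), ending with an
    "other" batch for blank names or names starting with a digit or symbol."""
    alphabet = string.ascii_lowercase
    for i in range(0, len(alphabet), letters_per_batch):
        prefixes = set(alphabet[i:i + letters_per_batch])
        label = "-".join(sorted(prefixes))
        yield label, [r for r in records if first_letter(r) in prefixes]
    # Catch-all pass, analogous to re-running with all records afterwards.
    yield "other", [r for r in records if first_letter(r) not in set(alphabet)]
```

Each yielded subset can then be deduped and merged as its own, smaller pass.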