Please test your deduplication scenarios thoroughly before deduping a large number of records, as there is NO automatic "undo"/"rollback" option and NO restore files are created. If you have a Salesforce sandbox, we recommend that you test in this environment first before deduping your production data.
More information on pointing DemandTools to the Salesforce Sandbox can be found HERE.
It is also highly recommended that a full backup of the database be taken before performing a large-scale deduplication.
A DedupeMergeData file is created in the DemandToolsData\Restore folder. This file can be used to "partially" restore from a bad merge as it contains all the field data for records prior to the merge. This file ONLY contains field data. NO FILE is created to track which sub-objects were moved during a merge, so any restoration of sub-objects (re-parenting back to the non-master records) is still a 100% manual process.
1. Dedupe in the following order
- Account; Contact; Lead; Lead to Contact (using Lead Conversion module); Lead to Account (using Lead Conversion module); Opportunity; Custom Objects
When merging records in Single Table Dedupe, Salesforce's API merge call is used to perform the merge. Just like when merging directly in the Salesforce user interface, all related objects are merged onto the selected master record. Therefore, parent objects should be merged before child (related) objects, so that the parent ID can be used as part of the matching criteria when merging the child object. For example, when looking for duplicate contacts or opportunities, include the account ID in the matching rules to avoid incorrect matches that can occur when matching on only a few fields.
2. Use a multi-pass strategy for identifying duplicates
- Start with very rigid criteria
- Rigid criteria means matching on at least 4 or 5 fields and using strict mapping types
- Loosen criteria by matching on fewer fields and choosing less strict mapping types
- Loosen criteria with each pass
- This strategy will help clean up the duplicates in a quicker, more efficient manner
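The multi-pass idea above can be illustrated outside of DemandTools with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the field names, and the exact-match comparison (DemandTools' own mapping types are more sophisticated and are not reproduced here). The first pass matches on four fields, the second on three, and each pass only sees the records left over after the previous pass's simulated merges.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Acme Corp", "city": "Austin", "state": "TX", "phone": "5125550100"},
    {"id": 2, "name": "Acme Corp", "city": "Austin", "state": "TX", "phone": "5125550100"},
    {"id": 3, "name": "Acme Corp", "city": "Austin", "state": "TX", "phone": ""},
    {"id": 4, "name": "Globex", "city": "Dallas", "state": "TX", "phone": "2145550123"},
]

# Each pass matches on fewer fields than the one before it.
PASSES = [
    ("name", "city", "state", "phone"),  # rigid pass
    ("name", "city", "state"),           # looser pass
]

def find_groups(rows, fields):
    """Group rows whose values match exactly (case-insensitive) on all fields."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[f].strip().lower() for f in fields)
        if all(key):  # skip rows where any matching field is blank
            buckets[key].append(row["id"])
    return [ids for ids in buckets.values() if len(ids) > 1]

def multi_pass(rows, passes):
    """Run each pass on the records remaining after the previous pass."""
    remaining = list(rows)
    results = []
    for fields in passes:
        groups = find_groups(remaining, fields)
        results.append((fields, groups))
        # Simulate a merge: the first id in each group is kept as the master.
        merged_away = {i for group in groups for i in group[1:]}
        remaining = [r for r in remaining if r["id"] not in merged_away]
    return results

results = multi_pass(records, PASSES)
for fields, groups in results:
    print(len(fields), "fields:", groups)
```

In this sample, the rigid pass groups records 1 and 2, and the looser pass then picks up record 3, which was skipped earlier because its phone field is blank.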
3. It is recommended that name, address, and phone fields are standardized prior to searching for duplicates
- This is not a requirement as DemandTools has lots of advanced mapping techniques to find matches when these fields are not standardized. However, the cleaner and more complete the data, the easier it will be to find matches.
- MassImpact can be used to standardize most of these fields. Please refer to the MassImpact documents for details on how to make basic changes (e.g. find records where state = Texas and change to TX), and how to use prebuilt formulas for more complex standardizations (e.g. "Normalized_US_Address", "NaPhoneFix" etc.).
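To make the kind of standardization described above concrete, here is a minimal Python sketch. It is not the MassImpact "Normalized_US_Address" or "NaPhoneFix" formula, whose exact behavior is not described here. The function names, the small state lookup table, and the chosen phone output format are all assumptions for illustration.

```python
import re

# Small illustrative lookup; a real table would cover every state.
STATE_ABBREVIATIONS = {"texas": "TX", "california": "CA", "new york": "NY"}

def standardize_state(value):
    """Map a full state name to its two-letter code; upper-case existing codes."""
    cleaned = value.strip()
    if cleaned.lower() in STATE_ABBREVIATIONS:
        return STATE_ABBREVIATIONS[cleaned.lower()]
    if len(cleaned) == 2:
        return cleaned.upper()
    return cleaned  # unknown value: leave it for manual review

def standardize_phone(value):
    """Format a 10-digit North American number as (NNN) NNN-NNNN (assumed format)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return value  # not recognizable as a 10-digit number: leave unchanged
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(standardize_state(" Texas "))         # TX
print(standardize_phone("1-512-555-0100"))  # (512) 555-0100
```

Values that cannot be recognized are returned unchanged, so a standardization pass like this does not destroy data that needs manual review.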
4. Working with large tables (1+ million records)
DemandTools can be used with very large tables to identify duplicate groups; however, multiple passes may be needed to ultimately merge all the duplicates.
There are no hard and fast limits as to how many records can be evaluated for duplicates, but unexpected errors can occur when merging very large numbers of duplicate groups. Users working with large tables should note the following:
- To minimize API timeouts while downloading the requested records, increase the "Salesforce Timeout (in minutes)" setting in DemandTools -> Options -> General Settings to 10 minutes
- More information on this setting can be found HERE.
- To avoid memory issues when finding duplicates or applying a master rule, have a minimum of 2 gigabytes of RAM installed on the PC.
- More information on DemandTools System Requirements can be found HERE.
- Applying a Master Rule to all records could take a long time
- How long it takes will depend on how many records the rule is applied to, how many conditions are in the rule, whether a "Score Delta" is being used, etc.
- Ensuring that any fields that will be used in a Master Rule are selected in Step 1 as fields to show on the results grid will speed up the application of the rule
- SAVED scenarios with a master rule selected will automatically download all fields used in the master rule
- Ensure there are sufficient Salesforce API calls available to perform the merges
- Merges may need to be done over a period of days if there are lots of duplicate groups and not enough API calls available to process all in a 24 hour period
- Merges can be batched to reduce the number of API calls used
Since DemandTools CAN evaluate millions of records to identify duplicates, once the initial round of clean-up is done, maintenance dedupes can typically be run in a few passes, as fewer groups will likely be returned.
5. Do not attempt to merge too many duplicate groups at once
Merges can be done in batches. This allows for more groups to be merged without encountering unexpected errors. The exact upper threshold for the total number of groups that can be merged is unknown and will vary by organization. Please keep in mind that merges cannot be undone (except manually); therefore, a phased approach to merging is still highly recommended!
If merging results in unexpected errors, then it is recommended that conditions be added in Step 1 to look at smaller subsets of records.
Suggested ways to subset include:
- By state, assuming most states are populated
- Run a pass with all records after doing any subsets to find any accounts that were missed due to state fields being empty
- Account/Company Name "starts with" a,b,c,d,e etc.
- Run a pass with all records after doing any subsets to find ones that may have been missed with the name approach due to common prefixes, e.g. The ABC company and ABC
- Last Name "starts with" a,b,c,d,e etc.
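The following Python sketch illustrates why a final pass over all records is suggested after subsetting by name. It is only an illustration: in DemandTools the subsets are defined with conditions in Step 1, and the function name and sample names here are hypothetical.

```python
from collections import defaultdict

def subset_by_prefix(names):
    """Partition names by their first character, mimicking "starts with" conditions."""
    subsets = defaultdict(list)
    for name in names:
        stripped = name.strip()
        key = stripped[0].lower() if stripped else ""  # "" collects blank names
        subsets[key].append(name)
    return dict(subsets)

# Hypothetical sample values.
names = ["The ABC Company", "ABC", "Acme", "", "Tyrell"]
subsets = subset_by_prefix(names)
for prefix, members in sorted(subsets.items()):
    print(repr(prefix), members)
```

"The ABC Company" falls into the "t" subset while "ABC" falls into the "a" subset, so the two can never be compared until a pass is run across all records. Blank values form their own subset for the same reason.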
6. How long will it take?
The amount of time it takes to merge will depend on whether "Use Salesforce Merge" is checked.
Use Salesforce Merge Checked:
Approximately 5 groups per second or 300 groups per minute, based on a batch size of 5. Higher batch sizes will result in faster merge times, but may result in Apex errors. More information on selecting the merge batch size can be found here.
Use Salesforce Merge Unchecked:
Merging will take about 7-10 seconds per group (applies to merging objects other than Accounts, Contacts, Leads where the Salesforce Merge Call and batching of merges is not available)
Note: The time will vary based on how many records are in a group, how much data is being merged from each group, the merge batch size selected, and Salesforce API server traffic/responsiveness.
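As a rough planning aid, the rates quoted above can be turned into a back-of-the-envelope estimate. The Python sketch below assumes the approximate figures from this section (300 groups per minute with "Use Salesforce Merge" checked at a batch size of 5, and 7-10 seconds per group with it unchecked). The function name is hypothetical, and actual times will vary for the reasons given in the note.

```python
def estimate_merge_minutes(groups, use_salesforce_merge,
                           groups_per_minute=300, seconds_per_group=(7, 10)):
    """Return a (low, high) estimate in minutes for merging `groups` groups."""
    if use_salesforce_merge:
        minutes = groups / groups_per_minute
        return (minutes, minutes)
    low, high = seconds_per_group
    return (groups * low / 60, groups * high / 60)

print(estimate_merge_minutes(30000, True))   # (100.0, 100.0)
print(estimate_merge_minutes(600, False))    # (70.0, 100.0)
```

At these rates, 30,000 groups with the Salesforce merge call take about 100 minutes, while just 600 groups without it take between 70 and 100 minutes.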
7. Ensure sufficient API calls are available to perform the merges
The number of API calls used for merging will depend on whether "Use Salesforce Merge" is checked. Deduplication will fail if the number of Salesforce API calls allotted to your organization in a 24 hour period is exceeded. The error message, "REQUEST_LIMIT_EXCEEDED", will appear in the DemandTools log file if this happens.
The total number of API requests for your organization can be accessed in Salesforce by going to Setup->Company Profile->Company Information. On the right you will see "API Requests, Last 24 Hours" giving you the total used and the maximum allowed.
Use Salesforce Merge Checked:
The ability to batch merge calls results in fewer API calls needed overall to merge multiple groups. A minimum of 1 API call is used per BATCH merged. Using Salesforce's API merge call wraps the majority of the merging process within 1 API call. Higher batch sizes will result in fewer overall API calls used, but may result in Apex errors.
Use Salesforce Merge Unchecked:
Multiple API calls will be needed to merge each group, and the total can be significant when merging multiple groups (applies to merging objects other than Accounts, Contacts, Leads where the Salesforce Merge Call and batching of merges is not available). To estimate how many calls will be needed per group, users can test merging a small number of groups, checking the API calls used before and after the merge, then divide the difference by the number of groups merged.
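The arithmetic described in this section can be sketched in Python as follows. The function names and all of the numbers in the usage lines are hypothetical. The only inputs taken from this section are the "minimum of 1 API call per batch" rule and the before/after measurement approach. Because the batched figure is a minimum, actual usage may be higher.

```python
import math

def min_batched_calls(groups, batch_size):
    """Lower bound on API calls when merges are batched (at least 1 call per batch)."""
    return math.ceil(groups / batch_size)

def calls_per_group(used_before, used_after, groups_merged):
    """Average API calls per group, measured from a small test merge."""
    return (used_after - used_before) / groups_merged

def fits_in_daily_limit(groups, per_group, used_so_far, maximum):
    """Check whether the projected calls stay within the 24 hour allotment."""
    return used_so_far + groups * per_group <= maximum

# Hypothetical figures read from "API Requests, Last 24 Hours".
print(min_batched_calls(1001, 5))               # 201
rate = calls_per_group(12000, 12300, 20)
print(rate)                                     # 15.0
print(fits_in_daily_limit(5000, rate, 12300, 100000))  # True
```

If the check fails, the merges can be spread over several days or run on smaller subsets, as described in the earlier sections.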
8. Turn on Salesforce History Tracking
Turning on history tracking for the Account/Contact/Lead objects will track all merges in the History table for those objects when "Use Salesforce Merge" is checked in Step 3:
NOTE: This entry is due to using Salesforce's merge call to perform the merge which is only possible for Accounts/Contacts/Leads. When merging other objects (e.g. Opportunities, custom objects etc.) the merge WILL NOT be tracked in the History tables as the "Use Salesforce Merge" checkbox is not available.
Additionally, you may want to track any key fields that could be updated during a merge via "Update Fields Where Master Empty" or "Combine Field Values"
- This is helpful not only for general tracking purposes; it also records the BEFORE image of a field in the event that the merge needs to be undone (no restore files for merging)
- Ideally track changes to Parent ID fields also for any sub-objects that could be moved during the merge (e.g. Account ID changes to the Contact and Opportunity tables)
- Could be used to determine which sub-objects were moved during a merge - again helpful if a merge needs to be undone