Dedupe enables you to identify and merge large quantities of duplicate rows from a .csv or .xlsx file.
- Create and save your deduplication scenarios to use again and again
- Use a broad array of comparison types to more easily identify duplicate rows
- Determine winner rows based on your company's logic
- Control column values on winning rows using robust column rule functionality
Starting a Dedupe Scenario
To begin a new scenario, select Dedupe from the left navigation bar.
To start from an existing scenario you previously saved, select a Dedupe scenario from the Scenarios table on your Dashboard. Your column matching criteria will already be set for you. You can keep all settings or change them as you go.
Step 1: Select File and Match Rows
For Source, choose the file you want to search for duplicates. This file must be in .csv or .xlsx format and under 50 MB.
In the Columns to Compare section, you define what you consider to be duplicate rows. Under the Matching Column dropdown, select the column you want to match on to find duplicates.
Select your Comparison Type next, whether exact or more relaxed. A few examples include:
- Original Value: Use the original string value to compare.
- Cleaned Account Name: Use cleanse rules to match similar names based on common punctuation, abbreviations, and common business prefixes. For example, The ABC Company and ABC Co. would be seen as matching account names. These rules are customizable to meet your organization's specific needs. To customize go to Settings - Manage Cleanse Rules.
- Acronym: Use the first letter from each word in the column value.
- First X Letters: Use only part of the column value by specifying the number of characters to use.
- First X Words: Use only part of the column value by specifying the number of words to use.
- Domain: Strip out all components of the column value except the domain portion.
- Address Cleanse: Normalize the address to ensure better matching between similar address column values.
A comprehensive list of comparison types including use cases and examples can be found here.
Four checkboxes let you adjust the logic for your search.
- Blank Values: Look for positive matches even when column values are blank for a given column. Please note that at least one column must have a non-blank match for a positive match to occur.
- Fuzzy Matching: Analyze words based on their phonetic value, removing vowels and analyzing the remaining consonants.
- Transposed Values: Analyze column values by reordering multiple words within each column value.
- Alpha Clean: Analyze column values by using only characters that are one of the 26 characters of the English alphabet, numbers 0-9, space, and ampersand.
You can continue to narrow down and refine your deduplicate matches by adding more columns. To add more columns, select Add Columns to Compare.
- Matching on multiple columns acts as an AND operation. For example, if you define three column names, a row has to match on all three columns to be considered a duplicate.
To include multiple columns in a match, check the box for Cross-Column Match beside the appropriate row. For example, if you have columns for Mailing Address and Billing Address, you can search across both for address matches.
Select Filter Match Results to add criteria to narrow down the rows in your .csv that you would like to be assessed for duplicates when you move to step 2, Find Duplicates. For example, use Filter Match Results when you have a file with all your account rows, but you only want to find duplicates where Account Type column equals "prospect".
Step 2: Find Duplicates
- As Dedupe prepares your results, you can see how many rows are being processed. If the results set is too large, you can cancel, then go back and narrow your search further.
- Moving to step 2 does not commit changes.
Preview Duplicate Groups and Confirm Credit Usage
Before you can dedupe your selected rows, Dedupe calculates how many rows will be included in this action and shows you two preview rows it has identified as duplicates. From here, you can review your match conditions and go back to adjust as needed (narrowing or broadening your scope, for example).
Dedupe also tells you the number of credits needed to proceed.
- Dedupe credits are pay-as-you-go so you use only what you need.
- 1 duplicate row identified within your file = 1 credit. Credits expire every 60 days.
- Only Admin users can add credits. If you are a Standard or Basic user, contact an Admin to add credits for your use.
Admin Users:
If you have enough credits to complete the deduplication, you will be shown your credit balance after proceeding and the Manage Merge button will be available. You can still add credits at this time.
If you do not have enough credits to complete the deduplication, you will be shown the number of additional credits needed to proceed and will need to select Add Credits.
To add credits, select one of the credit packages and then select Purchase [x] Credits. Your purchase will be charged to the card on file, and you will be returned to the Confirm Credit Usage popup, where you will be shown your new balance.
Standard and Basic users:
If you have enough credits to complete the deduplication, you will be shown your credit balance after proceeding and the Manage Merge button will be available.
If you do not have enough credits to complete the deduplication, you will be shown the number of additional credits needed to proceed and will need to contact your Admin to add credits.
When you are ready, select Manage Merge to go to Step 2. Credits will be applied at this time.
Step 2: Merge Management
In this step, you can review your duplicates, designated by brackets grouping rows together, and select a winner for each bracket.
For small sets of duplicates, you can choose a winner manually. Select the trophy icon under the Winner column for the row you want to keep.
Winner Rules
For larger sets, automate the process with winner rules. No winner rules are initially defined, so you can create rules according to your business's logic and priorities. To start, select Winner Rules. When the Apply Winner Rules popup opens, select Manage Rules, then Add New Rule.
To define a rule:
- Name your rule.
- Choose the columns that will be part of the rule
- Choose a condition for each column.
- Set a value if necessary.
- Give each column a score.
For example, to create a rule for Most Complete Row, pick the columns most critical to your business, such as First Name, Last Name, Company, Phone, and Email. Then set the condition “Is Populated.” No value is needed for this condition, since the module only checks to see if there is a value in the column.
Set a higher score for more critical information. Phone may score a 10 for your business, whereas Email may score only a 5. When DemandTools File picks a winner between duplicate rows, it will look for the row with more critical, highly scored information.
You can also assign a score delta—the difference between two scores—so DemandTools File selects a winner only if the difference is greater than or equal to that delta. This option is selected by default.
When you are done defining rules, select Save and Close, then Close to return to the Apply Winner Rules popup
You can now select a First Rule from the dropdown. Select the rule you just created.
- If you have created several rules and want to apply more than one, select Add another Rule.
Select Apply Winner Rules to return to Merge Management. This will apply winner rules automatically.
- Rules with no score delta will automatically select a winning row for each duplicate group where a winner was not selected, ending any rule sequencing.
Once the rules are applied, winning rows have highlighted trophy icons. The total number of rows in your file, the number of rows selected for deduplication, and the number of winning rows are also displayed.
- You can apply winner rules provisionally and then clear them after applying if the results need to be adjusted. Select Clear Rules beside the Winning Rows count.
- You can also manually override any winner rule selection by selecting and unselecting trophy icons on the appropriate rows.
Merge Options
Setting your Merge Options tells Dedupe how to merge your winning rows into your file.
- Prefix rows: For non-winning rows, add a prefix to your file instead of deleting. To set this option:
- Check the box beside Add Prefix.
- Select the column where you want to append a prefix to the non-winning rows.
- Define the prefix in the text box below. DELETE*** is the suggested prefix, but you must enter it or another prefix into the text box before you can proceed.
Column Rules
Back in Merge Management, select the Use column rules checkbox to override specific column values from your winner rows with column values from the other rows inside each duplicate group and merge them into the winning row.
- For example, you can set your Address columns to Update on Empty, so if your winning row is missing a Street Address, you can pull it from a duplicate row.
To set column rules:
- Check the box and then select Manage.
- Select Add New Rule.
- Name your rule.
- Search for the columns that will be part of the rule.
- Choose a rule override for each column.
- Optionally, you can set a rule override to apply to all displayed columns and check the Combine Column Value box to merge the column values into one.
- Select Save Column Rule and then Close to return to Merge Management.
- Between the Merge Options and Manage buttons is a Select rule override dropdown. Select the column rule you want to use from the dropdown.
Export Options
At any time before or after the merge, select Export Options. Your options differ before and after the merge.
- Before you merge rows, the Grid Data option lets you download your results as displayed, with winner rows selected and duplicate rows intact.
- After you merge rows, the Merged Data option appears. This option lets you download your merged results so you have a file with just the winning rows and column values, plus any original data where no duplicate was identified.
To merge, select Merge selected rows for [x] duplicate groups. This action will consolidate your results into winning rows only, obeying the merge options and column rules you have selected.
Caution: This action cannot be undone.
Export your merged data to complete the Dedupe process.
Once exported, your Merged Data and your Grid Data will be available for download through the Recent Activity table in the Dashboard. Please refer to this article for more details about the availability of your Merged Data and Grid Data files.
Saving Your Scenario
If you want to re-run or schedule this scenario again in the future, open the Scenarios bar slider and select Save As, then name your scenario and select Save.