1.3.6.13 Record Duplication Profiler
The Record Duplication Profiler allows you to find records that are exact duplicates of one another, based on the selected attributes.
Use the Record Duplication Profiler to check whether any records in the data set have been entirely duplicated, for example due to an error in data migration.
Because you select the attributes used in the duplicate check, you can also find records that are duplicates on only a subset of the record, for example customer records that share the same name, address, and postcode.
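The effect of choosing different attributes can be illustrated with a minimal Python sketch. This is not the profiler's implementation. The function name `flag_duplicates`, the sample records, and the "N" value for non-duplicates are assumptions made for illustration only.

```python
from collections import Counter

def flag_duplicates(records, attributes):
    """Return "Y" or "N" per record (hypothetical flag values).

    A record is flagged "Y" when at least one other record has exactly
    the same values for every attribute in `attributes`.
    """
    # Build one comparison key per record from the selected attributes only.
    keys = [tuple(rec.get(a) for a in attributes) for rec in records]
    counts = Counter(keys)
    return ["Y" if counts[k] > 1 else "N" for k in keys]

# Invented sample data: the first two customers differ only by phone number.
customers = [
    {"name": "A. Smith", "address": "1 High St", "postcode": "AB1 2CD", "phone": "0100"},
    {"name": "A. Smith", "address": "1 High St", "postcode": "AB1 2CD", "phone": "0199"},
    {"name": "B. Jones", "address": "2 Low Rd", "postcode": "EF3 4GH", "phone": "0100"},
]

# Duplicates by name, address, and postcode only.
by_subset = flag_duplicates(customers, ["name", "address", "postcode"])
# Duplicates across the entire record.
by_all = flag_duplicates(customers, ["name", "address", "postcode", "phone"])
```

In this sketch the first two customers are flagged when only name, address, and postcode are compared, but no record is flagged when the phone number is included in the check.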
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any attributes that you want to use in the duplicate check. |
Options | Specify the following options: Records that have Null values in some, but not all, attributes, and which exactly match other records, will always be considered as duplicates. |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | None. |
Flags | The following flags are output: |
The Record Duplication Profiler assesses duplication across a batch of records. It must therefore run to completion before its results are available, and is not suitable for a process that requires a real-time response.
When executed against a batch of transactions from a real-time data source, it finishes processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned indicate the number of duplicates in that batch of transactions only.
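The batch-scoped behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `profile_batch`, `profile_stream`, and the `transaction_limit` parameter are hypothetical names, and only a count-based commit point is modeled (not a time limit).

```python
from collections import Counter
from itertools import islice

def profile_batch(batch, attributes):
    """Count duplicated and non-duplicated records within one batch."""
    keys = [tuple(rec[a] for a in attributes) for rec in batch]
    counts = Counter(keys)
    duplicated = sum(1 for k in keys if counts[k] > 1)
    return {"Duplicated": duplicated, "Not duplicated": len(keys) - duplicated}

def profile_stream(records, attributes, transaction_limit):
    """Yield one set of statistics per batch of `transaction_limit` records.

    Each batch is profiled in isolation, so a record and its duplicate
    that fall into different batches are not reported as duplicates.
    """
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, transaction_limit))
        if not batch:
            break
        yield profile_batch(batch, attributes)

# Invented stream: the two records with id 2 straddle the batch boundary.
stream = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 2}]
results = list(profile_stream(stream, ["id"], transaction_limit=3))
```

Here the first batch reports two duplicated records and the second reports none, even though the stream as a whole contains a second pair of duplicates.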
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Duplicated | The number of records that are duplicated across the attributes analyzed. |
Not duplicated | The number of records that are not duplicated across the attributes analyzed. |
Example
In this example, the Record Duplication Profiler finds duplicates in a Customers table using two attributes - ADDRESS1 and ADDRESS2.
Duplicated | Not Duplicated |
---|---|
8 | 1993 |
You can drill down on records with Duplicated values:
ADDRESS1 | ADDRESS2 | RecordDuplicate |
---|---|---|
Crescent Road, | Reading | Y |
Grange Road, | North Berwick | Y |
Grange Road, | North Berwick | Y |
Crescent Road, | Reading | Y |
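The drill-down above can be approximated outside the tool with a short Python sketch that groups records by the analyzed attributes. The function name `duplicate_groups` is hypothetical, and the fifth row is invented so that the sample contains one record that is not duplicated. The other four rows are the ones shown in the drill-down.

```python
from collections import defaultdict

def duplicate_groups(records, attributes):
    """Map each duplicated attribute combination to the record positions sharing it."""
    groups = defaultdict(list)
    for index, rec in enumerate(records):
        groups[tuple(rec[a] for a in attributes)].append(index)
    # Keep only combinations that occur more than once.
    return {key: positions for key, positions in groups.items() if len(positions) > 1}

rows = [
    {"ADDRESS1": "Crescent Road,", "ADDRESS2": "Reading"},
    {"ADDRESS1": "Grange Road,", "ADDRESS2": "North Berwick"},
    {"ADDRESS1": "Grange Road,", "ADDRESS2": "North Berwick"},
    {"ADDRESS1": "Crescent Road,", "ADDRESS2": "Reading"},
    {"ADDRESS1": "High Street,", "ADDRESS2": "Oxford"},  # invented, not in the example
]

groups = duplicate_groups(rows, ["ADDRESS1", "ADDRESS2"])
duplicated = sum(len(positions) for positions in groups.values())
not_duplicated = len(rows) - duplicated
```

Grouping rather than flagging shows which records duplicate one another, which a Y flag alone does not reveal.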