1.3.6.5 Equal Attributes Profiler
The Equal Attributes Profiler searches records across a number of attributes for pairs of attributes where values are frequently equal - for example where FirstName
and GivenName
attributes are both stored, and normally the same. A threshold option is used to drive whether or not to relate pairs of attributes together, depending on the percentage of values in each attribute that have the same value.
Use the Equal Attributes Profiler to find possibly redundant attributes, or pairs of attributes where values are normally equal, but in some cases are not. The Equal Attributes Profiler can help find bad data where two values in related attributes do not relate to each other when they should.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify any attributes that you want to examine for equal attribute linkage. |
Options |
Describes options you can specify. |
Equal attribute threshold |
Controls the percentage of values that must be equal in two attributes for those two attributes to be considered as related, and to appear in the results. Specified as a percentage. Default is 80%. Note that the value must be between 50% and 100% inclusive. |
Treat nulls as equal? |
Controls whether or not pairs of Null values are considered to be equal, and therefore whether or not they will be considered when appraising the Equal attribute threshold (above). Specified as Yes or No. Default is Yes. |
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
None. |
Flags |
None. |
The Equal Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.
When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.
The Equal Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of equal values. The following table describes the statistics for each pair of related (equal) attributes:
Statistic | Description |
---|---|
Equal |
.The number of records where the values for both the related attributes were the same. |
Null pairs |
The number of records where the values for both the related attributes were null. Note: If the option to treat nulls as equal is selected, this will be zero, as the null pairs will be included in the Equal statistic. |
Not equal |
The number of records where the values for the related attributes were not the same. |
Click on the Additional Data button to display the above statistics as percentages of the records analyzed.
Drill-down on the number of records where the pair of attributes matched exactly to see a breakdown of the frequency of occurrence of each matching value. Drill-down again to see the records.
Alternatively, drill-down on the number of records where the pair of attributes were not equal to see the records directly. If there should be a relationship between attributes, these will be the records where the relationship is broken.
Example
In this example, a Customer table is analyzed to see if any of its attributes are commonly equal to each other, using the default configuration. The Equal Attributes Profiler finds that the DT_PURCHASED
and DT_ACC_OPEN
attributes are normally equal:
Field 1 | Field 2 | Equal | Null Pairs | Not Equal |
---|---|---|---|---|
DT_PURCHASED |
DT_ACC_OPEN |
1983 |
16 |
11 |
By drilling down on the number of records where the two fields were equal, you can see a view of all the pairs of equal values:
DT_ACC_OPEN | DT_PURCHASED | Count |
---|---|---|
03/02/1997 |
03/02/1997 |
5 |
30/11/1993 |
30/11/1993 |
4 |
09/08/1996 |
09/08/1996 |
4 |
10/09/1993 |
10/09/1993 |
4 |
07/12/1992 |
07/12/1992 |
4 |
07/08/1996 |
07/08/1996 |
4 |
25/05/1993 |
25/05/1993 |
4 |
24/02/1994 |
24/02/1994 |
4 |
21/11/1996 |
21/11/1996 |
4 |
17/12/1996 |
17/12/1996 |
4 |
13/11/1992 |
13/11/1992 |
4 |
27/08/1992 |
27/08/1992 |
4 |
05/10/1992 |
05/10/1992 |
4 |
27/09/1992 |
27/09/1992 |
3 |