1.3.6.2 Contained Attributes Profiler
The Contained Attributes Profiler searches records across a number of attributes for pairs of attributes where one attribute value often contains the other attribute's value. A threshold option determines whether a pair of attributes is reported as related, based on the percentage of records where one attribute value contains the other.
Use the Contained Attributes Profiler to find attributes which are, or should be, related. Where there is strong attribute linkage, this may indicate a potentially redundant attribute.
Alternatively, attributes may be intended to be related, but that relationship may be broken; that is, one column value may be blank but could be derived from another column's value.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any attributes that you want to examine for contained attribute linkage. |
Options | None. |
Contained attribute threshold % | Controls the percentage of values that must match using Contains matching in two attributes for those two attributes to be considered as related, and to appear in the results. Specified as a percentage. Default value is |
Ignore case? | Controls whether or not case will be ignored when checking if one attribute value contains another. Specified as |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | None. |
Flags | None. |
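The pairwise check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the processor's actual implementation: the function names, the treatment of blank values as "not contained", the choice to accept containment in either direction, and the sample records (including the `Title` attribute) are all assumptions made for the example.

```python
from itertools import combinations


def contains(a, b, ignore_case):
    """Return True if either value contains the other.

    Assumption: blank or missing values never count as contained.
    """
    if not a or not b:
        return False
    if ignore_case:
        a, b = a.upper(), b.upper()
    return a in b or b in a


def profile_contained(records, attributes, threshold_pct, ignore_case):
    """Return (attr1, attr2, contained, not_contained) for each related pair.

    A pair is kept only if the percentage of records with a Contains
    match reaches threshold_pct (assumed to be measured over all records).
    """
    results = []
    total = len(records)
    for x, y in combinations(attributes, 2):
        contained = sum(
            1 for r in records if contains(r.get(x), r.get(y), ignore_case)
        )
        if total and 100.0 * contained / total >= threshold_pct:
            results.append((x, y, contained, total - contained))
    return results


# Hypothetical sample records, for illustration only.
records = [
    {"FirstName": "Linda", "EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "Title": "Mrs"},
    {"FirstName": "Paul", "EmailAddress": "PAUL.MARKAR@DISCOUNT-FEVER.COM", "Title": "Mr"},
    {"FirstName": "John", "EmailAddress": "JOHN@DARWINS.COM", "Title": "Mr"},
    {"FirstName": "Tony", "EmailAddress": "", "Title": "Mr"},
]
attributes = ["FirstName", "EmailAddress", "Title"]

related = profile_contained(records, attributes, threshold_pct=70, ignore_case=True)
case_sensitive = profile_contained(records, attributes, threshold_pct=70, ignore_case=False)
print(related)
print(case_sensitive)
```

With case ignored, three of the four sample records match for `FirstName` and `EmailAddress`, so that pair passes a 70% threshold; with case respected, the mixed-case names no longer match and no pair is reported.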
The Contained Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.
When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.
The Contained Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of related values, where one attribute value often contains the other. The top-level view shows the following statistics for each pair of related attributes:
Statistic | Description |
---|---|
Contained | The number of records where one of the related attribute values contained the other. |
Not contained | The number of records where one of the related attribute values did not contain the other. |
Click on the Additional Data button to display the above statistics as percentages of the records analyzed.
Drill down on the number of records where one attribute value contained the other to see a breakdown of the frequency of occurrence of each matching pair of values. Drill down again to see the records.
Alternatively, drill down on the number of records where one attribute value did not contain the other to see the records directly. If there should be a relationship between the attributes, these will be the records where the relationship is broken.
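As a rough illustration of what the "Not contained" drill-down returns, the following Python sketch filters a list of records down to those where the relationship is broken. The function name `broken_relationships`, the handling of blank values, and the sample records are hypothetical and are not taken from the product.

```python
def broken_relationships(records, attr_a, attr_b, ignore_case):
    """Return the records where neither attribute value contains the other.

    Assumption: a blank value on either side counts as a broken relationship.
    """
    broken = []
    for r in records:
        a = r.get(attr_a) or ""
        b = r.get(attr_b) or ""
        if ignore_case:
            a, b = a.upper(), b.upper()
        if not a or not b or not (a in b or b in a):
            broken.append(r)
    return broken


# Hypothetical sample records, for illustration only.
sample = [
    {"FirstName": "JOHN", "EmailAddress": "JOHN@DARWINS.COM"},
    {"FirstName": "TONY", "EmailAddress": ""},
    {"FirstName": "NORMAN", "EmailAddress": "N.SCANLON@ECA.COM"},
]

broken = broken_relationships(sample, "FirstName", "EmailAddress", ignore_case=True)
for record in broken:
    print(record)
```

Here the second record has a blank `EmailAddress` and the third has an address that does not include the first name, so both would be candidates for review.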
Example
In this example, a number of attributes are checked for a Contains relationship. A relationship is found between the FirstName and EmailAddress attributes, where the FirstName is often contained in the EmailAddress. The summary data:
Field 1 | Field 2 | Contained (desc) | Not Contained |
---|---|---|---|
EmailAddress | FirstName | 1829 | 172 |
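The percentages shown by the Additional Data button can be approximated from these two counts. The sketch below assumes that the number of records analyzed is simply Contained plus Not Contained, which the summary does not state explicitly; the function name `as_percentages` is hypothetical.

```python
def as_percentages(contained, not_contained):
    """Express both counts as percentages of the records analyzed.

    Assumption: records analyzed = contained + not_contained.
    """
    total = contained + not_contained
    if total == 0:
        return 0.0, 0.0
    return (
        round(100 * contained / total, 1),
        round(100 * not_contained / total, 1),
    )


pct_contained, pct_not_contained = as_percentages(1829, 172)
print(pct_contained, pct_not_contained)  # 91.4 8.6
```

Under that assumption, about 91.4% of the 2001 records show the relationship, which is the figure that would be compared against the contained attribute threshold.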
Drilling down on the 1829 records where the EmailAddress contains the FirstName attribute reveals the following view of all the distinct pairs of values where the relationship was found:
EmailAddress | FirstName | Count |
---|---|---|
LINDA.COOKSON@M-AND-I.COM | LINDA | 2 |
PAUL.MARKAR@DISCOUNT-FEVER.COM | PAUL | 2 |
SHEILA.ROBINSON@SUNRISE-HOLIDAYS.COM | SHEILA | 2 |
NORMAN.SCANLON@ECA.COM | NORMAN | 2 |
TONY.GIBSON@TOMBURN.COM | TONY | 2 |
PAULINE.BEEDHAM@BLUEYONDER.CO.UK | PAULINE | 2 |
ROWLAND.BROWN@BTINTERNET.COM | ROWLAND | 2 |
JOHN@DARWINS.COM | JOHN | 2 |
TEST@TEST.COM | TEST | 2 |
EILEEN_BEARD@WILSONS_PENARTH.COM | EILEEN | 1 |
BRIGETTE.WALLACE@UNIQUE-INTERIORS.COM | BRIGETTE | 1 |
MICHAEL.CONNOLLY@GEMINI-VISUALS.COM | MICHAEL | 1 |
JOYCE.AITKEN@RDM-ELECTRONICS.COM | JOYCE | 1 |
JOANNA.TEMLETT@BTOPENWORLD.COM | JOANNA | 1 |
MAHAJAN.DEBELLOTT@NTLWORLD.COM | MAHAJAN | 1 |
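A frequency breakdown like the one above can be reproduced outside the tool by counting distinct value pairs. The following Python sketch uses `collections.Counter`; the function name `value_pair_frequencies` is hypothetical, and the five input records are reconstructed from three rows of the table purely for illustration rather than taken from the underlying data set.

```python
from collections import Counter


def value_pair_frequencies(records, attr_a, attr_b):
    """Count how often each distinct (attr_a, attr_b) value pair occurs,
    most frequent first."""
    counts = Counter((r[attr_a], r[attr_b]) for r in records)
    return counts.most_common()


# Illustrative records consistent with three rows of the table above.
matched = [
    {"EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "FirstName": "LINDA"},
    {"EmailAddress": "EILEEN_BEARD@WILSONS_PENARTH.COM", "FirstName": "EILEEN"},
    {"EmailAddress": "JOHN@DARWINS.COM", "FirstName": "JOHN"},
    {"EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "FirstName": "LINDA"},
    {"EmailAddress": "JOHN@DARWINS.COM", "FirstName": "JOHN"},
]

frequencies = value_pair_frequencies(matched, "EmailAddress", "FirstName")
for (email, first_name), count in frequencies:
    print(email, first_name, count)
```

Pairs that occur more than once, such as the repeated LINDA and JOHN rows, may point to duplicate records, while a pair such as TEST@TEST.COM and TEST in the table suggests placeholder data worth checking.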