List Check
The List Check processor checks the data in an attribute against reference lists of valid and invalid values for the attribute.
The List Check allows case sensitive or insensitive matching, and can match the reference lists in one of a number of different ways:
-
Contains (the value must contain a matching list entry)
-
Whole value (the value must match the list exactly)
-
Starts with (the value must begin with a matching list entry)
-
Ends with (the value must end with a matching list entry)
-
Delimiter match (match the list after splitting the data using the specified delimiters)
Note:
The List Check processor does not support the use of External Reference Data for the valid or invalid values. Attempting to do so will result in an error message during processing.
The List Check is an important processor, used in auditing to find valid and invalid values in a data attribute. Use the results of the Frequency Profiler or Phrase Profiler to create lists of valid and invalid values, and use them in a List Check to audit the data against the lists on an ongoing basis.
List Check allows the use of up to two reference lists - a list of Valid values for the attribute, and a list of Invalid values.
You may choose only to use one of these two lists. For example, if you discover from profiling that there are many different valid values for an attribute, you may want only to check the attribute for invalid values, and consider the non-matching values as either Valid, or Unknown - for example simply to look for suspect words such as 'Test' in a Surname attribute.
If, however, the attribute has a small number of valid values, you may want simply to check the data against a list of valid values, and consider the non-matching values as Invalid, or Unknown - for example to check Title values against a small set of valid titles.
Finally, you can use both lists, and recognize both valid and invalid values, with values that do not match either list categorized as Unknown.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify one or more attributes of any type (string, date, number, string array, date array, number array) that you want to check based on lists of valid or invalid values (or both). |
Options |
Specify the following options for checking against valid values:
Specify the following options for checking against invalid values:
Specify the following match options:
|
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
None. |
Flags |
The following flags are output:
|
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Valid records |
The records that were categorized as Valid by the List Check. |
Unknown records |
The records that were categorized as Unknown by the List Check. |
Invalid records |
The records that were categorized as Invalid by the List Check. |
You can drill down the results to see the records themselves.
Output Filters
The following output filters are available from a List Check:
-
Valid records
-
Unknown records
-
Invalid records
Example
In this example, the List Check is used to check the values in a Business attribute using lists of valid and invalid Business values collated from Frequency Profiling.
Note that values that do not match either list are categorized as Unknowns.
A summary view:
Valid Records | Unknown Records | Invalid Records |
---|---|---|
1665 |
332 |
4 |
A drill down on unknown records:
Value | Count | % |
---|---|---|
[Null] |
331 |
16.5 |
Fields |
1 |
<0.1 |
A drill down on invalid records:
Value | Count | % |
---|---|---|
Test |
3 |
0.1 |
Test Ltd |
1 |
<0.1 |