1.3.6.6 Frequency Profiler
The Frequency Profiler examines each attribute and returns the values contained in each attribute, organized by their frequency of occurrence.
The Frequency Profiler is a vital profiling tool used to discover the common and uncommon values in the data. Use the results of frequency profiling to build reference lists of valid and invalid values for each data attribute, for use in validation.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any attributes that you want to analyze for value frequency. |
Options | None. |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | None. |
Flags | None. |
The Frequency Profiler requires a batch of records to produce its statistics (for example, to tell how often values occur in each attribute analyzed). It must therefore run to completion before its results are available, and is not suitable for a process that requires a real-time response.
When executed against a batch of transactions from a real-time data source, the Frequency Profiler finishes processing when the commit point (transaction or time limit) configured on the Read Processor is reached.
The following table describes the statistics for each attribute the Frequency Profiler analyzes. Note that each attribute is shown in a separate tab in the Results Browser.
Statistic | Description |
---|---|
Value | The value found. |
Count | The number of times the value occurs in the attribute. |
% | The percentage of records analyzed with the value in the attribute. |
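The three statistics can be illustrated with a minimal Python sketch. This is not the product's implementation; the function name `frequency_profile`, the sample data, and the rendering of missing values as `[Null]` are assumptions made for illustration only.

```python
from collections import Counter


def frequency_profile(values):
    """Return (value, count, percent) rows, most frequent value first.

    Hypothetical helper: missing values (None) are grouped under the
    label "[Null]" so that they are counted like any other value.
    """
    counts = Counter("[Null]" if v is None else v for v in values)
    total = sum(counts.values())
    rows = []
    for value, count in counts.most_common():
        # Percentage of all records analyzed that hold this value.
        percent = 100.0 * count / total if total else 0.0
        rows.append((value, count, percent))
    return rows


# Made-up sample input, one entry per record.
titles = ["Mr", "Ms", "Mr", None, "Mrs", "Mr", "Ms", "Dr"]
for value, count, percent in frequency_profile(titles):
    print(f"{value:<8}{count:>4}{percent:>7.1f}")
```

Because the counts depend on every record in the input, the sketch can only report results after it has read the whole batch, which mirrors the run-to-completion behavior described earlier.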
Example
In this example, the Frequency Profiler is run on the Title attribute in a table of Customer records. The following summary view is displayed:
Value | Count | % |
---|---|---|
Mr | 816 | 40.8 |
Ms | 468 | 23.4 |
Mrs | 309 | 15.4 |
Miss | 251 | 12.5 |
[Null] | 139 | 6.9 |
Dr | 15 | 0.7 |
Prof. | 1 | <0.1 |
Col. | 1 | <0.1 |
Rev | 1 | <0.1 |
Sorting the view by the Count column lets you quickly see the most common and least common values for each attribute analyzed, which helps you construct Reference Data lists of valid and invalid values.
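As a rough sketch of that last step, the counts from the example can be sorted and split into candidate lists. The helper `split_by_count` and the threshold of 10 are hypothetical choices for illustration, not part of the product; rare values are only candidates for review, since an uncommon value such as Prof. may still be valid.

```python
# Counts copied from the example summary view above.
title_counts = {
    "Mr": 816, "Ms": 468, "Mrs": 309, "Miss": 251,
    "[Null]": 139, "Dr": 15, "Prof.": 1, "Col.": 1, "Rev": 1,
}


def split_by_count(counts, min_count):
    """Split values into (common, rare) lists, each ordered by count descending.

    Hypothetical helper: min_count is an arbitrary cut-off chosen by the user.
    """
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    common = [value for value, count in ordered if count >= min_count]
    rare = [value for value, count in ordered if count < min_count]
    return common, rare


common, rare = split_by_count(title_counts, min_count=10)
print("Candidate valid values:", common)
print("Values to review:", rare)
```

Note that `[Null]` lands in the common list here purely because of its count; whether a missing title is acceptable is a separate decision from how often it occurs.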