1.3.6.1 Character Profiler
Use the Character Profiler to discover all the distinct characters that exist in a number of text attributes, and how often they occur.
The Character Profiler is particularly useful to find unexpected characters in text attributes that may need to be checked for on an ongoing basis (using Invalid Character Check), removed (using Denoise) or replaced (using Character Replace). Normalizing character discrepancies is also useful before Parsing.
The results are created so that they can easily be added to Reference Data for any of the above purposes.
Also, where a source of data contains records from a number of different countries, the Character Profiler can help to understand the ranges of characters in the data.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any String attributes that you want to search for character instances. |
Options | None. |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | None. |
Flags | None. |
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Character | The character found in the data. |
Decimal | The decimal Unicode character reference. Note that a hash character is used to prefix the character references, so that the references can be used directly in Reference Data. |
Hex | The hexadecimal Unicode character reference. Note that the references are prefixed with #0x (for example, #0xF1). |
Total | The total number of occurrences of the character across the selected input attributes. |
Record Count | The number of records containing the character in any of the selected input attributes. |
[Attribute name] Total | The number of occurrences of the character in the attribute. |
[Attribute name] Record Count | The number of records containing the character in the attribute. |
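The statistics above can be approximated outside the product with a short script. The following is a minimal sketch in Python, not the profiler's own implementation: the function name `profile_characters`, the record-as-dictionary layout, and the attribute names are assumptions made for illustration only.

```python
from collections import Counter


def profile_characters(records, attributes):
    """Count characters across the chosen string attributes.

    records: list of dicts mapping attribute name -> string (hypothetical layout).
    Returns a dict keyed by character, holding the statistics described above.
    """
    total = Counter()          # occurrences across all selected attributes
    record_count = Counter()   # records containing the character in any attribute
    attr_total = {a: Counter() for a in attributes}
    attr_records = {a: Counter() for a in attributes}

    for record in records:
        seen_in_record = set()
        for attr in attributes:
            value = record.get(attr) or ""      # treat missing or null values as empty
            counts = Counter(value)
            total.update(counts)
            attr_total[attr].update(counts)
            attr_records[attr].update(counts.keys())  # one per record per attribute
            seen_in_record.update(counts.keys())
        record_count.update(seen_in_record)           # one per record overall

    return {
        ch: {
            "Decimal": f"#{ord(ch)}",        # hash-prefixed decimal reference
            "Hex": f"#0x{ord(ch):X}",        # #0x-prefixed hexadecimal reference
            "Total": total[ch],
            "Record Count": record_count[ch],
            **{f"{a} Total": attr_total[a][ch] for a in attributes},
            **{f"{a} Record Count": attr_records[a][ch] for a in attributes},
        }
        for ch in total
    }
```

Calling `profile_characters(records, ["name", "city"])` on a list of dictionaries would return one entry per distinct character, with per-attribute columns named after the attributes passed in.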
Example
In this example, the Character Profiler is used to find unusual characters in some multi-language data from a Unicode database. The user sorts the results by the Total column (ascending) to look at the low-frequency characters first.
Table 1-120 Character Profiler
Character | Decimal | Hex | Total (asc) |
---|---|---|---|
ñ | #241 | #0xF1 | 1 |
ò | #242 | #0xF2 | 1 |
ó | #243 | #0xF3 | 1 |
ô | #244 | #0xF4 | 1 |
õ | #245 | #0xF5 | 1 |
ö | #246 | #0xF6 | 1 |
ø | #248 | #0xF8 | 1 |
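The ascending sort used in this example can be imitated in a few lines of Python. This is only an illustrative sketch: the helper name `rarest_characters`, its `max_total` parameter, and the sample values in the usage note are invented, and the tie-break by code point is an assumption rather than documented profiler behavior.

```python
from collections import Counter


def rarest_characters(values, max_total=1):
    """Return (character, decimal, hex, total) rows for low-frequency characters.

    Rows are sorted by total ascending, then by code point (an assumed tie-break).
    """
    totals = Counter("".join(values))
    rows = [
        (ch, f"#{ord(ch)}", f"#0x{ord(ch):X}", count)
        for ch, count in totals.items()
        if count <= max_total
    ]
    return sorted(rows, key=lambda row: (row[3], ord(row[0])))
```

For instance, `rarest_characters(["Muñoz", "Jönsson", "Jonsson", "Søren"])` would list every character that appears exactly once, which makes accented or otherwise unexpected characters easy to spot at the top of the results.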