1.3.6.3 Data Types Profiler
The Data Types Profiler analyzes the content of one or more attributes to assess whether the values conform to a consistent data type (that is, text, number or date).
Use the Data Types Profiler to understand the types of data found in each attribute in your data, to assess whether the type of data is consistent, and to find values where the data type may be incorrect, for example because data was entered in the wrong field, or under the wrong data type constraint.
The Data Types Profiler looks for three basic types of data:
- Dates, for any whole values that match a configurable list of date formats
- Numbers, for any wholly numeric values (such as 12, 56.2, -0.087)
- Text, for any other values, such as text strings, or a mixture of text and numerals.
Null values are counted separately from the above.
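The classification rules above can be sketched in Python. This is an illustrative approximation only, not the processor's implementation. The function names, the `DATE_FORMATS` list (a hypothetical stand-in for the date format Reference Data, written as Python `strptime` patterns rather than SimpleDateFormat patterns), and the treatment of empty strings as null are all assumptions.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical stand-in for the configurable list of recognized date formats.
# These are Python strptime patterns, not Java SimpleDateFormat patterns.
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d"]

# Wholly numeric values such as 12, 56.2, -0.087 (assumed pattern).
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def is_date(value: str) -> bool:
    """Return True if the whole value matches one of the listed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_number(value: str) -> bool:
    """Return True if the whole value is numeric."""
    return NUMBER_PATTERN.fullmatch(value) is not None


def classify(value: Optional[str]) -> str:
    """Classify a value as 'null', 'date', 'number' or 'text'."""
    if value is None or value == "":  # assumption: empty strings count as null
        return "null"
    if is_date(value):
        return "date"
    if is_number(value):
        return "number"
    return "text"  # anything else, including mixed text and numerals
```

For example, `classify("56.2")` yields `"number"`, `classify("15/01/2024")` yields `"date"`, and `classify("12 High Street")` yields `"text"`.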
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any attributes that you want to analyze for data type consistency. |
Options | Describes options you can specify. |
List of recognized date formats | Recognizes dates in a variety of different formats. Specified as Reference Data (Date Formatting Category). Default value is *Date Formats (see Note). |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | None. |
Flags | None. |
The Date Formats Reference Data used by the Data Type Check must conform to the standard Java 1.6.0 or later SimpleDateFormat API.
To understand how to add Reference Data entries for the correct recognition of dates, see the online Java documentation at http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html.
Note:
The valid date format yyyyMMdd, which is included in the date format reference data, is not recognized by this processor. This is because it contains no alpha characters or separators, and so cannot be distinguished from an eight-digit number.
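The ambiguity described in this note can be shown with a short Python sketch. It is illustrative only: the helper names are hypothetical, and the Python pattern `%Y%m%d` is used as a rough equivalent of the SimpleDateFormat pattern yyyyMMdd.

```python
import re
from datetime import datetime


def parses_as_compact_date(value: str) -> bool:
    """Return True if the value parses under %Y%m%d (roughly yyyyMMdd)."""
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False


def is_whole_number(value: str) -> bool:
    """Return True if the value consists only of digits."""
    return re.fullmatch(r"\d+", value) is not None


def is_ambiguous(value: str) -> bool:
    """A value is ambiguous when it is both a valid compact date and a number."""
    return parses_as_compact_date(value) and is_whole_number(value)
```

A value such as `20240115` satisfies both checks, so no rule based on the value alone can decide whether it is a date or an eight-digit number. A value with separators, such as `2024-01-15`, does not have this problem.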
Note:
The Data Types Profiler produces a percentage consistency statistic, which is calculated on the set of records input to the processor. In a real time monitoring process, this set is limited by the configurable commit point on the reader (defined as a number of transactions or as a time limit). If a process with a Data Types Profiler is executed as a real time response process, processing records one by one, each set contains a single record, so this consistency measure will always be 100%.
The following table describes the statistics produced by the profiler. In addition to the number of records analyzed, the following statistics are available in the Results Browser for each attribute:
Statistic | Description |
---|---|
Text | The number of values that were recognized as having a textual format. |
Date | The number of values that were recognized as having a date format. |
Number | The number of values that were recognized as having a number format. |
% Consistency | A calculation of the consistency of the data types in each attribute - that is, the percentage of values that were recognized as matching the most common data type. |
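The % Consistency statistic can be approximated with the following Python sketch. The function name and signature are hypothetical. The choice to divide by the total number of records, including nulls, is an inference from the figures in the example table in this section (for instance, 1862 text values out of 2001 records gives 93.1), not a documented formula.

```python
from typing import Dict


def consistency_percent(type_counts: Dict[str, int], total_records: int) -> float:
    """Percentage of records whose value matches the most common data type.

    type_counts maps a type name ('text', 'number', 'date') to its count.
    total_records includes null values (assumption inferred from the example).
    """
    if total_records == 0:
        return 0.0  # assumption: no records means nothing to measure
    most_common = max(type_counts.values(), default=0)
    return 100.0 * most_common / total_records


# Figures taken from the TITLE row of the example table: 1862 text, 139 null.
title = consistency_percent({"text": 1862, "number": 0, "date": 0}, 2001)
print(round(title, 1))
```

With a single record in the set, as in a real time response process, the most common type always accounts for the whole set, so the result is 100.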
Examples
In this example, the Data Types Profiler is run on all attributes in a table of Customer records:
Table 1-121 Data Types Profiler Example
Input Field | Total number | Text Format | Numeric Format | Date/time Format | Null values | Consistency % |
---|---|---|---|---|---|---|
CU_ACCOUNT | 2001 | 2000 | 0 | 0 | 1 | >99.9 |
TITLE | 2001 | 1862 | 0 | 0 | 139 | 93.1 |
NAME | 2001 | 2000 | 0 | 0 | 1 | >99.9 |
GENDER | 2001 | 1853 | 0 | 0 | 148 | 92.6 |
BUSINESS | 2001 | 1670 | 0 | 0 | 331 | 83.5 |
ADDRESS1 | 2001 | 1999 | 0 | 0 | 2 | >99.9 |
ADDRESS2 | 2001 | 1922 | 0 | 0 | 79 | 96.1 |
ADDRESS3 | 2001 | 1032 | 0 | 0 | 969 | 51.6 |
POSTCODE | 2001 | 1765 | 0 | 0 | 236 | 88.2 |
| | 2001 | 1936 | 0 | 0 | 65 | 96.8 |
ACC_MGR | 2001 | 1996 | 0 | 0 | 5 | 99.8 |
DT_PURCHASED | 2001 | 0 | 0 | 1998 | 3 | 99.9 |
DT_ACC_OPEN | 2001 | 0 | 0 | 1998 | 3 | 99.9 |