1.3.10.29 Pattern Transform
The Pattern Transform processor provides a simple and user-friendly alternative to the use of regular expressions for transforming the format of data in a string or string array attribute.
Pattern Transform uses a Reference Data set (Patterns Map) with two columns to reformat data matching a number of patterns (in the first 'lookup' column) to a new format or formats (in the second 'map' column).
Use Pattern Transform to standardize the format of data in attributes where data ought to, but does not, conform to a small number of standard formats, for example in Postcode, Account Number, or Product Serial Number attributes.
It is useful to use the Patterns Profiler (with the same Character Pattern Map, that is, the same way of generating character patterns) to find the invalid patterns that may exist in your data. These can then be added to the Patterns Map used in this processor, with each invalid pattern mapped to a valid pattern.
Note that Pattern Transform aims to provide as much flexibility as possible within the confines of using a simple map from one character pattern to another. There are some types of text transformation that will require the extra complexity of regular expressions (see Regex Replace).
The full logic of the processor is as follows:
Step | Action |
---|---|
1 |
Map all characters in the input value to pattern characters using the configured Character Pattern Map, to generate a pattern for the value (such as 'AB1243-ZX' -> 'aaNNNN-aa'). |
2 |
Match the generated pattern against the Lookup column of the Patterns Map.
|
3 |
Where a pattern character appears in the Lookup column but not in the Map column, the underlying character is stripped from the value (for example, using a map of NN-a to NNa, the value 12-A is transformed to 12A) Where a pattern character does not appear in the Lookup column of the map, but does appear in the Map column, it is added as a literal character in the output value (for example, using a map of NNaa to NN-aa, the value 12AB is transformed to 12-AB) Where a pattern character appears in both the Lookup and Map columns of the Patterns Map:
|
Note:
All characters mapping to the same pattern character must always appear in the same order in the final pattern (for example, it is possible to transform AB123CD to 123ABCD or 1ABC23, but NOT to transform AB123CD to BA123CD or AB213CD, using the default Character Pattern Map).
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify a one or more String or String Array attributes from which you wish to replace values using a patterns map. Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output. |
Options |
Specify the following options:
|
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
The following data attributes are output:
|
Flags |
The following data attributes are output:
|
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Transformed |
The number of records that were transformed using the Patterns Map. |
Untransformed |
The number of records that were not transformed using the Patterns Map. |
Output Filters
The following output filters are available from the Pattern Transform processor:
-
Records with transformed values
-
Records with untransformed values
Example
In this example, the default Character Pattern Map is used to generate patterns, and the following Patterns Map is used to fix common format problems with UK Postcodes:
Lookup | Map |
---|---|
aaN-_Naa |
aaN Naa |
aaN._Naa |
aaN Naa |
aaNN._Naa |
aaNN Naa |
aaNNaa |
aaN Naa |
aaN__Naa |
aaN Naa |
This transforms values as illustrated below:
Postcode | Postcode.PatternTransformed |
---|---|
OL6 9HX |
OL6 9HX |
CW96HF |
CW9 6HF |
PR7 3RB |
PR7 3RB |
CH7 6DZ |
CH7 6DZ |
CH7 6BD |
CH7 6BD |
CH40BE |
CH4 0BE |
SK87NG |
SK8 7NG |