1.3.3.9 List of Matching Transformations
Transformations may be used within match processors, both when clustering values and when comparing values, to improve matching results by transforming the source values. This lets you apply transformations for matching purposes without configuring chains of transformation processors before matching.
Several transformations may be applied, in sequence, within each cluster configuration or comparison. The transformations must be compatible with the data type of the identifier, although a transformation can itself change the data type.
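As an illustration of applying transformations in order, the following is a minimal Python sketch, not EDQ's implementation. The function names (`lower_case`, `normalize_whitespace`, `first_n_characters`, `make_array_from_string`, `apply_chain`) are hypothetical stand-ins for the corresponding transformations, and the delimiter handling is an assumption.

```python
import re


def lower_case(value):
    # Stand-in for the Lower Case transformation.
    return value.lower()


def normalize_whitespace(value):
    # Collapse every run of whitespace characters to a single space.
    return re.sub(r"\s+", " ", value)


def first_n_characters(n):
    # Returns a transformation that keeps only the first n characters.
    def transform(value):
        return value[:n]
    return transform


def make_array_from_string(value):
    # Split on commas and whitespace; this changes the type from String to array.
    return [part for part in re.split(r"[,\s]+", value) if part]


def apply_chain(value, chain):
    # Apply each transformation in order, feeding each output to the next step.
    for transform in chain:
        value = transform(value)
    return value


# A String-only chain: each step receives and returns a String.
cluster_chain = [lower_case, normalize_whitespace, first_n_characters(4)]
key = apply_chain("SIMPSON   David", cluster_chain)  # "simp"

# A chain whose last step changes the data type to an array.
parts = apply_chain("Adams,  John", [normalize_whitespace, make_array_from_string])
# ["Adams", "John"]
```

In this sketch, any step placed after `make_array_from_string` would have to accept an array, which mirrors the compatibility requirement described above.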
The following matching transformations are provided as part of EDQ. They are similar to the main transformation processors, but are designed for quick use when clustering or comparing values in a match processor.
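To make the two uses concrete, here is a small Python sketch of a transformation applied when clustering and when comparing. It is an illustration under assumptions, not how a match processor is implemented: `cluster`, `exact_match`, and `first_four` are hypothetical helpers, and the sample values are taken from the table's First N Characters and Lower Case examples.

```python
from collections import defaultdict


def first_four(value):
    # Stand-in for First N Characters with Number of characters = 4.
    return value[:4]


def cluster(values, key_transform):
    # Group values that share the same transformed cluster key.
    clusters = defaultdict(list)
    for value in values:
        clusters[key_transform(value)].append(value)
    return dict(clusters)


def exact_match(left, right, transform):
    # Transform both sides before comparing them for equality.
    return transform(left) == transform(right)


clusters = cluster(["Simpson", "Simposn", "Robertson"], first_four)
# {"Simp": ["Simpson", "Simposn"], "Robe": ["Robertson"]}

same_company = exact_match("ORACLE", "Oracle", str.lower)  # True
```

The misspelled "Simposn" lands in the same cluster as "Simpson" because only the transformed key is used for grouping; the source values themselves are left unchanged.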
Matching Transformations
Transformation | Compatible Identifier Type | Description | Example Transformations |
---|---|---|---|
Absolute Value | Number, Number Array | Converts number values into absolute values; that is, converts negative values to positive values and removes unnecessary digits such as leading zeros. | "-1.5" -> "1.5"; "1.5" -> "1.5"; "0001908" -> "1908" |
Character Replace | String, String Array | Replaces individual characters in a string attribute. | "é" -> "e" |
Convert Date to String | Date | Converts date values to Strings, using a date format. | Using the format dd/MM/yyyy: "23-Mar-2001 00:00:00" (date) -> "23/03/2001" (String) |
Convert Number to String | Number | Converts number values to Strings, using a number format. | Using the format 0.0: "175.66" (number) -> "175.6" (String); "175.00" (number) -> "175.0" (String) |
Convert String to Date | String | Converts String values to dates, using a date format. | Using the format dd/MM/yyyy: "01/11/2001" (String) -> "01-Nov-2001 00:00:00" (date); "10/04/1975" (String) -> "10-Apr-1975 00:00:00" (date) |
Convert String to Number | String | Converts String values to numbers, using a number format. | Using the format 0.0: "28" (String) -> "28.0" (number); "68.22" (String) -> "68.2" (number) |
Denoise | String, String Array | Strips String values of 'noise' characters such as #'<>,/?*%+. | "Oracle (U.K.)" -> "Oracle UK"; "A+D Engineering" -> "AD Engineering"; "John#Davison" -> "JohnDavison"; "SIMPSON, David" -> "SIMPSON David" |
Deduplicate Date Array | Date Array | Deduplicates the dates within an array. | Input: {Jun 22 2015 10:14:22 AM}{Feb 17, 1986 12:00:00 AM}{Jun 22 2015 10:14:22 AM}; Output: {Jun 22 2015 10:14:22 AM}{Feb 17, 1986 12:00:00 AM} |
Deduplicate Number Array | Number Array | Deduplicates the numbers within an array. | Input: {32}{14}{2}{32}; Output: {32}{14}{2} |
Deduplicate String Array | String Array | Deduplicates the string elements within an array. | Input: {A}{B}{A}; Output: {A}{B} |
First N Characters | String, String Array | Strips String values down to the first n characters in the value. | Where Number of characters = 4: "Simpson" -> "Simp"; "Simposn" -> "Simp"; "Robertson" -> "Robe" |
First N Words | String, String Array | Strips String values down to the first n words in the value. | Where Number of words = 2: "Barclays Bank (Sheffield)" -> "Barclays Bank"; "Balfour Beatty Construction" -> "Balfour Beatty" |
Generate Initials | String, String Array | Generates initials from String values. | Where Ignore words of less than = 4: "IBM" -> "IBM"; "International Business Machines" -> "IBM"; "Price Waterhouse Coopers" -> "PWC"; "PWC" -> "PWC"; "Aj Smith" -> "AS"; "A j Smith" -> "AJS" |
Last N Words | String, String Array | Strips String values down to the last n words in the value. | Where Number of words = 2: "(Sheffield) Barclays Bank" -> "Barclays Bank"; "Balfour Beatty Construction" -> "Beatty Construction" |
Last N Characters | String, String Array | Strips String values down to the last n characters in the value. | Where Number of characters = 5: "01223 421630" -> "21630"; "07771 821630" -> "21630"; "01223 322766" -> "22766" |
Lower Case | String, String Array | Converts String values into lower case. | "ORACLE" -> "oracle"; "Oracle" -> "oracle"; "OraCle" -> "oracle" |
Make Array from String | String | Converts a String value into an array of values, where each value in the array forms a separate index key. | Using comma and space delimiters: "John Simpson" -> "John", "Simpson"; "John R Adams" -> "John", "R", "Adams"; "Adams, John" -> "Adams", "John" |
Metaphone | String, String Array | Generates a metaphone value from a String. | "John Murray" -> "JNMR"; "John Moore" -> "JNMR"; "Joan Muir" -> "JNMR" |
Normalize Whitespace | String, String Array | Converts all sequences of whitespace characters to a single space. | "10  Harwood  Road" -> "10 Harwood Road"; "3  Perse  Row" -> "3 Perse Row" |
Replace | String, String Array | Standardizes values using a reference data map, for example to standardize common synonyms. | Where the reference data map contains the appropriate replacements: "Bill" -> "William"; "Billy" -> "William"; "William" -> "William" |
Round | Number, Number Array | Rounds number values to a given number of decimal places. | Rounding to two decimal places: "175.853" -> "175.85"; "180.658" -> "180.66" |
Round Extra | Number | Rounds numbers and outputs multiple rounded values. | Rounding to the nearest 10, outputting 3 numbers: "45" -> "50", "40", "60"; "23" -> "20", "10", "30" |
Script | Any | Allows the use of a custom scripted match transformation. | Transformation determined by the custom script. |
Select Array Element | Any | Allows you to select an individual array element from any position in an array, to use when clustering or comparing values. | "11 Grange Road, Cambridge" -> "Cambridge" |
Soundex | String, String Array | Generates a soundex value from a String. | "Smith" -> "S530"; "Snaith" -> "S530"; "Clark" -> "C462"; "Clarke" -> "C462"; "Clarke-Jones" -> "C462" |
Strip Numbers | String, String Array | Strips all numbers from a String. | "CB37XL" -> "CBXL"; "7 Harwood Drive" -> " Harwood Drive"; "Lemonade 300ML" -> "Lemonade ML" |
Strip Words | String, String Array | Strips words from String values, using a reference data list of words. | Where the reference data list contains company suffixes: "ORACLE CORP" -> "ORACLE"; "VODAFONE GROUP PLC" -> "VODAFONE GROUP"; "ORACLE CORPORATION" -> "ORACLE" |
Trim Whitespace | String, String Array | Strips whitespace (spaces and non-printing characters) from a String. | "Nigel Lewis" -> "NigelLewis"; "Nigel  Lewis" -> "NigelLewis"; " Nigel Lewis " -> "NigelLewis" |
Upper Case | String, String Array | Converts String values into upper case. | "Oracle" -> "ORACLE"; "OraCle" -> "ORACLE"; "oracle" -> "ORACLE" |
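To show why phonetic transformations such as Soundex make useful cluster keys, the sketch below implements the standard American Soundex rules in Python. It is an assumption that EDQ's Soundex transformation follows these same rules; the `soundex` function and `CODES` table are illustrative names, and the handling of non-letter characters (dropping them) is also an assumption. The sketch does reproduce the Soundex examples in the table.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(value):
    # Keep only the letters A-Z; spaces, hyphens and digits are dropped.
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate two letters that share a code.
            continue
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        # A vowel resets 'previous', so a repeated code after it is kept.
        previous = code
        if len(result) == 4:
            break
    # Pad short codes with zeros to the fixed length of four.
    return result.ljust(4, "0")


for name in ("Smith", "Snaith", "Clark", "Clarke", "Clarke-Jones"):
    print(name, "->", soundex(name))
```

Because "Smith" and "Snaith" produce the same code, as do "Clark", "Clarke", and "Clarke-Jones", records with these values would share a cluster key even though the source strings differ.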