1.3.3.9.7 Match Transformation: Denoise
The Denoise transformation strips 'noise' characters from values when clustering or comparing them, in the same way as the main Denoise processor. This improves matching accuracy, because noise characters can prevent otherwise matching records from being found. For example, the values "Castle (Investments) Ltd" and "Castle Investments Ltd" are a strong match, but unless the parentheses are removed from the former value, the two values have a character edit distance of 2.
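
The edit distance quoted above can be checked with a short sketch. It assumes that "character edit distance" means the standard Levenshtein distance (single-character insertions, deletions, and substitutions). The function names `edit_distance` and `strip_chars` are illustrative only and are not part of the product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed meaning of 'edit distance')."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def strip_chars(value: str, noise: str) -> str:
    """Remove every character of `noise` from `value` (hypothetical helper)."""
    return "".join(ch for ch in value if ch not in noise)


raw_a = "Castle (Investments) Ltd"
raw_b = "Castle Investments Ltd"

distance_before = edit_distance(raw_a, raw_b)
distance_after = edit_distance(strip_chars(raw_a, "()"), strip_chars(raw_b, "()"))

print(distance_before)  # 2: the two parentheses
print(distance_after)   # 0: identical once the parentheses are stripped
```
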
Use the Denoise transformation when matching records on an identifier whose values were entered in a free text field. Free text fields allow the same data to be entered in many formats, and invite typographical errors, including the insertion of 'noise' characters such as ( and ). The Denoise transformation allows such errors to be overcome when matching.
The following table describes the configuration options:
Configuration | Description |
---|---|
Options | Specify the following options: |
Example
In this example, data has been imported from a text file, so all attributes have String types. In Data Type Profiling (see Data Types Profiler), one of the attributes was found to contain number values corresponding to phone number area codes. The data is converted to a Number format when clustering.
Example configuration
In this example, the Denoise transformation is used to strip noise characters from company names when matching. The following noise characters are used: & + ( ) - *
Example transformations
The following table shows examples of transformations using the above configuration:
Table 1-79 Example Transformations for Denoise
Value | Transformed Value |
---|---|
Castle (Investments) Ltd | Castle Investments Ltd |
Castle Investments Ltd | Castle Investments Ltd |
Ipswich & Norwich Co-op | Ipswich Norwich Coop |
Ipswich + Norwich Co-operative | Ipswich Norwich Cooperative |
Barclays Bank - Cambridge | Barclays Bank Cambridge |
Barclays Bank (Cambridge) | Barclays Bank Cambridge |
George & Sons ***in administration*** | George Sons in administration |
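
The example transformations can be reproduced with a minimal sketch. It is not the product's implementation: the `denoise` function and the `NOISE_CHARACTERS` constant are hypothetical names. The sketch also collapses runs of whitespace left behind by removed characters. That step is an assumption made so that the output matches the table as printed, since the table shows single spaces and this section does not say how whitespace is handled.

```python
import re

# Noise characters taken from the example configuration: & + ( ) - *
NOISE_CHARACTERS = "&+()-*"


def denoise(value: str, noise: str = NOISE_CHARACTERS) -> str:
    """Strip noise characters from a value (illustrative sketch only)."""
    stripped = "".join(ch for ch in value if ch not in noise)
    # Assumption: collapse whitespace runs and trim the ends, so that
    # "Ipswich & Norwich" becomes "Ipswich Norwich" with a single space.
    return re.sub(r"\s+", " ", stripped).strip()


examples = [
    "Castle (Investments) Ltd",
    "Castle Investments Ltd",
    "Ipswich & Norwich Co-op",
    "Ipswich + Norwich Co-operative",
    "Barclays Bank - Cambridge",
    "Barclays Bank (Cambridge)",
    "George & Sons ***in administration***",
]

results = {value: denoise(value) for value in examples}

for value, transformed in results.items():
    print(f"{value} | {transformed}")
```

With this configuration, the first two company names and the two Barclays Bank values each reduce to the same string, so they would compare as exact matches after the transformation.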