Match Transformation: First N Words
The First N Words transformation allows matching to use only the first few (N) words either when clustering or performing comparisons.
Use the First N Words transformation when matching on an identifier that contains many words, where the words towards the end of the value are less useful for matching than the words at the beginning. It is often used when matching company names, so that branch names or other subsidiary words appended to a company name are ignored when matching, even though in other cases the same words may be useful for company identification (and therefore should not be stripped from the value using a Strip Words transformation). For example, keeping only the first two words allows "Barclays Bank Coventry" to match "Barclays Bank Leicester Branch".
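As a rough illustration of the clustering use, the sketch below groups company names by their first two words. It is not the product's implementation: the helper name `cluster_by_first_words` is hypothetical, and it assumes that words are separated by whitespace only.

```python
from collections import defaultdict


def cluster_by_first_words(names, n=2):
    """Group names by a key made of their first n whitespace-separated words.

    Hypothetical helper for illustration; whitespace is the assumed delimiter.
    """
    clusters = defaultdict(list)
    for name in names:
        # The cluster key is the value reduced to its first n words.
        key = " ".join(name.split()[:n])
        clusters[key].append(name)
    return dict(clusters)


names = [
    "Barclays Bank Coventry",
    "Barclays Bank Leicester Branch",
    "Henkel Loctite Adhesives Limited",
]
clusters = cluster_by_first_words(names)
for key, members in clusters.items():
    print(key, "<-", members)
```

Both Barclays values share the key "Barclays Bank" and so fall into the same cluster, where they can then be compared in detail.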
The following table describes the configuration options:
Configuration | Description |
---|---|
Options | Specify the following options: Delimiters Reference Data, Delimiter characters, and Number of words. |
Example configuration
In this example, the First N Words transformation is used within a Character Edit Distance comparison (see Comparison: Character Edit Distance) to match company names whose values frequently contain extra words that are not required for matching.
Delimiters Reference Data: *Delimiters
Delimiter characters: None
Number of words: 2
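To show why this configuration helps an edit distance comparison, the following sketch applies a two-word truncation before computing a standard Levenshtein distance. Both function names are hypothetical, whitespace is assumed to be the only delimiter, and the product's Character Edit Distance comparison may differ in detail from this textbook algorithm.

```python
def first_n_words(value, n=2):
    # Assumption: words are separated by whitespace only.
    return " ".join(value.split()[:n])


def edit_distance(a, b):
    """Textbook Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


a = "Barclays Bank Plymouth Branch"
b = "Barclays Bank Coventry"

raw = edit_distance(a, b)
transformed = edit_distance(first_n_words(a), first_n_words(b))
print("distance without transformation:", raw)
print("distance with First N Words (N=2):", transformed)
```

With the transformation applied, both values reduce to the same two words, so the distance between them is zero, whereas the untransformed values differ by the trailing branch words.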
Example transformations
The following table shows examples of transformations using the above configuration:
Table 1-81 Example Transformations for First N Words
Value | Transformed Value |
---|---|
Barclays Bank Plymouth Branch | Barclays Bank |
Barclays Bank Coventry | Barclays Bank |
Henkel Loctite | Henkel Loctite |
Henkel Loctite Adhesives Limited | Henkel Loctite |
Wingford Confectioners | Wingford Confectioners |
Wingford Confectioners (in administration) - contact Mr J Alexander | Wingford Confectioners |
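The transformations in the table can be approximated with a short sketch. The function name `first_n_words` and its parameters are hypothetical. The contents of the *Delimiters Reference Data are not listed here, so the sketch assumes a single space as the only delimiter and rejoins the kept words with a single space.

```python
import re


def first_n_words(value, n=2, delimiters=" "):
    """Return the first n words of value, splitting on any delimiter character.

    Illustrative only: the default delimiter set is an assumption.
    """
    # Treat a run of consecutive delimiter characters as one word break.
    pattern = "[" + re.escape(delimiters) + "]+"
    words = [w for w in re.split(pattern, value) if w]
    return " ".join(words[:n])


values = [
    "Barclays Bank Plymouth Branch",
    "Barclays Bank Coventry",
    "Henkel Loctite",
    "Henkel Loctite Adhesives Limited",
    "Wingford Confectioners",
    "Wingford Confectioners (in administration) - contact Mr J Alexander",
]
results = {v: first_n_words(v, n=2) for v in values}
for original, transformed in results.items():
    print(f"{original!r} -> {transformed!r}")
```

Values that already contain two words or fewer pass through unchanged, which matches the "Henkel Loctite" and "Wingford Confectioners" rows.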