1.3.3.8.4 Comparison: Character Match Percentage
The Character Match Percentage comparison determines how closely two values (String or String Array) match each other. It calculates the Character Edit Distance between the two String values and relates that distance to the length, by character count, of either the longer or the shorter of the two values.
Use the Character Match Percentage comparison to find matches where values are of varying lengths (such as names), and there might be spelling mistakes in the original values. For example, when matching company names, the values "ABC" and "BBC" have a Character Edit Distance of 1, and might be deemed a close match by other comparisons. However, their Character Match Percentage is only 66% (two of three characters match). By contrast, "Oracle" and "Oracles" also have a Character Edit Distance of 1, but their Character Match Percentage is 85% (six of seven characters match), indicating a stronger match.
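The calculation described above can be sketched in Python. This is a minimal illustration, not the product's implementation. It assumes that the Character Edit Distance is the standard Levenshtein distance, that the result relates to the longer input, and that percentages are truncated to whole numbers (which is consistent with the 66% figure for "ABC" and "BBC"). The function names `edit_distance` and `match_percentage` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def match_percentage(a: str, b: str) -> int:
    """Percentage of the longer input left unchanged by the edits.
    Assumption: the result is truncated to a whole number."""
    longer = max(len(a), len(b))
    if longer == 0:
        raise ValueError("both inputs are empty (No Data is not handled here)")
    return 100 * (longer - edit_distance(a, b)) // longer


print(match_percentage("ABC", "BBC"))         # 66
print(match_percentage("Oracle", "Oracles"))  # 85
```

Both pairs have an edit distance of 1, but the single edit weighs more heavily against a three-character value than against a seven-character one.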
This comparison supports the use of result bands.
The following table describes the configuration options:
Option | Type | Description | Default Value |
---|---|---|---|
Match No Data pairs? | Yes/No | This option determines the result of a comparison when it compares two No Data (Null, or containing only whitespace characters) values for an identifier. If set to No, the comparison will give a 'no data' result when comparing a No Data value against another No Data value. If set to Yes, the comparison will give a full match (a Character Match Percentage of 100%) when comparing a No Data value against another No Data value. A 'no data' result will only be returned if a No Data value is compared against a populated value. | No |
Ignore case? | Yes/No | Sets whether or not to ignore case when comparing values. For example, if case is ignored, "Oracle Corporation" will match "ORACLE CORPORATION" with a Character Match Percentage of 100%. | Yes |
Relate to shorter input? | Yes/No | This option drives the calculation made by the Character Match Percentage comparison. If set to Yes, the result is calculated as the percentage of characters from the shorter of the two inputs (by character count) that match the longer input. If set to No, the result is calculated as the percentage of characters from the longer of the two inputs (by character count) that match the shorter input. | No |
Example
In this example, the Character Match Percentage comparison is used to match company names. The following options are specified:
- Match No Data pairs? = No
- Ignore case? = Yes
- Relate to shorter input? = No
The following transformations are added:
- Trim Whitespace, to remove all whitespace from values before comparing them
- Strip Words, using *Business Suffix Map (which includes the words 'Ltd' and 'Limited')
The following table illustrates some example comparison results using the above configuration:
Table 1-37 Example Results: Character Match Percentage
Value A | Value B | Comparison Result |
---|---|---|
ABC ltd | ABC limited | 100% |
ABC ltd | BBC | 66% |
Fast track systems | Fastrack systems | 93% |
BT | BTAT | 50% |
Gemini Partners | Gemmini Partners | 93% |
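The results in the table can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the product's implementation: the Character Edit Distance is taken to be the Levenshtein distance, percentages are truncated to whole numbers, `SUFFIXES` is a hypothetical two-word stand-in for *Business Suffix Map, and words are stripped before whitespace is removed (otherwise 'ltd' could no longer be recognized as a separate word). All names in the code are hypothetical.

```python
SUFFIXES = {"ltd", "limited"}  # hypothetical stand-in for *Business Suffix Map


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(value: str) -> str:
    """Ignore case, strip suffix words, then remove all whitespace."""
    words = [w for w in value.lower().split() if w not in SUFFIXES]
    return "".join(words)


def compare(a: str, b: str):
    """Character Match Percentage relative to the longer input
    (Relate to shorter input? = No). Returns None to stand for a
    'no data' result; treating an empty value after transformation
    as No Data is an assumption."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return None
    longer = max(len(a), len(b))
    return 100 * (longer - edit_distance(a, b)) // longer


PAIRS = [
    ("ABC ltd", "ABC limited"),
    ("ABC ltd", "BBC"),
    ("Fast track systems", "Fastrack systems"),
    ("BT", "BTAT"),
    ("Gemini Partners", "Gemmini Partners"),
]

for value_a, value_b in PAIRS:
    print(f"{value_a} | {value_b} | {compare(value_a, value_b)}%")
```

With these assumptions the script prints 100%, 66%, 93%, 50%, and 93% for the five pairs, in the same order as the table rows.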