3 Using Matching
EDQ-CDS has been designed to match customer data that exhibits real-world variability. All relevant matches in the data set are presented back and appropriately scored according to the likelihood of a match between records. To do this, it uses a variety of different mechanisms, including the application of a wide range of matching algorithms on the data as it is presented, as well as matching techniques on derived forms of the data.
For example, names presented in one writing system are matched both using this writing system and also using a transformed version of the name, providing effective cross-script matching. Similarly, addresses are matched in near raw form (after standardization of international address words and phrases, and after removal of filler words), but also by extracting and matching key information from the address, such as the likely building number, sub-building number, and postal code.
Objectives of Matching
In general, the matching services provided by EDQ-CDS are designed for duplicate prevention, rather than searching. This means that the intention of the out-of-the-box services is to intervene when a record is added to a system if it appears that it may already exist. The implication of this is that the matching services are focused on much more than a single attribute (such as Name) and deliberately do not cast as wide a net as a typical search operation. There may be other records in the system that are not matched but which have similar details, perhaps even exactly the same name, but where the secondary identification information indicates that a match is unlikely. In these cases, EDQ-CDS aims to minimize the additional work for users or data stewards whose role it is to resolve possible matches. This makes the product ideally suited to operate as the data quality protection component of a Master Data Management system, such as Oracle Customer Hub, where the purpose of the services is to link as many records as possible together automatically with as little noise as possible. The same is true for a Customer Relationship Management system, such as Siebel.
Note:
It is possible to change the configuration of EDQ-CDS in order to perform more exhaustive matching. This is mainly designed for use with low volume, high value data sets that do not necessarily offer sufficient secondary information (beyond name fields).
Multiple Locales and Languages
EDQ-CDS has been designed as a multi-locale system, and uses international and culture-sensitive name transcription, transliteration and variant recognition techniques, as well as using international dictionaries when standardizing and matching addresses.
The system is designed to work with international data, and provides international dictionaries of name and address standardizations for this purpose. The international 'Latin script' dictionaries provide coverage of the following 'base' locales, amongst others:
- United States and Canada
- United Kingdom
- France
- Germany
- Italy
- Spain
- Portugal
- Brazil
- Greece
- Ireland
- Austria
- Turkey
- South Africa
- Australia and New Zealand
- Scandinavia
- Argentina
- Mexico
In addition to these base locales, EDQ-CDS provides specific optional capabilities for advanced handling of data from the following locales:
- Arab World (Arabic and Mixed Arabic/Latin)
- Japan (Kanji, Katakana and Hiragana)
- China (Simplified and Traditional Chinese)
- Russia
- Korea (Hangul)
The set of enabled languages is determined by the configuration of the EDQ-CDS - Initialize Reference Data project, so that the same reference data may be used by any number of EDQ-CDS matching servers. By default, reference data sets for the base locales are pre-initialized in the EDQ server landing area, but these can be easily overwritten either by unzipping cdslists-initialized-full.zip over these files (to provide coverage for all supported locales and languages) or by configuring and running the Initialization job.
Uses of Matching
The Matching processes included in EDQ-CDS are designed primarily for the following use cases:
- Duplicate Prevention - uses the Key Generation and Matching web services to prevent duplicate records being entered into applications.
- Regular Batch Matching for Duplicate Removal - uses the Batch Matching job, run on all, or a subset of, data in an application, and links records together for potential merge.
It is also possible to use the Batch Matching processes as a template for the deduplication of records before they are loaded into a system. This is likely to require additional configuration, and use of EDQ. In such circumstances the best practice is to understand the data before matching using data profiling and audit techniques, such as those available in the EDQ-CDS Data Quality Health Check. In most cases, the set of enabled match rules will need some tuning towards the specifics of the in-scope data in order to provide the optimum balance between performance and effectiveness. It may also be necessary to use EDQ's Match Review application to review possible matches, and construct rules for merging records together.
Note:
EDQ-CDS does not provide any out-of-the-box merging (or survivorship) configuration, because in the two main use cases, merging is performed by the calling application after matches have been identified.
Duplicate Prevention
EDQ-CDS uses stateless web services for duplicate prevention to avoid complex replication and synchronization of large volume customer data. This places the following requirements on the application integrating with EDQ:
- Storage of Cluster Key tables for each type of record (for example, Contacts or Accounts). These are normally thin tables with two columns - the Primary Key of the record and the Cluster Key. The table must allow for multiple key values per record.
- Functionality to select and construct candidate records to submit to the Matching service. This involves:
  - Querying the Cluster Key table for the relevant record, and finding all records that share a key value with the driving record.
  - Constructing the data that is required for matching for each of these records.
  - Submitting these Candidate records together with the driving record to the Matching service.
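The candidate selection described above can be sketched with an in-memory SQLite database. This is only an illustration: the table name `contact_cluster_keys`, its column names, and the sample rows are assumptions made for the example and are not defined by EDQ-CDS.

```python
import sqlite3

def create_cluster_key_table(conn):
    # Thin two-column table. The composite primary key allows
    # several key values to be stored for the same record.
    conn.execute(
        "CREATE TABLE contact_cluster_keys ("
        " record_id TEXT NOT NULL,"
        " cluster_key TEXT NOT NULL,"
        " PRIMARY KEY (record_id, cluster_key))"
    )

def find_candidate_ids(conn, driving_id):
    # Self-join: every other record that shares at least one
    # key value with the driving record.
    rows = conn.execute(
        "SELECT DISTINCT c.record_id"
        " FROM contact_cluster_keys d"
        " JOIN contact_cluster_keys c ON c.cluster_key = d.cluster_key"
        " WHERE d.record_id = ? AND c.record_id <> d.record_id"
        " ORDER BY c.record_id",
        (driving_id,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
create_cluster_key_table(conn)
conn.executemany(
    "INSERT INTO contact_cluster_keys VALUES (?, ?)",
    [
        ("C1", "PNR6^123456"),
        ("C1", "FNL3GNL1PCL5^SMI^J^CB12A"),
        ("C2", "PNR6^123456"),
        ("C3", "FNL3GNL1PCL5^SMI^J^CB12A"),
        ("C4", "PNR6^999999"),
    ],
)
print(find_candidate_ids(conn, "C1"))  # ['C2', 'C3']
```

The application would then load the matching attributes for the returned ids and submit them, together with the driving record, to the Matching service.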
Optimum Duplicate Prevention Process Flow
In order to access the full capabilities of EDQ-CDS for duplicate prevention, the integration should work as follows:
- To prepare the system for real-time duplicate prevention, key values are generated for each record in Batch using the Key Generation process. This can occur either when migrating the data into the application, or as a batch process to generate the key values into the application's Cluster Key tables.
- When a record is added or updated in the application, the Key Generation service is called in real-time, and returns a number of key values for the record.
- The application then selects candidate records (those records which share a common key with the driving record) using the existing stored keys and submits them along with the driving record to the Matching service.
- The Matching service decides which of the candidates are a likely match to the driving record and returns the ids of these records, and a score indicating the strength of match.
- The application then decides how to consume the matching results; for example, whether to 'auto-match' or present possible matches to the user so that a decision can be made whether or not to continue with inserting a record, or merge it with an existing record.
- If the record is merged with another record to create a changed master record, an additional call should be made to the Key Generation service in order to re-generate the correct key values before committing the record.
In this model, complex multi-locale EDQ techniques are used to generate the key values and ensure that the right balance between performance and matching effectiveness is maintained, while ensuring that the calling application retains control of data integrity and transactional commits.
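The flow above can be sketched from the calling application's side. The two service functions below are toy stand-ins written for this example (a single phone-based key and a case-insensitive name comparison); they do not reflect the real Key Generation or Matching service interfaces, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    records: dict = field(default_factory=dict)       # record id -> record data
    cluster_keys: dict = field(default_factory=dict)  # record id -> set of key values

    def candidate_ids(self, keys):
        # Records whose stored keys share at least one value with the new keys.
        return sorted(rid for rid, stored in self.cluster_keys.items() if stored & keys)

def key_generation_service(record):
    # Stand-in for the real-time Key Generation call (one toy key only).
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    return {"PNR6^" + digits[-6:]}

def matching_service(driving, candidates):
    # Stand-in for the Matching call: returns (candidate id, score) pairs.
    return [(rid, 100) for rid, rec in candidates.items()
            if rec["name"].lower() == driving["name"].lower()]

def add_record(app, record_id, record):
    keys = key_generation_service(record)
    candidates = {rid: app.records[rid] for rid in app.candidate_ids(keys)}
    matches = matching_service(record, candidates)
    if matches:
        # The application decides what to do next: auto-match or ask the user.
        return matches
    # No likely duplicate: commit the record together with its key values.
    app.records[record_id] = record
    app.cluster_keys[record_id] = keys
    return []

app = Application()
print(add_record(app, "C1", {"name": "Jim Smith", "phone": "077777 123456"}))     # []
print(add_record(app, "C2", {"name": "JIM SMITH", "phone": "+44 77777 123456"}))  # [('C1', 100)]
```

Because the application owns the commit, it can also re-run key generation for a merged master record before storing it.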
Batch Matching
When working with Siebel CRM, Siebel's Data Quality Manager is used to instigate batch jobs and a shared staging database is used to write records for matching and to consume match results. The EDQ-CDS batch matching processes automatically adjust to Siebel's 'Full Match' (match all records against each other) and 'Incremental Match' (match a subset of records against all of their selected candidates) modes.
Match Tuning
In EDQ-CDS matching, it is not necessary to be overly concerned with which identifiers will be populated in the data that is worked with. EDQ-CDS does not use an algorithm that will place unnecessary emphasis on unpopulated data, and so does not require adjustment for this.
Matching works by considering matches on related input attributes separately (such as those relating to name, address, email, etc) and attempting various ways on each of these to find a match. EDQ refers to the grouped matching rules on these logically related attributes as "compound comparisons". It then combines the matches on these compound comparisons to decide how well two records match as a whole. The matching design builds in the knowledge of how strong an identifier is likely to be based on real world principles. Match tuning is normally a matter of performing one of the following tasks:
- Adjusting the weighting of a compound comparison
- Enabling or disabling a compound comparison
- Adjusting the key generation configuration
- Enabling or disabling a provided rule
- Adjusting the score for a specific rule within a compound comparison
- Inserting a new rule into a compound comparison (perhaps a stronger or weaker version of an existing rule)
Note:
Even when inserting new rule configurations, it may well be possible to use existing comparisons and comparison results rather than adding new comparisons, though both are possible.
Output of Match Metadata
The output of match metadata provides granular details about why two records matched, providing information about which compound comparisons contributed to a match. The following EDQ match metadata is output for each compound comparison (for example, Name, Address, Email, Phone, etc.):
- [Compound Comparison] Result, for example, N040 Given name abbreviated
- Score (out of 100)
- Category (Exact, Fuzzy, No data or Conflict)
Using Key Generation
Key Generation is used to minimize the work that is performed during the final stage of matching. It works by splitting the records into tranches (clusters), based on similarities in significant data fields. Only subsets of the data which share similar characteristics (and will therefore be placed in the same cluster) will be compared on a record-by-record basis during matching.
If loose clusters are used, there will be a large number of records in each cluster. This means that there is a reduced risk that true matches will be missed, but also that a greater amount of processing will be required to compare all the key generated records. A tighter key generation strategy will result in smaller groups and hence a reduced processing time, but will increase the likelihood that some true matches will not be detected.
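The cost side of this trade-off can be made concrete by counting record-by-record comparisons, assuming every record in a cluster is compared with every other record in the same cluster. The cluster sizes below are invented for illustration.

```python
def pairwise_comparisons(cluster_sizes):
    # A cluster of n records needs n * (n - 1) / 2 pairwise comparisons.
    return sum(n * (n - 1) // 2 for n in cluster_sizes)

# The same 10,000 records keyed loosely (2 large clusters)
# and tightly (100 small clusters).
loose = [5000, 5000]
tight = [100] * 100

print(pairwise_comparisons(loose))  # 24995000
print(pairwise_comparisons(tight))  # 495000
```

The tighter strategy needs about fifty times fewer comparisons here, but any true match whose two records fall into different small clusters is never compared.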
EDQ-CDS is supplied with a number of different key method algorithms for individual, entity, and address data that use different combinations of key data fields in their construction. Each key method algorithm has been assigned a unique prefix code for easy identification, and to ensure keys from different key methods are not identical.
Legacy Clustering
Prior to release 12.2.1, key generation was referred to as clustering, and the functionality provided was a much more restricted version of current key generation, although the principles were the same. Only three methods of "clustering" were provided, with no easy scope for customization.
These "legacy" methods can still be used by setting the following in the run profile:
phase.*.process.*.uselegacykeygen = Y
and the levels set using
phase.Individual\ Keygen.process.*.clusterlevel = [1/2/3]
Structure of Key Methods
For each party type, key methods are grouped into key groups and key types.
For example, the individual 'Name Phone' key group contains all the key methods constructed using a combination of the name and phone attributes. Within this group, there are two key types:
FNMGNMPNR: key methods based on family name metaphone, given name metaphone and phone number rightmost characters
FNMPNL: key methods based on family name metaphone and phone number leftmost characters
Each key type then consists of one or more actual key methods, each one using varying lengths of the metaphone or leftmost/rightmost characters.
For example, the FNMPNL key type contains the following key methods:
FNM4PNL6: family name metaphone first 4, phone number first 6
FNM4PNL7: family name metaphone first 4, phone number first 7
FNM4PNL8: family name metaphone first 4, phone number first 8
These are categorized as 'loose', 'typical' and 'strict' respectively, as the length of the phone number substring used becomes larger and therefore provides a tighter key.
Key values generated using the last of these methods would take the form:
FNM4PNL8^MN^65065421
An automatic or 'encoded' key profile consists of a pipe-delimited list of key methods with their associated key priorities, for example:
AD112FNL5GNL5^10|GNW1FNL0^11|AD17AD25CTL10^12|FNM4PNL8^13|PNR6^14
Note that the key priorities are only relative within a specific profile; they have no intrinsic meaning.
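The two text formats shown above (a key value and an encoded key profile) are simple to take apart. The following is a minimal parsing sketch based only on the delimiters visible in the examples; the function names are hypothetical.

```python
def parse_key_value(key_value):
    # "FNM4PNL8^MN^65065421" -> ("FNM4PNL8", ["MN", "65065421"])
    method, *components = key_value.split("^")
    return method, components

def parse_key_profile(profile):
    # Pipe-delimited list of "<key method>^<priority>" entries.
    entries = []
    for item in profile.split("|"):
        method, priority = item.split("^")
        entries.append((method, int(priority)))
    return entries

print(parse_key_value("FNM4PNL8^MN^65065421"))
print(parse_key_profile("AD112FNL5GNL5^10|GNW1FNL0^11|AD17AD25CTL10^12|FNM4PNL8^13|PNR6^14"))
```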
Keys for Custom Attributes
Keys for custom attributes can optionally be created during Key Generation (by default keys are not generated for custom attributes).
This is specified in the run profile as follows:
phase.*.process.*.customstringNkey = Y
phase.*.process.*.customdateNkey = Y
and can be overridden in real-time on a per-message basis as follows:
<dn:request customstringNkey="Y" customdateNkey="Y">
The actual keying method used depends on the key profile specified:
- Strict profiles key custom strings on the full string, and custom dates on the full date
- Loose profiles key custom strings on the metaphone of the string, and custom dates on the year only
- Typical profiles key custom strings on the first 10 characters of the string, and custom dates on the year and month
Note that custom attributes are ignored if legacy cluster levels are used.
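As a rough illustration of how the three profile levels differ for custom attributes, the sketch below derives the portion of a value that each level would key on. The output formats (for example `YYYYMM` for year and month), the normalization step, and the function names are assumptions for this example; the loose string case needs a metaphone encoder, which is passed in by the caller rather than implemented here.

```python
from datetime import date

def custom_date_key_part(value, profile):
    # Strict: full date, typical: year and month, loose: year only.
    formats = {"strict": "%Y%m%d", "typical": "%Y%m", "loose": "%Y"}
    if profile not in formats:
        raise ValueError(f"unknown profile: {profile}")
    return value.strftime(formats[profile])

def custom_string_key_part(value, profile, metaphone=None):
    # Assumed normalization: upper case, letters and digits only.
    normalized = "".join(ch for ch in value.upper() if ch.isalnum())
    if profile == "strict":
        return normalized          # full string
    if profile == "typical":
        return normalized[:10]     # first 10 characters
    if profile == "loose":
        if metaphone is None:
            raise ValueError("loose string keys need a metaphone encoder")
        return metaphone(normalized)
    raise ValueError(f"unknown profile: {profile}")

print(custom_date_key_part(date(1980, 5, 17), "typical"))          # 198005
print(custom_string_key_part("Gold-Tier Customer 2024", "typical"))  # GOLDTIERCU
```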
Key Method Analysis
Key method analysis introduces the capability within CDS to automatically analyze the customer's data and determine the best key profile for that particular data set. Key analysis consists of these main steps:
- Generate key values for the data using all available key methods.
- Profile, score and rank those key values using various statistical mechanisms such as high frequency key values and distribution/diversity of key values.
- Construct and output a recommended key profile by selecting the best key method(s) within each key group.
Custom attributes will be taken account of during Key Analysis if they are enabled for Key Generation, as described in Keys for Custom Attributes.
All available custom attribute key methods are analyzed, in a similar fashion to the existing fixed attributes.
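The three analysis steps can be sketched as follows. The scoring formula here is invented purely for illustration (it rewards diverse key values and penalizes a dominant high-frequency value); it is not the scoring used by EDQ-CDS, which would also have to weigh how well a key brings true matches together. The function names, the `PNR4` method name, and the priority numbering starting at 1 are likewise assumptions.

```python
from collections import Counter

def score_key_method(key_values):
    # Illustrative score only: diversity of values, reduced by the
    # share of records that land on the single most frequent value.
    counts = Counter(key_values)
    total = len(key_values)
    diversity = len(counts) / total
    largest_share = max(counts.values()) / total
    return diversity * (1 - largest_share)

def recommend_profile(keys_by_group):
    # keys_by_group: key group -> {key method -> list of generated key values}
    chosen = []
    for group in sorted(keys_by_group):
        methods = keys_by_group[group]
        best = max(methods, key=lambda m: score_key_method(methods[m]))
        chosen.append(best)
    # Encode as a pipe-delimited profile with relative priorities.
    return "|".join(f"{m}^{p}" for p, m in enumerate(chosen, start=1))

sample = {
    "Phone": {
        "PNR6": ["PNR6^123456", "PNR6^228400", "PNR6^555000", "PNR6^777111"],
        "PNR4": ["PNR4^0000", "PNR4^0000", "PNR4^0000", "PNR4^3456"],  # hypothetical method
    }
}
print(recommend_profile(sample))  # PNR6^1
```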
Running Batch Key Analysis
There are several new staging tables for Key Analysis which must be created prior to running the job. The SQL commands to create these tables are added to the existing default script, edq_staging_tables.sql, which is delivered with EDQ and installed under <middleware_home>/edq/oracle.edq/scripts/cds.
The batch jobs for running Key Analysis are:
- Batch Individual Key Analysis
- Batch Entity Key Analysis
- Batch Address Key Analysis
These jobs are structured in a similar fashion to the existing batch jobs for key generation and matching, in that they expect to receive party data in the relevant Candidates tables in the staging schema, and output their results to tables in the same schema.
Note that due to the statistical nature of the way Key Analysis works, it expects to always receive the full set of customer data to analyze. Whilst the jobs will actually run with a sample of the data, the results will only apply to that sample and cannot be scaled up to the full dataset.
The following run profile parameter must be set to Y in order for Key Analysis to run successfully:
phase.Key\ Analysis.process.*.generateallkeys = Y
Note also that the run profile contains various new SQL statements for Key Analysis, in order to expose the SERVERID and JOBID parameters in a similar fashion to the existing staging tables. Therefore these parameters will also need updating in the run profile in line with any changes for the other table parameters.
Key Method Analysis Outputs
Key Analysis outputs results in the following staging tables:
EDQCDS_KEY_ANALYSIS_PROFILE
This table contains a single row per job, which simply contains the recommended key profile, consisting of a pipe-delimited list of key methods with their associated key priorities, for example:
AD112FNL5GNL5^10|GNW1FNL0^11|AD17AD25CTL10^12|FNM4PNL8^13|PNR6^14
This is the profile that should be used for key generation and matching, if the user decides to accept the recommendation.
Note:
Key Analysis does not actually output the key values for the recommended profile; this must be done separately by running the relevant batch key generation job and passing in the recommended profile accordingly.
EDQCDS_KEY_ANALYSIS_REPORT
This table contains a single row per key method analyzed, detailing the statistics and score for each method together with an indication of whether it was selected for the profile and if so the assigned priority. Only those key methods generated are listed - that is, those for which the party data contained relevant non-blank attributes.
This report is provided mainly for support and diagnostics purposes.
EDQCDS_KEY_ANALYSIS_TOP_VALUES
This table contains the top 20 key values by count per key method analyzed. Only those key methods generated are listed - that is, those for which the party data contained relevant non-blank attributes.
This report can help users identify potential data quality issues with their data. For example, very large key value counts may indicate spikes and generic data values such as '000000' phone numbers or 'sales@' email addresses.
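The kind of summary held in this table can be approximated with a frequency count per key method. This is a sketch of the idea only, not of how the table is populated; the function name and sample values are assumptions.

```python
from collections import Counter, defaultdict

def top_key_values(key_values, limit=20):
    # Group key values by their key method prefix (the text before the
    # first "^") and keep the most frequent values for each method.
    by_method = defaultdict(Counter)
    for key_value in key_values:
        method = key_value.split("^", 1)[0]
        by_method[method][key_value] += 1
    return {method: counts.most_common(limit) for method, counts in by_method.items()}

values = ["PNR6^000000"] * 3 + ["PNR6^123456", "WSL8^ORACLE"]
for method, top in top_key_values(values).items():
    print(method, top)
```

A value such as `PNR6^000000` rising to the top of the list is the kind of spike that points to placeholder data.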
Individual Key Types
Key methods for matching individual data are based on the following key types:
Prefix | Cluster Name | Level | Description |
---|---|---|---|
| | Family Name Meta, Postal Code | 1 | 4-character double-metaphone of the surname + First 5 characters of the postal code + First 3 characters of address1. Note: With matching services, leading zeroes are stripped only on numeric |
| | Phone last N | 1 | Last N digits of the phone/fax/work/mobile number; set to 6. |
| | Email first 9 | 1 | First 9 characters of the email address. |
| | Tax Number | 1 | First 10 characters of the tax number. |
| | Elimination Identifier | 1 | All non-alphanumeric characters are removed. |
| | Unique Identifier | 1 | All non-alphanumeric characters are removed. |
| | National Identifier | 1 | First 10 characters of the National ID number. |
| | Given Names standardized, Family Name, Postal Code | 2 | First character of the standardized given name + First 3 characters of the family name + First 5 characters of the postal code. |
| | Given Names standardized, Family Name, City | 2 | First 3 characters of the standardized given name + First 3 characters of the family name + First 10 characters of the city name. |
| | Given Names standardized, Address1 | 2 | First 3 characters of the standardized given name + First 10 characters of address line 1. |
| | Family Name Meta, First Company word | 2 | First 4 characters of the family name + First word of the account name. |
| | Address1, Address2, City | 3 | First 5 characters of address line 1 + First 5 characters of address line 2 + First 5 characters of the city name. |
| | Original Script name, Postal Code | 3 | First 4 characters of the original script name + First 4 characters of the postal code. |
| | Full Name Meta | 3 | The full name tokens are sorted and then the double-metaphone algorithm is applied to generate tokens of up to 3 characters in length. For each ordered pair of tokens, a cluster value is generated that is the concatenation of the two metaphone tokens. |
Table 3-1 Address Only
Key Type | Description |
---|---|
AD1AD2CTL | address1 distilled (No whitespace, leftmost chars), address2 distilled (No whitespace, leftmost chars), city standardized (No whitespace, leftmost chars) |
ADACTLPRE | adminarea standardized (No whitespace, leftmost chars), city standardized (No whitespace, leftmost chars), premise derived (Denoised, no whitespace, leftmost chars) |
Table 3-2 Name and Company
Key Type | Description |
---|---|
ANLGNLFNL | accountname (No whitespace, leftmost chars), givenname standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars) |
ANWFNMGNL | accountname (Leftmost words), familyname (Double metaphone, leftmost chars), givenname standardized (No whitespace, leftmost chars) |
ANWFNM | accountname (Leftmost words), familyname (Double metaphone, leftmost chars) |
ANMGNLFNL | accountname (First word, double metaphone, leftmost chars), givenname standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars) |
Table 3-3 Name and DOB
Key Type | Description |
---|---|
DBYGNLFNL | DOB standardized (Year), givenname standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars) |
DBXGNLFNL | DOB standardized (Full date), givenname standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars) |
DBNGNLFNL | DOB standardized (Year and month), givenname standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars) |
Table 3-4 Name Only
Key Type | Description |
---|---|
FMP | fullname standardized (Array of tokens, metaphone pairs, leftmost chars) |
GNWFNL | givenname standardized (Leftmost words), familyname normalized (No whitespace, leftmost chars) |
Table 3-5 Name and Phone
Key Type | Description |
---|---|
FNMGNMPNR | familyname (Double metaphone, leftmost chars), givenname standardized (First word, double metaphone, leftmost chars), phonenumbers standardized (Rightmost chars (array of)) |
FNMPNL | familyname (Double metaphone, leftmost chars), phonenumbers standardized (Leftmost chars (array of)) |
Table 3-6 Full Name and Address
Key Type | Description |
---|---|
AD1FNLGNL | address1 distilled (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars), givenname standardized (No whitespace, leftmost chars) |
FNLGNLPCL | familyname (No whitespace, leftmost chars), givenname standardized (No whitespace, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
CTLFNLGNL | city standardized (No whitespace, leftmost chars), familyname (No whitespace, leftmost chars), givenname standardized (No whitespace, leftmost chars) |
Table 3-7 Household Address
Key Type | Description |
---|---|
AD1FNMPCL | address1 distilled (No whitespace, leftmost chars), familyname (Double metaphone, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
AD1FNMCTL | address1 distilled (No whitespace, leftmost chars), familyname (Double metaphone, leftmost chars), city standardized (No whitespace, leftmost chars) |
Table 3-8 National ID
Key Type | Description |
---|---|
NIL | nationalidnumber standardized (Leftmost chars (array of)) |
NIP | nationalidnumber standardized (Pairs of leftmost & rightmost chars (array of)) |
Table 3-9 Phone
Key Type | Description |
---|---|
PNR | phonenumbers standardized (Rightmost chars (array of)) |
Table 3-10 Script Name
Key Type | Description |
---|---|
OSLPCL | scriptfullname (No whitespace, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
Table 3-11 Tax Number
Key Type | Description |
---|---|
TNL | taxnumber standardized (Leftmost chars (array of)) |
TNP | taxnumber standardized (Pairs of leftmost & rightmost chars (array of)) |
Table 3-12 UID
Key Type | Description |
---|---|
UID(1/2/3) | uid[1, 2, 3] standardized (Leftmost chars (array of)) |
Table 3-13 Custom Strings
Key Type | Description |
---|---|
CM[1-6] | customstring[1-6] standardized (Double metaphone, leftmost chars, if blank 8 leftmost chars (no metaphone)) |
CL[1-6] | customstring[1-6] standardized (No whitespace, leftmost chars) |
Table 3-14 Custom Dates
Key Type | Description |
---|---|
CY[1-6] | customdate[1-6] standardized (Year) |
CX[1-6] | customdate[1-6] standardized (Full date) |
CN[1-6] | customdate[1-6] standardized (Year and month) |
Note:
The key method algorithms use data attributes that have been normalized (for example, converted to upper case and symbols stripped) and have had whitespace removed. This allows key generation and matching to be performed in a case-insensitive manner and to be tolerant of the spacing within attributes.
Examples
The following record data is used to provide examples of the key values that are generated by the individual key method algorithms:
Attribute | Value |
---|---|
| | Jim |
| | Frederick |
| | Smith |
| | 077777 123456 |
| | jsmith@mymail.com |
| | 888666444 |
| | Acme Ltd |
| | 14 high St |
| | Cambridge |
| | CB1 2AB |
| | 00021-53563 |
| | gbr0008873323 |
| | AB 12 34 56 C |
The key values that are generated using the Typical key profile are as follows:
Key Type | Key Method | Priority | Cluster Values |
---|---|---|---|
| | UI10 | 1 | UI10^0002153563 |
| | AD110FNL3GNL3 | 42 | AD110FNL3GNL3^14HIGH^SMI^JAM |
| | AD12FNM3PCL5 | 55 | AD12FNM3PCL5^14^SM0^CB12A |
| | AD17AD25CTL5 | 59 | AD17AD25CTL5^14HIGH^^CAMBR |
| | ANW1FNM4 | 54 | ANW1FNM4^ACME^SM0 |
| | CTL10FNL3GNL3 | 51 | CTL10FNL3GNL3^CAMBRIDGE^SMI^JAM |
ENP | ENP15 | 40 | ENP15^JSMITHMYMAILCOM |
FNLGNLPCL | FNL3GNL1PCL5 | 44 | FNL3GNL1PCL5^SMI^J^CB12A |
FNMPNL | FNM4PNL7 | 46 | FNM4PNL7^SM0^0777771 |
NIL | NIL10 | 36 | NIL10^AB123456C |
PNR | PNR6 | 47 | PNR6^123456 |
TNL | TNL10 | 37 | TNL10^888666444 |
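Several of the simpler key values in the table above can be reproduced with a few lines of code, which shows how normalization and the leftmost/rightmost lengths interact. This is a sketch under assumptions: the normalization shown (upper case, letters and digits only) is a simplification of the behavior described in the note above, the helper names are hypothetical, and keys that depend on metaphone encoding, name standardization or address distillation are not covered.

```python
def normalize(value):
    # Upper-case and keep only letters and digits
    # (drops whitespace and symbols).
    return "".join(ch for ch in value.upper() if ch.isalnum())

def leftmost(value, n):
    return normalize(value)[:n]

def rightmost(value, n):
    return normalize(value)[-n:]

def key_value(method, *parts):
    # Key method prefix followed by its components, delimited by "^".
    return "^".join((method,) + parts)

print(key_value("PNR6", rightmost("077777 123456", 6)))   # PNR6^123456
print(key_value("NIL10", leftmost("AB 12 34 56 C", 10)))  # NIL10^AB123456C
print(key_value("TNL10", leftmost("888666444", 10)))      # TNL10^888666444
print(key_value("UI10", leftmost("00021-53563", 10)))     # UI10^0002153563
```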
Entity Key Types
The following key types are provided for matching entity data:
Table 3-15 Name Address
Key Type | Description |
---|---|
AD1EMTPCL | address1 distilled (No whitespace, leftmost chars), entityname distilled (Array of tokens, double metaphone, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
ENLPCL | entityname distilled/normalized (No whitespace, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
FANENLCTL | fulladdress distilled (No whitespace, no numbers, denoised, leftmost chars), entityname distilled/normalized (No whitespace, leftmost chars), city standardized (No whitespace, leftmost chars) |
AD1ENLPCL | address1 distilled (No whitespace, leftmost chars), entityname distilled/normalized (No whitespace, leftmost chars), postalcode standardized (No whitespace, leftmost chars) |
Table 3-16 Name Metaphone Address
Key Type | Description |
---|---|
CTLFALNSM | city standardized (No whitespace, leftmost chars), fulladdress distilled (No whitespace, leftmost chars), fullname distilled/normalized (Double metaphone, leftmost chars) |
FALNSM | fulladdress distilled (No whitespace, leftmost chars), fullname distilled/normalized (Double metaphone, leftmost chars) |
CTLNSM | city standardized (No whitespace, leftmost chars), fullname distilled/normalized (Double metaphone, leftmost chars) |
Table 3-17 Name only
Key Type | Description |
---|---|
NSL | fullname distilled (No whitespace, leftmost chars) |
ENMSNM | entityname distilled (Double metaphone, leftmost chars), entitysubname distilled (Double metaphone, leftmost chars) |
FMT | fullname distilled (Array of tokens, double metaphone, leftmost chars) |
Table 3-18 Name City Phone
Key Type | Description |
---|---|
CTLENLPNR | city standardized (No whitespace, leftmost chars), entityname distilled/normalized (No whitespace, leftmost chars), phonenumbers standardized (Rightmost chars (array of)) |
CTLENLPNL | city standardized (No whitespace, leftmost chars), entityname distilled/normalized (No whitespace, leftmost chars), phonenumbers standardized (Leftmost chars (array of)) |
Table 3-19 Phone
Key Type | Description |
---|---|
PNR | phonenumbers standardized (Rightmost chars (array of)) |
Table 3-20 Website
Key Type | Description |
---|---|
WSL | websitestem (Leftmost chars (array of)) |
Table 3-21 Script Name
Key Type | Description |
---|---|
OSL | script fullname (Array of tokens, leftmost chars) |
Table 3-22 VAT number
Key Type | Description |
---|---|
VNL | vatnumber standardized (Leftmost chars (array of)) |
VNP | vatnumber standardized (Pairs of leftmost & rightmost chars (array of)) |
Table 3-23 Tax Number
Key Type | Description |
---|---|
TNL | taxnumber standardized (Leftmost chars (array of)) |
TNP | taxnumber standardized (Pairs of leftmost & rightmost chars (array of)) |
Table 3-24 UID
Key Type | Description |
---|---|
UID[1,2,3] | uid[1, 2, 3] standardized (Leftmost chars (array of)) |
Table 3-25 Custom Strings
Key Type | Description |
---|---|
CM[1-6] | customstring[1-6] standardized (Double metaphone, leftmost chars, if blank 8 leftmost chars (no metaphone)) |
CL[1-6] | customstring[1-6] standardized (No whitespace, leftmost chars) |
Table 3-26 Custom Dates
Key Type | Description |
---|---|
CY[1-6] | customdate[1-6] standardized (Year) |
CX[1-6] | customdate[1-6] standardized (Full date) |
CN[1-6] | customdate[1-6] standardized (Year and month) |
Note:
The key method algorithms use data attributes that have been normalized (for example, converted to upper case and symbols stripped) and have had whitespace removed. This allows key generation and matching to be performed in a case-insensitive manner and to be tolerant of the spacing within attributes.
Examples
The following record data is used to provide examples of the key values that are generated by the entity key method algorithms:
Attribute | Value |
---|---|
| | Oracle UK |
| | Cambridge |
| | +441223228400 |
| | http://www.oracle.com/uk |
| | RGW432D243224 |
| | 999111 |
| | 296 Cambridge Science Park |
| | Cambridge |
| | CB4 0WD |
| | 00021-53563 |
| | gbr0008873323 |
The following key values are generated using a key profile of Typical:
Key Type | Key Method | Priority | Key values |
---|---|---|---|
| | | 43 | AD13PCL4^296^CB40 |
| | | 41 | AD14EMT4PCL3^296C^ARKL^CB4 |
| | | 49 | CTL0NSM6^CAMBRIDGE^ARKLKM |
| | | 47 | CTL1ENL1PNL7^C^O^4412232 |
| | | 42 | ENL4PCL3^ORAC^CB4 |
| | | 39 | FAL10NSM4^296CAMBRID^ARKL |
| | | 40 | NSL25^ORACLECAMBRIDGE |
| | PNR6 | 58 | PNR6^228400 |
| | | 35 | TNL10^RGW432D243 |
| | | 1 | UI10^0002153563 |
| | | 36 | VNL10^999111 |
| | | 57 | WSL8^ORACLE |
Address Key Types
The following key method types are provided for matching address data:
Table 3-27 Address Lines
Key Type | Description |
---|---|
AD1AD2 | address 1 distilled (No whitespace, leftmost chars), address 2 distilled (No whitespace, leftmost chars) |
Table 3-28 Address City
Key Type | Description |
---|---|
AD1CTL | address 1 distilled (No whitespace, leftmost chars), city standardized (No whitespace, leftmost chars) |
CTLPCLPRE | city standardized (No whitespace, leftmost chars), postalcode standardized (No whitespace, leftmost chars), premise derived (Denoised, no whitespace, leftmost chars) |
PMSPCC | premise derived/address 1 distilled (First number word of premise derived/premise leftmost chars/first number word of address 1 distilled/leftmost chars of address 1 distilled), postalcode standardized/city standardized (Leftmost chars of postalcode standardized/leftmost chars of city standardized) |
Table 3-29 Full Address
Key Type | Description |
---|---|
FAL | fulladdress distilled (No whitespace, leftmost chars) |
FAN | fulladdress distilled (No whitespace, no numbers, denoised, leftmost chars) |
Table 3-30 Postal Code
Key Type | Description |
---|---|
PCL | postalcode standardized (No whitespace, leftmost chars) |
Note:
-
A Number word is a word with one or more numbers within it. for example, 234 and 2A are both number words.
-
The key method algorithms use data attributes that have been normalized (for example, converted to upper case and stripped of symbols) and have had whitespace removed. This allows key generation and matching to be case-insensitive and tolerant of differences in spacing within attributes.
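The two notes above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the exact set of symbols that EDQ-CDS strips is an assumption (here, everything that is not a letter or digit).

```python
from typing import Optional


def normalize(value: str) -> str:
    """Upper-case the value and keep only letters and digits.

    Assumption: "symbols stripped" and "whitespace removed" together mean
    that every non-alphanumeric character is dropped.
    """
    return "".join(ch for ch in value.upper() if ch.isalnum())


def is_number_word(word: str) -> bool:
    """A number word is a word that contains at least one digit."""
    return any(ch.isdigit() for ch in word)


def first_number_word(text: str) -> Optional[str]:
    """Return the first number word in the text, or None if there is none."""
    for word in text.split():
        if is_number_word(word):
            return word
    return None
```

With these definitions, `cb4 0wd` and `CB40WD` normalize to the same value, which is what makes key comparison insensitive to case and spacing.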
Examples
The following record data is used to provide examples of the key values that are generated by the address key method algorithms:
Attribute | Value |
---|---|
|
2529 CINCINNATI ST |
|
APT 6 |
|
LOS ANGELES |
|
CA |
|
90033 |
Note:
During key generation, ST is distilled out of the address1 field, and APT is distilled out of the address2 field. This is because they are common addressing components that are less important identifiers than the remainder of the address line, and removing them produces more accurate clusters.
The Key values that are generated using the Typical address key profile are:
Key Type | Key Method | Priority | Key Values |
---|---|---|---|
AD1AD2 | AD110AD210 | 12 | AD110AD210^2529CINCIN^6 |
AD1CTL | AD15CTL8 | 9 | |
CTLPCLPRE | CTL8PCL5PRE0 | 10 | CTL8PCL5PRE0^LOSANGEL^90033^2529 |
FAL | FAL10 | 11 | |
FAN | FAN10 | 13 | FAN10^CINCINNATI |
PCL | PCL0 | 15 | PCL0^90033 |
PMSPCC | PMS6PCC5 | 8 | PMS6PCC5^2529^90033 |
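As an illustration of how key values of this shape could be built, the following sketch reproduces four of the values in the table above. It is not the EDQ-CDS implementation. The helper names are hypothetical, the noise-word list contains only the two words named in the note above, a length of 0 is assumed to mean "no truncation" (inferred from PCL0 and PRE0), and the mapping of CA to an adminarea attribute is also an assumption.

```python
NOISE_WORDS = {"ST", "APT"}  # assumption: only the two words named in the note


def distill(text, drop_numbers=False):
    """Drop noise words (and optionally number words), then remove whitespace."""
    words = [w for w in text.upper().split() if w not in NOISE_WORDS]
    if drop_numbers:
        words = [w for w in words if not any(c.isdigit() for c in w)]
    return "".join(words)


def left(text, n):
    """Leftmost n characters; n == 0 is assumed to mean no truncation."""
    return text if n == 0 else text[:n]


def first_number_word(text):
    """First word containing a digit, or an empty string if there is none."""
    return next((w for w in text.upper().split() if any(c.isdigit() for c in w)), "")


def key(method, *parts):
    """Join the key method name and its parts with the ^ separator."""
    return "^".join((method, *parts))


record = {
    "address1": "2529 CINCINNATI ST",
    "address2": "APT 6",
    "city": "LOS ANGELES",
    "adminarea": "CA",  # assumed attribute name for the state value
    "postalcode": "90033",
}
full_address = " ".join(record.values())

keys = {
    "AD1AD2": key("AD110AD210",
                  left(distill(record["address1"]), 10),
                  left(distill(record["address2"]), 10)),
    "CTLPCLPRE": key("CTL8PCL5PRE0",
                     left(distill(record["city"]), 8),
                     left(distill(record["postalcode"]), 5),
                     left(first_number_word(record["address1"]), 0)),
    "FAN": key("FAN10", left(distill(full_address, drop_numbers=True), 10)),
    "PCL": key("PCL0", left(distill(record["postalcode"]), 0)),
}
```

Records that produce the same key value for a given key method would fall into the same cluster and so become candidates for comparison.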
Using Individual Matching
The matching design for individuals in CDS is based on combining matches between several logical identifiers (compound comparisons). These compound comparisons are:
-
Name
-
Address
-
Account name
-
DOB
-
Phone number
-
Email
-
National ID number
-
Tax number
It is also possible to enable matching on the custom fields; however, these are not enabled by default.
EDQ-CDS uses preconfigured match rules on the compound comparisons to ascertain how well two records match (or don't match) on that particular logical identifier.
To determine whether two records match as a whole, EDQ-CDS combines the results of matching on the logical identifiers to produce an overall score that measures how well the records match. Note that a conflict lowers the score, just as a match raises it. For example, two records with an exact match on name and address but a conflicting date of birth will score lower than two records with an exact match on name and address but no date of birth.
Each logical identifier has a default weighting, defining how likely two records with matches on the compound comparison related to this logical identifier are to be the same individual.
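The effect of a conflict on the overall score can be illustrated with a toy calculation. The adjustment values and the additive formula below are invented purely for illustration; the actual EDQ-CDS weightings and scoring method are not described in this section.

```python
# Hypothetical adjustment per identifier result; not the real EDQ-CDS weights.
ADJUSTMENT = {"Exact": 0, "Fuzzy": -5, "No data": -10, "Conflict": -30}


def overall_score(results):
    """Start from 100 and apply one adjustment per logical identifier result."""
    score = 100 + sum(ADJUSTMENT[category] for category in results.values())
    return max(0, min(100, score))


with_conflict = overall_score({"name": "Exact", "address": "Exact", "dob": "Conflict"})
with_no_data = overall_score({"name": "Exact", "address": "Exact", "dob": "No data"})
```

Under these assumed values, the pair with the conflicting date of birth scores lower than the pair with no date of birth, which is the behavior described above.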
Matching on the Individual Name logical identifier
The rules for matching on the individual name compound comparisons include the use of pre-matching transformations and various matching comparisons in order to handle the following types of variance between different representations of what may be the same individual name:
-
Names written in different writing systems/scripts, for example, 'Зоран' and 'Zoran'.
-
Variants of the same name, for example, 'Bill' and 'William'.
-
Different levels of name completeness, for example, 'Joseph Andrew Harris' and 'Joseph Harris'.
-
Name tokens in a different order, for example, 'Lacazette Jacques' and 'Jacques Lacazette'.
-
Abbreviated forms of names, for example, 'Chris' and 'Christian'.
-
Typographic differences, for example, 'Michael' and 'Micheal'.
-
The use of initials, for example, 'A M' and 'Alexander Martin'.
-
Changes of surname due to marriage, for example, 'Paula Jones' and 'Paula Lewis' at the same address.
-
Various combinations of the above types of variance.
Note:
In this table the pipe character is used to indicate a separator between the input given name and family name attributes (for example, Given Name= Martin, Family Name=Smith is written as 'Martin|Smith'). Where no pipe character is used, this means the Full Name is used in the match rule.
Note:
Near the top of this list are some name conflict rules. These are designed to negatively weight matches between two names that are obviously of different genders, to avoid matches of this type.
Name Matching Rules | Example Name Match | Type |
---|---|---|
Script full name exact |
|
Exact |
Name exact |
Martin|Fox = Martin|Fox |
Exact |
Standardized given name |
Bill|Lewis = William|Lewis |
Exact |
Given name abbreviated |
Chris|Smith = Christina|Smith |
Fuzzy |
Name conflict, supplied gender different |
Paula|Smith - Paul Smith (negatively weighted to eliminate matches such as this) |
Conflict |
Name conflict, derived gender different |
Paula|Smith - Paul Smith (negatively weighted to eliminate matches such as this) |
Conflict |
Standardized given name abbreviated |
Abell|Hernandez = Abelson|Hernandez |
Fuzzy |
Script full name any order |
|
Fuzzy |
Given name similar and sounds like |
Yngrid|Martin = Ingrid|Martin |
Fuzzy |
First name similar and sounds like |
Yngrid Elisabeth|Martin = Ingrid Martin |
Fuzzy |
Additional given names |
Michael John|Smith = John|Smith |
Fuzzy |
Standardized full name |
Mehmood Mahomed = Mahmoud Mohammed |
Fuzzy |
Script full name has additional names |
|
Fuzzy |
Additional names |
Mary Jones Steward = Mary Jones |
Fuzzy |
Script full name typos |
|
Fuzzy |
Standardized given name abbreviated; family name typos |
Abell|Hernandez = Abelson|Hernandes |
Fuzzy |
Full name typos, all words |
Mary Cloire Jonez = Mary Claire Jones |
Fuzzy |
First name first three; family name typos |
Ros Susan|Jonez = Rose Susan|Jones |
Fuzzy |
Full name initials in order; additional names |
G A|Smith = Gordon Alfred|Smith |
Fuzzy |
Standardized first name only; female |
Jacklin|Jones = Jacqueline|Smith |
Fuzzy |
Matching on the other logical identifiers
Addresses
The rules for matching on the address compound comparison within individual name matching include the use of pre-matching transformations and various matching comparisons in order to handle the following types of variance between different representations of what may be the same address:
-
Extracting the premise and subpremise
-
Standardizing commonly used words such as STREET, ROAD, etc.
-
Stripping commonly used words such as STREET, ROAD, etc.
-
Typographic differences
Note:
In this table the pipe character is used to indicate a separator between the inputs of address1, address2, address3, city, adminarea, and postalcode. For example, address1=296 Cambridge Science Park, address2=Milton Road, address3=<blank>, city=Cambridge, adminarea=<blank>, postalcode=CB4 0WD is represented as 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD
Table 3-31 Matching on other logical identifiers
Address Rule Name | Example | Type |
---|---|---|
Address exact |
296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD = 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD |
Exact |
Premise, subpremise, address similar, postal code |
Flat 1|296 Cambridge Science Park||Cambridge||CB4 0WD = Flat 1|296 Cambridge Sci Park||Cambridge||CB4 0WD |
Fuzzy |
Premise, no subpremise, address similar, postal code |
296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD = 296 Cambridge Sci Park|Milton Road||Cambridge||CB4 0WD |
Fuzzy |
Address1 and address2 distilled exact, postal code starts with |
296 Milton Road|||Cambridge||CB4 0WD = 296 Milton Road|||||CB4 0WD |
Fuzzy |
Address1 distilled exact, address2 no conflict, postal code starts with |
296 Milton Road|Science Park||Cambridge||CB4 0WD = 296 Milton Road|||||CB4 0WD |
Fuzzy |
Premise, subpremise, postal code starts with |
Flat 1|352 Milton Road||Cambridge||CB4 0WD = 352 Milton Road|Flat 1||||CB4 0WD |
Fuzzy |
Premise, no subpremise, postal code starts with |
296 Cambridge Science Park|||Cambridge||CB4 0Wd = 296 The Science Park|||||CB4 0WD |
Fuzzy |
Address1 distilled exact, postal code starts with |
296 Cambridge Science Park|Flat 1||Cambridge||CB4 0WD = 296 Cambridge Science Park|Flat 6||Cambridge||CB4 0WD |
Fuzzy |
Address all words |
296 Science Park|Milton Road||Cambridge||CB4 0WD = Science Park|Milton Road||||CB4 0WD |
Fuzzy |
Address all words typos |
296 Science Park|Milton Road||Cambridge||CB4 0WD = Sciense Park|Milton Road||||CB4 0WD |
Fuzzy |
Address similar, postal code |
296 Science Pk|Milton Rd||Cambridge||CB4 0WD = Sceince Park|Milton Road||Cmbridge||CB4 0WD |
Fuzzy |
Address similar; first address1 word |
297 Cambridge Science Park||Milton Road|||CB30WS = 296 Cambridge Science Park|Milton Road||||CB4 0WD |
Fuzzy |
Postal code |
296 Science Park|||||CB4 0WD = |Milton Road||||CB4 0WD |
Fuzzy |
Postal code starts with |
296 Science Park|||||CB4 0WD = |||||CB4 |
Fuzzy |
City exact |
352 Mill Road|||Cambridge||CB1 3NN = 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD |
Fuzzy |
Address no data |
||||| = 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD |
No data |
Address conflict |
19 Teme Ave|||Malvern|Worcs|WR14 2XA = 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD |
Conflict |
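The "Address all words" rule in the table above can be sketched as a set comparison. This is a simplified illustration, not the product's comparison: the function names are hypothetical, "shorter" is assumed to mean the address with fewer distinct words, and no standardization or typo tolerance is applied.

```python
def address_words(address: str) -> set:
    """Split a pipe-separated address into a set of upper-case words."""
    return set(address.upper().replace("|", " ").split())


def all_words_match(a: str, b: str) -> bool:
    """True when every word of the shorter address appears in the longer one."""
    words_a, words_b = address_words(a), address_words(b)
    if len(words_a) <= len(words_b):
        shorter, longer = words_a, words_b
    else:
        shorter, longer = words_b, words_a
    # An empty address is treated as no data rather than as a match.
    return bool(shorter) and shorter <= longer
```

For the table's example, every word of `Science Park|Milton Road||||CB4 0WD` also appears in `296 Science Park|Milton Road||Cambridge||CB4 0WD`, so the rule holds.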
Account name
Matching on account name allows for matches including:
-
Exact match
-
Typographic differences
-
All words in common
Table 3-32 Account name
Account name rule | Example | Type |
---|---|---|
Account name exact |
Widgets and Gadgets Ltd = Widgets and Gadgets Ltd |
Exact |
Account name typos |
Widgets and Gadgets Ltd = Widgets and Gagets Ltd |
Fuzzy |
Account name all words |
Federal Mogul Camshafts Castings Ltd = Federal Mogul Camshafts Ltd |
Fuzzy |
Account name all words out of order |
Federal Mogul Camshafts Castings Ltd = Federal Mogul Castings Camshafts Ltd |
Fuzzy |
Account name all words typos |
Federal Mogul Camshafts Castings Ltd = Federal Mogul Camshfts Ltd |
Fuzzy |
Account name all words out of order typos |
Federal Mogul Camshafts Castings Ltd = Federal Mogul Castings Camshfts Ltd |
Fuzzy |
Account name no data |
Oracle Ltd = |
No data |
Account name conflict |
Federal Mogul Camshafts Castings Ltd = Wigets and Gadgets Ltd |
Conflict |
Phone numbers
Table 3-33 Phone numbers
Phone matching rule | Example | Type |
---|---|---|
Phone exact |
01223456678 = 01223456678 |
Exact |
Phone last N |
+44223456678 = 01223456678 |
Fuzzy |
Phone no data |
01223456678 = |
No data |
Phone conflict |
01223456678=01684345678 |
Conflict |
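The four phone rules above can be sketched as follows. This is an illustrative approximation rather than the product's comparison: the function names are hypothetical, and N defaults to 6 because the rule descriptions later in this chapter give the last 6 digits as the default.

```python
import re


def digits(phone):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", phone or "")


def phone_match(a, b, n=6):
    """Classify two phone numbers as Exact, Fuzzy (last n digits), No data, or Conflict."""
    da, db = digits(a), digits(b)
    if not da or not db:
        return "No data"
    if da == db:
        return "Exact"
    if len(da) >= n and len(db) >= n and da[-n:] == db[-n:]:
        return "Fuzzy"
    return "Conflict"
```

Comparing only the trailing digits is what lets `+44223456678` match `01223456678` despite the different prefixes.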
Email matches allow for matches including:
-
Exact match
-
User name only exact
-
Typographic errors
Table 3-34 Email
Email match rule | Example | Type |
---|---|---|
Email exact |
someonesname@company.com = someonesname@company.com |
Exact |
Email user exact |
someonesname@company.com = someonesname@adomain.com |
Fuzzy |
Email typos |
someonesname@companion.com = someonesname@company.com |
Fuzzy |
Email no data |
someonesname@company.com = |
No data |
Email conflict |
someonesname@company.com = aperson@adomain.com |
Conflict |
Date of birth
Date of birth matches allow for matches including:
-
Exact match
-
Transposition of day/month match
Date of birth match rules also include a conflict rule that penalizes very different dates more severely.
Table 3-35 Date of birth
Date of birth match rule | Example | Type |
---|---|---|
Date exact |
11/01/1976 = 11/01/1976 |
Exact |
Date similar |
01/11/1976 = 11/01/1976 |
Fuzzy |
Date no data |
11/01/1976 = |
No data |
Date too different |
11/12/2001 = 11/01/1976 |
Conflict |
Date conflict |
11/01/1976 = 20/01/1976 |
Conflict |
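A day/month transposition check of the kind shown in the "Date similar" row could look like the sketch below. The function name is hypothetical, and the sketch does not distinguish "Date too different" from "Date conflict", because the threshold separating them is not stated here.

```python
from datetime import date


def dob_match(a, b):
    """Classify two dates of birth as Exact, Fuzzy (day/month swapped), No data, or Conflict."""
    if a is None or b is None:
        return "No data"
    if a == b:
        return "Exact"
    if a.year == b.year and a.day == b.month and a.month == b.day:
        return "Fuzzy"
    return "Conflict"
```

For example, 11 January 1976 and 1 November 1976 are classified as Fuzzy, because swapping day and month in one date produces the other.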
National Id number
Table 3-36 National Id number
National Id number rule | Example | Type |
---|---|---|
National Id number exact |
ABC112345 = ABC112345 |
Exact |
National Id number typos |
ABC12345 = ABC112345 |
Fuzzy |
National Id number no data |
ABC12345 = |
No data |
National Id number conflict |
ABD2535 = BCD2145 |
Conflict |
Tax number
Table 3-37 Tax number
Tax number rule | Example | Type |
---|---|---|
Tax number exact |
ABC112345 = ABC112345 |
Exact |
Tax number typos |
ABC12345 = ABC112345 |
Fuzzy |
Tax number no data |
ABC12345 = |
No data |
Tax number conflict |
ABD2535 = BCD2145 |
Conflict |
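The "typos" rules for National ID and tax numbers are described later in this chapter as matching with a Character Edit Distance of 1 or 2. The sketch below assumes that this corresponds to plain Levenshtein distance, which may differ from the product's own comparison; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = curr
    return prev[-1]


def id_match(a: str, b: str, max_typos: int = 2) -> str:
    """Classify two identifier values as Exact, Fuzzy, No data, or Conflict."""
    if not a or not b:
        return "No data"
    if a == b:
        return "Exact"
    return "Fuzzy" if edit_distance(a, b) <= max_typos else "Conflict"
```

In the table's example, `ABC12345` and `ABC112345` differ by a single inserted character, so they fall within the tolerance.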
The individual matching service outputs fields that give information on the matching of each of the logical identifiers described above, as well as an overall score and an overall rule name. This gives the consuming application more granular information about how the records matched, which it can use as required.
Here is an example. The records in Table 3-38 were compared. Results are given in Table 3-39.
Table 3-38 Comparing Records
Record 1 | — | Record 2 | — |
---|---|---|---|
Firstname |
John |
Firstname |
J |
Lastname |
Smith |
Lastname |
Smith |
Phonenumber |
01223456789 |
Phonenumber |
+44223456789 |
address1 |
35 Mill Road |
address1 |
35 Mill Road |
city |
Cambridge |
city |
Cambridge |
postalcode |
CB1 2JJ |
postalcode |
CB1 2JJ |
Table 3-39 Results of Comparison
Value | Result |
---|---|
matchscore | 95 |
rulename | N040 Given name abbreviated, A010 Address exact, C070 Account name no data, D030 DOB no data, P020 Phone Last N, E040 Email no data, I030 National ID number no data, T030 Tax number no data |
ruleattributes | NAME,ADDRESS,PHONE |
comparisonresults | Name Fuzzy, Address Exact, Phone Fuzzy |
namescore | 95 |
nameresult | N040 Given name abbreviated |
namecategory | Fuzzy |
addressscore | 100 |
addressresult | A010 Address exact |
phoneresult | P020 Phone Last N |
phonescore | 90 |
phonecategory | Fuzzy |
*Results that are no data are omitted for brevity
Note:
If a field is known never to be populated in the data, then it is possible to "turn off" the compound comparison relating to the logical identifier, so that it does not appear in the rule.
The comparisonresults output field gives a comma-separated list of the logical identifiers that contributed to the match, each followed by the category of its match (Exact or Fuzzy).
The ruleattributes field returns a comma-separated list of the logical identifiers that contributed to the match.
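A consuming application might split these two output fields as sketched below. The function names are hypothetical, and the parsing assumes the formats shown in the example results, with the category as the last word of each comparisonresults item.

```python
def parse_comparison_results(value: str) -> dict:
    """Turn 'Name Fuzzy, Address Exact' into {'Name': 'Fuzzy', 'Address': 'Exact'}."""
    results = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # The category is the last word; the identifier may contain spaces.
        identifier, category = item.rsplit(" ", 1)
        results[identifier] = category
    return results


def parse_rule_attributes(value: str) -> list:
    """Turn 'NAME,ADDRESS,PHONE' into ['NAME', 'ADDRESS', 'PHONE']."""
    return [part.strip() for part in value.split(",") if part.strip()]
```

Splitting on the last space keeps multi-word identifiers such as Tax Number intact.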
Secondary Identifier Match Rule | Description |
---|---|
DOB; e-mail |
Date of birth and e-mail match exactly. |
Address; e-mail |
Address and e-mail match exactly. |
E-mail; phone number |
E-mail and any phone number match exactly. |
Company; address |
All tokens in the shorter company name match in the longer company name, and the address matches exactly. |
Tax number |
Tax number matches exactly. |
National ID number |
National ID number matches exactly. |
E-mail |
E-mail matches exactly. |
Address |
Address matches exactly. |
Phone |
Any phone number matches exactly. |
Premise; subpremise; postal code starts with |
Address matches by extracted premise, subpremise and postal code Note: With matching services, leading zeroes are stripped only on numeric |
Premise; no subpremise; postal code starts with |
Address matches by extracted premise and postal code, and there is no data in either |
DOB |
Date of birth matches exactly. |
Phone last N digits |
Any phone number matches using the last N digits (by default, the last 6 digits). |
Company; postal code |
All tokens in the shorter company name match in the longer company name, and the postal code matches exactly. |
Address all words |
All words in the shorter address match in the longer address. |
DOB similar |
Dates of birth are a close match (a day/month transposition match using the default comparison settings). |
Tax number typos |
Tax number matches with a Character Edit Distance of 1 or 2. |
National ID number typos |
National ID number matches with a Character Edit Distance of 1 or 2. |
E-mail typos |
E-mail matches with a Character Edit Distance of 1 or 2. |
Address all words typos |
All words in the shorter address match in the longer address with a Character Error Tolerance of 20%. |
Address similar; postal code |
Address matches with a Character Match Percentage of 65 or more, and the postal code matches exactly. |
Address similar; first address one word |
Address matches with a Character Match Percentage of 65 or more, and there is at least one token match in the first line of the address. |
Company |
All tokens in the shorter company name match in the longer company name. |
In addition to the logical identifiers described above, it is possible to configure individual matching to use custom fields. Custom fields are not enabled by default for either matching or clustering. For further information, see Using Matching with Customer-Added Attributes.
It is also possible to perform matching or elimination of Individual records using custom unique identifiers; see Using ID Matching.
Using Entity Matching
As with individuals, the matching design for entities in CDS is based on combining matches between several logical identifiers, using compound comparisons. These compound comparisons are:
-
Entity name
-
Address
-
Phone number
-
Website address
-
Tax number
-
VAT number
It is also possible to enable matching on the custom fields; however, matching on these is not enabled by default.
EDQ-CDS uses preconfigured rules on the compound comparisons relating to the logical identifiers to ascertain how well two records match (or don't match) for that particular logical identifier.
In order to determine whether two records as a whole match, EDQ-CDS uses the results for the matching on the logical identifiers and combines them to produce an overall score that gives a measure of how well the records match. Note that a conflict will negatively affect a score, as well as a match increasing it. For example, two records with an exact match on name and address, but a conflicting web address will score lower than two records with an exact match on name and address, but no web address.
Each logical identifier has a default weighting, defining how likely two records with matches on this particular identifier are to be the same entity.
Note:
It is significantly harder to match entities (as opposed to individuals) between different writing systems, because transliteration, and even transcription, is much less likely to be successful. Very often, the only way to recognize that a company is the same when written in two different languages is to hold huge dictionaries of all possible company names and their appropriate translations (rather than transliterations or transcriptions). In most cases, such data is simply not available; if it is available, it can be plugged into EDQ-CDS to improve results.
Entity Name Matching
The rules for matching entity names include the use of pre-matching transformations and various matching comparisons in order to handle the following types of variance between different representations of what may be the same entity name:
-
Entity names written in different writing systems.
-
Entity names with or without suffixes, for example, 'Oracle LTD' and 'Oracle'.
-
Entity names containing abbreviated terms or suffixes, for example, 'Oracle Limited' and 'Oracle LTD'.
-
Character order and spelling differences/errors in entity names, for example, 'Oracle' and 'Oralce'.
-
Entity names with different levels of name completeness, for example, 'ABC Technology Consultants LTD' and 'ABC Technology LTD'.
-
Entity name tokens appearing in a different order, for example, 'Cambridge Science Park LTD' and 'Science Park Cambridge'.
-
Entity Names where part or all of the name is reduced to an acronym, for example, 'Oracle Catering' and 'O.C.'.
Note:
In the following table, where a name matching rule uses the 'full name', this means it applies to the entity full name identifier, a concatenation of the entity name and sub-name attributes. The pipe (|) character is used to separate the entity name and sub-name where the sub-name attribute is required to provide an example match.
Entity Name Matching Rule | Example Entity Name Match | Type |
---|---|---|
Script full name exact |
|
|
Full name exact |
TCHIBO GMBH = TCHIBO GMBH |
|
Standardized full name exact |
ORACLE UK LTD | READING = ORACLE UK LIMITED | READING |
Fuzzy |
Script full name without suffixes exact |
|
Fuzzy |
Full name without suffixes exact |
ORACLE = ORACLE CORPORATION |
Fuzzy |
Full name without suffixes similar and sounds like |
ORACLE CAMBRIDGE SCIENCE PARK = ORACLE CAMBRIDGE PARK SCIENCE |
Fuzzy |
Script full name out of order |
|
Fuzzy |
Script full name without suffixes all words out of order |
|
Fuzzy |
Full name without suffixes all words out of order |
CAMBRIDGE SCIENCE PARK LTD = SCIENCE PARK CAMBRIDGE |
Fuzzy |
Script full name has additional names |
|
Fuzzy |
Script entity name without suffixes exact |
|
Fuzzy |
Entity name without suffixes exact |
ORACLE CORPORATION | CAMBRIDGE = ORACLE | READING |
Fuzzy |
Full name all words shorter with typos |
Oracle Inc | Cambridge = Oracl | Cambridge |
Fuzzy |
Script entity name without suffixes starts with |
|
Fuzzy |
Entity name without suffixes starts with |
ABC TECHNOLOGY CONSULTANTS LTD = ABC TECHNOLOGY LTD |
Fuzzy |
Script full name without suffixes all words shorter with typos |
|
Fuzzy |
Full name without suffixes all words shorter with typos |
Federal Mogull | Camshafts Inc = Federal Mogul Camshafts Castings Ltd |
Fuzzy |
Script full name typos |
|
Fuzzy |
Full name typos |
ABD SERVICES LTD = ABC SERVICES LTD |
Fuzzy |
Script full name without suffixes typos |
|
Fuzzy |
Full name without suffixes typos |
ABD ENGINEERING LTD = ABC ENGINEERING |
Fuzzy |
Script entity name without suffixes starts with |
|
Fuzzy |
Entity name without suffixes starts with |
ABC LIMITED | CAMBRIDGE = ABC PHARMACEUTICALS LIMITED | READING |
Fuzzy |
Standardized full name acronym exact |
CSC = Computer Science Corporation |
Fuzzy |
Entity name distilled longest common substring 12+ |
Colebrook & Burgess (North Shields) Ltd. = Colebrook & Burgess (Teesside) Ltd. |
Fuzzy |
Full name without suffixes acronym exact |
CSC = Computer Science Collaborations Ltd |
Fuzzy |
Full name without suffixes acronym contains |
Oracle CK = Oracle Collaborative Koopers |
Fuzzy |
Entity name without suffixes loose typos |
Oracle Collaborative Coopers = Orracl Colabarativ Kupers |
Fuzzy |
Entity name without suffixes first token |
DANVERS BANCORP INC = DANVERS MUNICIPAL FEDERAL CREDIT UNION |
Fuzzy |
Entity name distilled first 3 exact, longest common substring 6+ |
Lincoln Co-Operative Chemists Ltd. = Lincolnshire Co-Operative Ltd. |
Fuzzy |
Entity name distilled one or more tokens exact |
Burgess Video Ltd. = Sue Burgess Ltd. |
Fuzzy |
Entity name no data |
Oracle Corporation = |
No data |
Entity name conflict |
Oracle Corporation = Sue Burgess Ltd. |
Conflict |
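The acronym rules in the table above can be approximated by comparing one name against the initials of the other. The sketch below is illustrative only: the function names are hypothetical, and the suffix list is a small assumed sample, not the product's reference data.

```python
SUFFIXES = {"LTD", "LIMITED", "INC", "CORPORATION", "GMBH"}  # assumed sample only


def acronym(name: str, strip_suffixes: bool = False) -> str:
    """Build an acronym from the first letter of each word in the name."""
    words = name.upper().replace(".", " ").split()
    if strip_suffixes:
        words = [w for w in words if w not in SUFFIXES]
    return "".join(w[0] for w in words)


def acronym_match(short_name: str, long_name: str, strip_suffixes: bool = False) -> bool:
    """True when short_name, ignoring dots and spaces, equals the acronym of long_name."""
    compact = "".join(ch for ch in short_name.upper() if ch.isalnum())
    return bool(compact) and compact == acronym(long_name, strip_suffixes)
```

Whether suffixes are stripped first matters: `CSC` equals the acronym of `Computer Science Corporation` only when the suffix is kept, and equals the acronym of `Computer Science Collaborations Ltd` only when it is removed.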
Matching on other logical identifiers for Entities
Addresses
The rules for matching addresses within entity name matching include the use of pre-matching transformations and various matching comparisons in order to handle the following types of variance between different representations of what may be the same address:
-
Extracting the premise and subpremise
-
Standardizing commonly used words such as STREET, ROAD, etc.
-
Stripping commonly used words such as STREET, ROAD, etc.
-
Typographic differences
Note:
In this table the pipe character is used to indicate a separator between the inputs of address1, address2, address3, city, adminarea, postalcode, and country. For example, address1=296 Cambridge Science Park, address2=Milton Road, address3=<blank>, city=Cambridge, adminarea=<blank>, postalcode=CB4 0WD, country=United Kingdom is represented as 296 Cambridge Science Park|Milton Road||Cambridge||CB4 0WD|United Kingdom
Secondary Identifier Match Rule | Description |
---|---|
Address |
Address matches exactly. |
Premise; subpremise; postal code starts with |
Address matches by extracted premise, subpremise and postal code. Note: With matching services, leading zeroes are stripped only on numeric |
Premise; no subpremise; postal code starts with |
Address matches by extracted premise and postal code, and there is no data in either |
Address all words |
All words in the shorter address match in the longer address. |
Address all words typos |
All words in the shorter address match in the longer address with a Character Error Tolerance of 20%. |
Website; phone number |
The website address and any phone number match exactly. |
Tax number |
The tax number matches exactly. |
VAT number |
The VAT number matches exactly. |
Address 1 typo; city; country |
The address is similar and both the city and country match exactly. |
Address similar; postal code |
Address matches with a Character Match Percentage of 65 or more, and the postal code matches exactly. |
Phone |
Any phone number matches exactly. |
Phone last N digits |
Any phone number matches using the last N digits (by default, the last 6 digits). |
Tax number typos |
The tax number matches with a Character Edit Distance of 1 or 2. |
VAT number typos |
The VAT number matches with a Character Edit Distance of 1 or 2. |
Postal code |
The postal code matches exactly. |
City; country |
The city and country match exactly. |
Website |
The website address matches exactly. |
Website stem |
The stem part of the website address matches exactly. |
City |
The full city name matches exactly. |
Address similar; first address one word |
Address matches with a Character Match Percentage of 65 or more, and at least one word matches in the first address line. |
Country |
The country name matches exactly. |
No address |
The address matches when it is missing in one or both of the records. |
Address conflict |
The addresses do not match at all. By default, this rule is only active for the first few primary identifier groups involving an exact name match. For example, if the addresses are different you must be confident that the names are the same and understand that it is a very loose match. |
Table 3-40 Address matching
Address matching rule | Example | Type |
---|---|---|
Address exact |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom = Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom |
Exact |
Subpremise, premise, postal code starts with, address similar |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4|United Kingdom |
Fuzzy |
Premise, no subpremise, postal code starts with, address similar |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4|United Kingdom |
Fuzzy |
Subpremise, premise, postal code starts with |
Flat 1|352 Milton Road||Cambridge||CB4 0WD| = 352 Milton Road|Flat 1||||CB4 0WD| |
Fuzzy |
Premise, no subpremise, postal code starts with |
296 Milton Road|Science Park||Cambridge||CB4 0WD| = 296 Milton Road|||||CB4 0WD| |
Fuzzy |
Address all words |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |Milton Road||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom |
Fuzzy |
Address all words typos |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |Millton Road||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom |
Fuzzy |
Address 1 typos, city, country exact or no data |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Science|Mil||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom |
Fuzzy |
Address similar, postal code |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Science|Milton||Cam|Cambridgeshire|CB4 1AB|United Kingdom |
Fuzzy |
Postal code |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |Arbury Road||Cambridge|Cambridgeshire|CB4 1AB|United Kingdom |
Fuzzy |
City and country |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |Arbury Road||Cambridge|Cambridgeshire||United Kingdom |
Fuzzy |
City |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |Arbury Road||Cambridge|Cambridgeshire|| |
Fuzzy |
Address similar, first address 1 word |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Datanomic Science Park|Milton Road|Cambridge|Cambridgeshire||United Kingdom| |
Fuzzy |
Country |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Datanomic Science Park|Arbury Road|Cambridge|Cambridgeshire|||United Kingdom |
Fuzzy |
Address no data |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = |||||| |
No data |
Address conflict |
Flat 1, 296 The Science Park|Milton Road||Cambridge|Cambridgeshire|CB4 1AB| = Datanomic|||Arbury||| |
Conflict |
Table 3-41 Website address
Website address matching rule | Example | Type |
---|---|---|
Website exact |
www.tcnltd.com = www.tcnltd.com |
Exact |
Website stem exact |
www.tcnltd.co.uk = www.tcnltd.com |
Fuzzy |
Website no data |
www.tcnltd.com = |
No data |
Website conflict |
www.abc.com = www.tcnltd.com |
Conflict |
Phone Number
Phone number matches allow for matches including:
-
Exact match
-
Last N characters matching
Table 3-42 Phone Number
Phone matching rule | Example | Type |
---|---|---|
Phone exact |
01223456678 = 01223456678 |
Exact |
Phone last N |
+44223456678 = 01223456678 |
Fuzzy |
Phone no data |
01223456678 = |
No data |
Phone conflict |
01223456678=01684345678 |
Conflict |
Table 3-43 VAT number
VAT number rule | Example | Type |
---|---|---|
VAT number exact |
ABC112345 = ABC112345 |
Exact |
VAT number typos |
ABC12345 = ABC112345 |
Fuzzy |
VAT number no data |
ABC12345 = |
No data |
VAT number conflict |
ABD2535 = BCD2145 |
Conflict |
Table 3-44 Tax number
Tax number rule | Example | Type |
---|---|---|
Tax number exact |
ABC112345 = ABC112345 |
Exact |
Tax number typos |
ABC12345 = ABC112345 |
Fuzzy |
Tax number no data |
ABC12345 = |
No data |
Tax number conflict |
ABD2535 = BCD2145 |
Conflict |
The entity matching service outputs fields that give information on the matching of each of the logical identifiers described above, as well as an overall score and an overall rule name. This gives the consuming application more granular information about how the records matched, which it can use as required. Here is an example:
Table 3-45 Comparing Records
Record 1 | — | Record 2 | — |
---|---|---|---|
Name |
Widgets and Gadgets Ltd |
Name |
Gadgets and Widgets Ltd |
Subname |
Cambridge |
Subname |
Cambridge |
Phone |
012234567890 |
Phone |
+4412234567890 |
Website |
www.widgetsandgadgets.com |
Website |
www.widgetsandgadgets.org |
Tax Number |
ABC 1234 12 |
Tax Number |
ABC 1234 12 |
Address1 |
29 Mill Road |
Address1 |
Flat 3 |
Address2 |
Flat 3 |
Address2 |
29 Mill Road |
City |
City |
Cambridge |
|
Postal Code |
Postal Code |
CB1 3GH |
Table 3-46 Results of Comparison
Value | Result |
---|---|
ruleattributes |
NAME,ADDRESS,PHONE,WEBSITE,TAXNUMBER |
matchscore |
97 |
rulename |
N090 Full name without suffixes all words out of order, A040 Subpremise, premise, postal code starts with, W020 Website stem exact, P020 Phone last N, T010 Tax number exact, V030 |
comparisonresults |
Name Fuzzy, Address Fuzzy, Website Fuzzy, Phone Fuzzy, Tax Number Exact |
nameresult |
N090 Full name without suffixes all words out of order |
namescore |
20 |
namecategory |
Fuzzy |
addressresult |
A040 Subpremise, premise, postal code starts with |
addressscore |
50 |
addresscategory |
Fuzzy |
phonenumberresult |
P020 Phone last N |
phonenumberscore |
70 |
phonenumbercategory |
Fuzzy |
websiteresult |
W020 Website stem exact |
websitescore |
70 |
websitecategory |
Fuzzy |
taxnumberresult |
T010 Tax number exact |
taxnumberscore |
100 |
taxnumbercategory |
Exact |
*Results that are no data are omitted for brevity
The comparisonresults output field gives a comma-separated list of the logical identifiers that contributed to the match, each followed by the category of its match (Exact or Fuzzy).
The ruleattributes field returns a comma-separated list of the logical identifiers that contributed to the match.
Note:
If a field is known never to be populated in the data, then it is possible to "turn off" the compound comparison relating to the logical identifier, so that it does not appear in the rule.
It is also possible to perform matching or elimination of Entity records using custom unique key generation; see Using ID Matching.
Using ID Matching
The ID Matching rules in EDQ-CDS allow matching (or elimination) based solely on custom unique identifiers, without requiring a name match of any kind and irrespective of whether other fields match. These rules are performed before, and are completely separate from, the rule that matches on the logical identifiers described in the previous sections.
Matching and elimination are provided for Entity and Individual Matching, but not Address Matching.
Note:
- Unique ID (UID) matching is always performed before EID or IEID matching. Therefore, if two records are matched by unique identifiers, they cannot then be eliminated.
- These identifiers are always compared in standardized form; that is, values that differ only in case or in additional non-alphanumeric characters are considered identical. For example, the following values are identical for the purposes of ID matching:
  - AB123456789
  - ab123-456-789
  - ab12345 6789
  - ab#123456789
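The effect of this standardization can be sketched in a few lines of Python. This is only an approximation based on the description above, not the EDQ-CDS implementation: the function name standardize_id is hypothetical, and converting to upper case (rather than lower case) is an assumption.

```python
def standardize_id(value: str) -> str:
    """Approximate ID standardization: keep letters and digits, ignore case.

    Assumption: upper-casing is used to make the comparison case-insensitive.
    """
    return "".join(ch for ch in value if ch.isalnum()).upper()


samples = ["AB123456789", "ab123-456-789", "ab12345 6789", "ab#123456789"]
standardized = {standardize_id(s) for s in samples}
```

All four sample values from the note reduce to the single standardized value AB123456789.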
Using Unique ID Matching
The UID Match rules are held in the [I005] UID and [E005] UID match groups of the Individual and Entity Match processes respectively. For example, the rules in the match group for Individual matching are as follows:
- [I005A] Match UID1
- [I005B] Match UID2
- [I005C] Match UID3
To use these rules, map the required data in the records to one or more of the uid attributes. The matching rules will always match two records sharing a common unique identifier, even if none of the other attributes match.
Note:
- The uid attributes accept multiple values in the form of a pipe delimited list. A match will be returned between two records if any one of a multiple set of attribute values is matched.
- Matching between uid attributes is not possible; for example, uid1 values cannot be matched with uid2 or uid3 values.
Example
The Passport Number field in a series of records is configured as the uid1 attribute. Therefore, the following records are returned as a match:
Record ID | First Name | Last Name | uid1 (Passport Number) | Match? |
---|---|---|---|---|
1 | Fred | Smith | 12345678 | Yes |
2 | John | Doe | 12345678 | Yes |
The following records with multiple values in the uid1 field are also matched:
Record ID | First Name | Last Name | uid1 (Passport Number) | Match? |
---|---|---|---|---|
1 | Fred | Smith | 12312312 \| 67867867 | Yes |
2 | John | Doe | 67867867 \| 23423423 | Yes |
The SSN field for the same set of records is configured as the uid2 attribute. The uid1 and uid2 fields are not cross-matched, so the following records are not matched even though the uid1 value of Record 1 matches the uid2 value of Record 2:
Record ID | First Name | Last Name | uid1 (Passport Number) | uid2 (SSN) | Match? |
---|---|---|---|---|---|
1 | Fred | Smith | 12312312 | 67867867 | No |
2 | John | Doe | 67867867 | 12312312 | No |
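The behavior shown in these three examples can be approximated with the following Python sketch. It is an illustration under stated assumptions rather than the actual rule logic: the helper names split_ids and uid_match are hypothetical, records are modeled as plain dictionaries, and values are standardized by dropping non-alphanumeric characters and upper-casing.

```python
def split_ids(raw):
    """Split a pipe delimited attribute value into a set of standardized IDs."""
    parts = (raw or "").split("|")
    cleaned = ("".join(ch for ch in part if ch.isalnum()).upper() for part in parts)
    return {value for value in cleaned if value}


def uid_match(rec_a, rec_b, attrs=("uid1", "uid2", "uid3")):
    """True if the records share a value within the SAME uid attribute.

    Values are never compared across attributes (uid1 against uid2, and so on).
    """
    return any(split_ids(rec_a.get(a)) & split_ids(rec_b.get(a)) for a in attrs)


single = uid_match({"uid1": "12345678"}, {"uid1": "12345678"})
multi = uid_match({"uid1": "12312312 | 67867867"}, {"uid1": "67867867 | 23423423"})
crossed = uid_match(
    {"uid1": "12312312", "uid2": "67867867"},
    {"uid1": "67867867", "uid2": "12312312"},
)
```

Here single and multi are True, while crossed is False because the only shared values sit in different uid attributes.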
Using Elimination ID Matching
The Elimination ID (EID) Match rules are held in the [ELIM015] EID ELIMINATIONS
group of the Entity and Individual Match processes:
- [ELIM015A] ELIMINATE EID1
- [ELIM015B] ELIMINATE EID2
- [ELIM015C] ELIMINATE EID3
To use these rules, map the required data in the records to one or more of the eid attributes. The EID matching rules will always return a "No Match" result for two records that do not share a common value in an eid attribute, even if all other attributes match. The exception to this is if the two records are matched using a uid attribute, as UID matching is performed before EID matching.
Note:
- The eid attributes accept multiple values in the form of a pipe delimited list. A "No Match" result will be returned between two records if none of the values in an attribute are matched.
- Eliminating possible matches by comparing values between different eid attributes is not possible; for example, eid1 values cannot be compared with eid2 or eid3 values.
Example
The SSN field in a series of records is configured as the eid1 attribute. Therefore, the following records are eliminated as a possible match:
Record ID | First Name | Last Name | eid1 (SSN) | Eliminate? |
---|---|---|---|---|
1 | John | Doe | 12345678 | Yes |
2 | John | Doe | 87654321 | Yes |
The following records with multiple values in the eid1 field are also eliminated as a possible match, as none of the values match:
Record ID | First Name | Last Name | eid1 (SSN) | Eliminate? |
---|---|---|---|---|
1 | John | Doe | 12312312 \| 23423423 | Yes |
2 | John | Doe | 45645645 \| 67867867 | Yes |
The Passport field for the same set of records is configured as the eid2 attribute. The eid1 and eid2 fields are not compared, and therefore a "No Match" result is returned and the records are eliminated as a possible match:
Record ID | First Name | Last Name | eid1 (SSN) | eid2 (Passport Number) | Eliminate? |
---|---|---|---|---|---|
1 | John | Doe | 12312312 | 67867867 | Yes |
2 | John | Doe | 67867867 | 12312312 | Yes |
Finally, the eid1 fields of the following records share an identical value (12312312), and therefore the records are not eliminated as a possible match:
Record ID | First Name | Last Name | eid1 (SSN) | Eliminate? |
---|---|---|---|---|
1 | John | Doe | 12312312 \| 23423423 | No |
2 | John | Doe | 45645645 \| 12312312 | No |
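The four elimination examples above can be reproduced with a short Python sketch. This is an approximation, not the product's rule logic: the names split_ids and eid_eliminates are hypothetical, records are modeled as dictionaries, and the sketch assumes that an eid attribute left blank on either record does not cause an elimination (the text above does not state how blanks are treated).

```python
def split_ids(raw):
    """Split a pipe delimited attribute value into a set of standardized IDs."""
    parts = (raw or "").split("|")
    cleaned = ("".join(ch for ch in part if ch.isalnum()).upper() for part in parts)
    return {value for value in cleaned if value}


def eid_eliminates(rec_a, rec_b, attrs=("eid1", "eid2", "eid3")):
    """True if some eid attribute is populated on both records with no common value.

    Assumption: a blank eid attribute on either side never eliminates.
    Values are only compared within the same attribute.
    """
    for attr in attrs:
        a, b = split_ids(rec_a.get(attr)), split_ids(rec_b.get(attr))
        if a and b and not (a & b):
            return True
    return False


different = eid_eliminates({"eid1": "12345678"}, {"eid1": "87654321"})
none_shared = eid_eliminates(
    {"eid1": "12312312 | 23423423"}, {"eid1": "45645645| 67867867"}
)
crossed = eid_eliminates(
    {"eid1": "12312312", "eid2": "67867867"},
    {"eid1": "67867867", "eid2": "12312312"},
)
one_shared = eid_eliminates(
    {"eid1": "12312312 | 23423423"}, {"eid1": "45645645| 12312312"}
)
```

The first three comparisons return True (eliminated); the last returns False because both eid1 lists contain 12312312.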
Using Inverted Elimination ID Matching
The Inverted Elimination ID (IEID) Match rules are held in the INVERTED EID ELIMINATIONS group of the Entity and Individual Match processes.
Inverted ID matching provides similar functionality to Elimination IDs (EIDs), but produces a "No Match" result when the identifier values are the same. This allows you to eliminate matches where records share a common value.
To use these rules, map the required data in the records to one or more of the ieid attributes. The IEID matching rules will always return a "No Match" result for records where the inverted EID (IEID) values are the same.
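The following Python sketch illustrates how an IEID check relates to UID matching, which is always performed first. It is a simplified model under several assumptions: the function names are hypothetical, the attribute names uid1 and ieid1 passed as defaults are illustrative, each attribute is treated as holding a single value, and values are standardized by dropping non-alphanumeric characters and upper-casing.

```python
def standardize(value):
    """Keep letters and digits only and ignore case (assumed standardization)."""
    return "".join(ch for ch in (value or "") if ch.isalnum()).upper()


def same_value(rec_a, rec_b, attrs):
    """True if any listed attribute holds the same non-blank value on both records."""
    for attr in attrs:
        a, b = standardize(rec_a.get(attr)), standardize(rec_b.get(attr))
        if a and a == b:
            return True
    return False


def id_outcome(rec_a, rec_b, uid_attrs=("uid1",), ieid_attrs=("ieid1",)):
    """Return "Match", "No Match", or None when the ID rules do not decide."""
    if same_value(rec_a, rec_b, uid_attrs):
        return "Match"  # UID matching runs first, so no elimination can follow
    if same_value(rec_a, rec_b, ieid_attrs):
        return "No Match"  # inverted elimination: identical values eliminate
    return None


eliminated = id_outcome({"ieid1": "H-100"}, {"ieid1": "h100"})
uid_wins = id_outcome(
    {"uid1": "12345678", "ieid1": "H-100"}, {"uid1": "12345678", "ieid1": "H-100"}
)
undecided = id_outcome({"ieid1": "H-100"}, {"ieid1": "H-200"})
```

In this model, two records with the same ieid1 value are eliminated unless they have already been matched on a uid attribute; records with different ieid1 values are left to the remaining rules.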
Using Matching with Customer-Added Attributes
Support for matching on customer-added string and date attributes extends what can be configured in EDQ and reduces the need to customize the EDQ-CDS configuration for attributes not present on the standard interface.
The Individual Candidates and Entity Candidates interfaces each contain six custom string and three custom date attributes. The Matches interface contains custom result, category, and score attributes for each custom string and custom date.
Standardization
Custom strings can be specified as type identifier or text, which affects how they are standardized: identifier custom strings are stripped of non-alphanumeric characters and converted to upper case, while text custom strings are just normalized.
This behavior is specified in the run profile as follows:
phase.*.process.*.customstringNtype = text
and can be overridden in real-time on a per-message basis as follows:
<dn:request customstringNtype="identifier">
Custom dates are all standardized the same way, by conversion to the date data type.
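The difference between the two custom string types can be sketched as follows. The identifier branch follows the description above; the text branch is only a guess, because this section does not define what "normalized" involves. In this sketch it is assumed to mean Unicode NFKC normalization plus whitespace collapsing. The function name standardize_custom_string is hypothetical.

```python
import unicodedata


def standardize_custom_string(value: str, string_type: str) -> str:
    """Standardize a custom string according to its configured type."""
    if string_type == "identifier":
        # As described: strip non-alphanumeric characters and upper-case.
        return "".join(ch for ch in value if ch.isalnum()).upper()
    if string_type == "text":
        # ASSUMPTION: "normalized" is modeled as NFKC plus collapsed whitespace.
        return " ".join(unicodedata.normalize("NFKC", value).split())
    raise ValueError(f"unknown custom string type: {string_type!r}")


as_identifier = standardize_custom_string("ab-123 x", "identifier")
as_text = standardize_custom_string("  Gold   Member ", "text")
```

With these inputs, as_identifier is AB123X and as_text is Gold Member.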
Matching
Custom attributes can optionally be used during matching (by default no matching is performed on custom attributes) irrespective of whether or not they have been used for keying (see Keys for Custom Attributes).
There are two ways custom attributes can be matched:
- Exact only
- Exact and fuzzy
There are two compound comparisons for each custom attribute:
- customstringNexact / customdateNexact
- customstringNfuzzy / customdateNfuzzy
Therefore the enablement and type of matching performed for each custom attribute, and the corresponding weighting, is specified in the run profile by using the relevant 'exact' or 'fuzzy' parameters for each of these compound comparisons, for example:
phase.*.process.Match\ -\ Individual.overallscore.customstring1exact.enabled = Y
phase.*.process.Match\ -\ Individual.overallscore.customstring1exact.weighting = 1
phase.Individual\ Match.process.*.overallscore.customstring1fuzzy.enabled = N
phase.Individual\ Match.process.*.overallscore.customstring1fuzzy.weighting = 1
That is, in order to match on any given custom attribute, either the corresponding 'exact' or 'fuzzy' compound comparison should be enabled, but not both.
These settings can also be overridden in real-time on a per-message basis as follows:
<dn:request overallscore.customstring1exact.enabled="Y" overallscore.customstring1exact.weighting="1" overallscore.customstring1fuzzy.enabled="N" overallscore.customstring1fuzzy.weighting="1" >
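The interaction between run profile values and per-message overrides, together with the "exact or fuzzy, but not both" rule, can be modeled with the short Python sketch below. It is illustrative only: settings are represented as plain dictionaries keyed by the attribute names used in the request example, the function names are hypothetical, and a missing enabled value is treated as N because no matching is performed on custom attributes by default.

```python
def effective_settings(run_profile: dict, request_overrides: dict) -> dict:
    """Per-message overrides take precedence over run profile values."""
    return {**run_profile, **request_overrides}


def enabled_comparison(settings: dict, attribute: str):
    """Return "exact", "fuzzy", or None for a custom attribute such as "customstring1".

    Raises ValueError if both compound comparisons are enabled at once.
    """
    exact = settings.get(f"overallscore.{attribute}exact.enabled", "N") == "Y"
    fuzzy = settings.get(f"overallscore.{attribute}fuzzy.enabled", "N") == "Y"
    if exact and fuzzy:
        raise ValueError(f"enable either exact or fuzzy for {attribute}, not both")
    return "exact" if exact else "fuzzy" if fuzzy else None


profile = {
    "overallscore.customstring1exact.enabled": "Y",
    "overallscore.customstring1fuzzy.enabled": "N",
}
override = {
    "overallscore.customstring1exact.enabled": "N",
    "overallscore.customstring1fuzzy.enabled": "Y",
}
from_profile = enabled_comparison(profile, "customstring1")
from_request = enabled_comparison(effective_settings(profile, override), "customstring1")
```

Here the run profile alone enables the exact comparison, while the request-level override switches the same attribute to the fuzzy comparison.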
Using Address Matching
The rules for matching addresses include the use of pre-matching transformations and various matching comparisons in order to handle variance between different representations of what may be the same address, for example:
- Addresses containing abbreviated terms or suffixes.
- Character order and spelling differences/errors in addresses.
- Addresses with different levels of completeness.
- Addresses where extracted premise and sub-premise match, and other components of the address are in a different order or missing on one side.
The following table lists all of the rules provided:
Address Match Rule Code | Address Match Rule Description |
---|---|
[A010] | Address exact, postal code exact |
[A020] | Address exact, no postal code |
[A030] | Address lines 1 and 2 exact, city exact, postal code exact |
[A040] | Address lines 1 and 2 exact, city exact, postal code starts with |
[A050] | Address all words, subpremise exact, premise exact, postal code exact |
[A060] | Address all words, subpremise exact, premise exact, postal code no conflict |
[A070] | Address 1 exact, address 2 no conflict, subpremise exact, premise exact, postal code exact |
[A080] | Address 1 exact, address 2 no conflict, subpremise exact, premise exact, postal code starts with |
[A090] | Address 1 exact, address 2 no conflict, subpremise exact, premise exact, postal code no conflict |
[A100] | Address all words typos, subpremise exact, premise exact, postal code exact |
[A110] | Address all words typos, subpremise exact, premise exact, postal code no conflict |
[A120] | Address 1 exact, address 2 no conflict, postal code exact |
[A130] | Address 1 exact, address 2 no conflict, postal code starts with |
[A140] | Address 1 exact, subpremise exact, premise exact, postal code exact |
[A150] | Address 1 exact, subpremise exact, premise exact, postal code starts with |
[A160] | Address 1 exact, subpremise no conflict, premise no conflict, postal code exact |
[A170] | Address 1 exact, subpremise no conflict, premise no conflict, postal code starts with |
[A180] | Address all words, subpremise no conflict, premise no conflict, postal code exact |
[A190] | Address all words, subpremise no conflict, premise no conflict, postal code no conflict |
[A200] | Address 1 all words, subpremise exact, premise exact, postal code exact |
[A210] | Address 1 all words, subpremise exact, premise exact, postal code starts with |
[A220] | Address 1 all words, subpremise no conflict, premise no conflict, postal code exact |
[A230] | Address 1 all words, subpremise no conflict, premise no conflict, postal code starts with |
[A240] | Address 1 common string 7+, subpremise exact, premise exact, postal code exact |
[A250] | Address all words, postal code exact |
[A260] | Address similar, subpremise exact, premise exact, postal code exact |
[A270] | Address 1 all words, address 2 no conflict, postal code exact |
[A280] | Address 1 all words, address 2 no conflict, postal code starts with |
[A290] | Address all words typos, postal code exact |
[A300] | Address 1 exact, subpremise exact, premise exact, postal code no conflict |
[A310] | Address 1 all words, subpremise exact, premise exact, postal code no conflict |
[A320] | Address 1 exact, postal code exact |
[A330] | Address 1 exact, postal code starts with |
[A340] | Subpremise exact, premise exact, postal code exact |
[A350] | Subpremise exact, premise exact, postal code starts with |
[A360] | Address all words |
[A370] | Address all words typos |
[A380] | Address similar; postal code |
[A390] | Address similar; first address one word |
The following table provides examples of matches by Match Rule Code only, with the key fields highlighted in bold text where required:
Address Match Rule Code | Address Component | Record | Matched Record |
---|---|---|---|
[A010] |
address1 |
901 GOLF CLUB RD |
901 GOLF CLUB RD |
[Null] |
city |
WESTWOOD |
WESTWOOD |
[Null] |
subadminarea |
PLUMAS |
PLUMAS |
[Null] |
adminarea |
CA |
CA |
[Null] |
postalcode |
96137 |
96137 |
[Null] |
country |
US |
US |
[A020] |
As for [A010], but the postalcode field in both records is blank. |
As for [A010], but the postalcode field in both records is blank. |
As for [A010], but the postalcode field in both records is blank. |
[A030] |
address1 |
1201 BEECH ST |
1201 BEECH ST |
[Null] |
address2 |
APT 104F |
APT 104F |
[Null] |
city |
PALO ALTO |
PALO ALTO |
[Null] |
subadminarea |
SANTA CLARA |
SAN MATEO |
[Null] |
adminarea |
CA |
CA |
[Null] |
postalcode |
94303 |
94303 |
[Null] |
country |
US |
US |
[A040] |
As [A030], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
As [A030], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
As [A030], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A050] |
address1 |
5 Hogskoleringen |
Hogskoleringen 5 |
[Null] |
city |
Trondheim |
Trondheim |
[Null] |
adminarea |
[Null] |
SØR-TRØNDELAG |
[Null] |
postalcode |
7491 |
7491 |
[Null] |
country |
Norway |
Norway |
[A060] |
As [A050], except one or both of the postalcode fields are blank. |
As [A050], except one or both of the postalcode fields are blank. |
As [A050], except one or both of the postalcode fields are blank. |
[A070] |
address1 |
Heinrichboeckingstr 10-14 |
Heinrichboeckingstr 10-14 |
[Null] |
address2 |
Service Zentrum Merzig |
|
[Null] |
city |
Saarbrücken |
Saarbrücken |
[Null] |
adminarea |
[Null] |
SAARLAND |
[Null] |
postalcode |
66121 |
66121 |
[Null] |
country |
Germany |
Germany |
[A080] |
Same as [A070], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A070], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A070], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A090] |
Same as [A070], except one or both of the postalcode fields are blank. |
Same as [A070], except one or both of the postalcode fields are blank. |
Same as [A070], except one or both of the postalcode fields are blank. |
[A100] |
address1 |
HOGSKOLERINGE 5 |
HOGSKOLERINGEN 5 |
[Null] |
city |
Trondheim |
Trondheim |
[Null] |
postalcode |
9491 |
9491 |
[Null] |
country |
Norway |
Norway |
[A110] |
Same as [A100], except one or both of the postalcode fields are blank. |
Same as [A100], except one or both of the postalcode fields are blank. |
Same as [A100], except one or both of the postalcode fields are blank. |
[A120] |
address1 |
Marshfield Bank |
Marshfield Bank |
[Null] |
address2 |
WOOLSTANWOOD |
[Null] |
[Null] |
city |
Crewe |
Crewe |
[Null] |
postalcode |
CW28UY |
CW28UY |
[Null] |
country |
UK |
UK |
[A130] |
Same as [A120], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A120], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A120], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A140] |
address1 |
Apt Y302 |
APT Y302 |
[Null] |
address2 |
1605 Sherringtowne Ave |
1605 Sherington Ave |
[Null] |
city |
NEWPORT BEACH |
NEWPORT BEACH |
[Null] |
adminarea |
Orange |
Orange |
[Null] |
postalcode |
92663-9087 |
92663-9087 |
[Null] |
country |
US |
US |
[A150] |
Same as [A140], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A140], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A140], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A160] |
address1 |
1728 Corporate Xing |
1728 Corporate Xing |
[Null] |
address2 |
Suite1 |
[Null] |
[Null] |
city |
O Fallon |
O Fallon |
[Null] |
adminarea |
ILLINOIS |
IL |
[Null] |
postalcode |
62269-3734 |
62269-3734 |
[Null] |
country |
US |
US |
[A170] |
Same as [A160], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A160], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A160], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A180] |
address1 |
Block 16 |
16 Dunsinane Ave |
[Null] |
address2 |
Dunsinane Avenue |
[Null] |
[Null] |
address3 |
Dunsinane Industrial Estate |
[Null] |
[Null] |
city |
Dunsinane |
Dunsinane |
[Null] |
postalcode |
DD23QT |
DD23QT |
[Null] |
country |
UK |
UK |
[A190] |
As [A180], except one or both of the postalcode fields are blank. |
As [A180], except one or both of the postalcode fields are blank. |
As [A180], except one or both of the postalcode fields are blank. |
[A200] |
address1 |
26701 QUAIL CRK |
26701 QUAIL CRK APT 107 |
[Null] |
address2 |
APT 107 |
[Null] |
[Null] |
city |
ALISO VIEJO |
LAGUNA HILLS |
[Null] |
postalcode |
92656-1089 |
92656-1089 |
[Null] |
country |
US |
US |
[A210] |
Same as [A200], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A200], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A200], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A220] |
address1 |
Folkes Road |
Unit 12 Folkes Road |
[Null] |
address2 |
Hayes Trading Estate |
Lye |
[Null] |
address3 |
Lye |
[Null] |
[Null] |
city |
Stourbridge |
Stourbridge |
[Null] |
postalcode |
DY98RN |
DY98RN |
[Null] |
country |
UK |
UK |
[A230] |
Same as [A220], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A220], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A220], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A240] |
address1 |
101/61 NAWANAKORN INDUSTRY |
101/61 NAVANAKORN INDUSTRY |
[Null] |
address2 |
SELFLEMENT PHAHONYOTHIN |
PAHOLYOTHIN KLONGNUENG |
[Null] |
city |
KLONGLAUNG |
KHLONG LUANG |
[Null] |
postalcode |
12120 |
12120 |
[Null] |
country |
Thailand |
Thailand |
[A250] |
address1 |
Blyth House |
Blyth House |
[Null] |
address2 |
130 Hordern Road |
Hordern Road |
[Null] |
city |
Wolverhampton |
Wolverhampton |
[Null] |
postalcode |
WV60HS |
WV60HS |
[Null] |
country |
UK |
UK |
[A260] |
address1 |
21001 State Route 739 |
21001 Sr Rt 739 |
[Null] |
address2 |
7 |
[Null] |
[Null] |
city |
Raymond |
Raymond |
[Null] |
postalcode |
43067 |
43067 |
[Null] |
country |
United States |
United States |
[A270] |
address1 |
Lancaster House Aviation Way |
Aviation Way |
[Null] |
address2 |
[Null] |
Southend Airport |
[Null] |
city |
SOUTHEND ON SEA |
SOUTHEND ON SEA |
[Null] |
postalcode |
SS26UN |
SS26UN |
[Null] |
country |
UK |
UK |
[A280] |
Same as [A270], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A270], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
Same as [A270], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
[A290] |
address1 |
Blythe House |
Blyth House |
[Null] |
address2 |
130 Hordern Road |
Hordern Road |
[Null] |
city |
Wolverhampton |
Wolverhampton |
[Null] |
postalcode |
WV60HS |
WV60HS |
[Null] |
country |
UK |
UK |
[A300] |
Same as [A140], except one or both of the postalcode fields are blank. |
Same as [A140], except one or both of the postalcode fields are blank. |
Same as [A140], except one or both of the postalcode fields are blank. |
[A310] |
Same as [A200], except one or both of the postalcode fields are blank. |
Same as [A200], except one or both of the postalcode fields are blank. |
Same as [A200], except one or both of the postalcode fields are blank. |
[A320] |
address1 |
Network House |
Network House |
[Null] |
address2 |
1 Ariel Way |
Wood Lane |
[Null] |
city |
London |
London |
[Null] |
postalcode |
W127SL |
W127SL |
[Null] |
country |
UK |
UK |
[A330] |
Same as [A320], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
||
[A340] |
address1 |
College Business Park |
College Business Park |
[Null] |
address2 |
Park |
Coldhams Lane |
[Null] |
city |
Cambridge |
[Null] |
[Null] |
postalcode |
CB13HD |
CB13HD |
[Null] |
country |
United Kingdom |
United Kingdom |
[A350] |
Same as [A340], except the postalcode field in one address starts with the same characters as the postalcode field in the other, but is not identical. |
||
[A360] |
address1 |
938 Miller St |
Medical Ctr Blvd |
[Null] |
address2 |
Medical Center Boulevard |
[Null] |
[Null] |
city |
Winston Salem |
Winston- Salem |
[Null] |
postalcode |
27157 |
27157 |
[Null] |
country |
United States |
United States |
[A370] |
address1 |
Humberstone Avenue |
24 Humberston Avenue |
[Null] |
address2 |
Humberstone |
Humberston |
[Null] |
city |
GRIMSBY |
GRIMSBY |
[Null] |
postalcode |
DN364SX |
DN364SP |
[Null] |
country |
UK |
UK |
[A380] |
address1 |
5 Sidings Court |
Greyfriars House |
[Null] |
address2 |
White Rose Way |
Sidings Court |
[Null] |
city |
DONCASTER |
DONCASTER |
[Null] |
postalcode |
DN45NU |
DN45NU |
[Null] |
country |
UK |
UK |
[A390] |
address1 |
120 Howard St |
120 Howard St |
[Null] |
address2 |
[Null] |
STE 200 |
[Null] |
city |
San Fransisco |
San Fransisco |
[Null] |
adminarea |
CA |
CA |
[Null] |
postalcode |
94105-1622 |
94105-1615 |
[Null] |
country |
United States |
United States |
Note:
Unlike Individual and Entity matching, Address Matching does not make use of the compound comparison match functionality, because address data does not lend itself to being split into separate logical identifiers in the same way.
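The three postal code conditions that recur in the rule descriptions (exact, starts with, and no conflict) can be sketched in Python as follows. This is one reading of the examples table, not the product's comparison logic. The function names are hypothetical, and three assumptions are made: postal codes are compared after removing non-alphanumeric characters and upper-casing; "starts with" means that one code is a proper prefix of the other; and "no conflict" means that at least one code is blank or that both are equal.

```python
def norm(code):
    """Assumed normalization: keep letters and digits only, ignore case."""
    return "".join(ch for ch in (code or "") if ch.isalnum()).upper()


def postal_exact(a, b):
    """Both codes are populated and identical after normalization."""
    a, b = norm(a), norm(b)
    return bool(a) and a == b


def postal_starts_with(a, b):
    """One populated code begins with the other, but they are not identical."""
    a, b = norm(a), norm(b)
    return bool(a) and bool(b) and a != b and (a.startswith(b) or b.startswith(a))


def postal_no_conflict(a, b):
    """At least one code is blank, or the two codes agree."""
    a, b = norm(a), norm(b)
    return not a or not b or a == b


exact = postal_exact("96137", "96137")
prefix = postal_starts_with("92663", "92663-9087")
blank_ok = postal_no_conflict("7491", "")
conflict = postal_no_conflict("DN364SX", "DN364SP")
```

Under these assumptions, exact, prefix, and blank_ok are True, while conflict is False because both codes are populated and differ.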