2 Customizing Customer Data Services Pack
EDQ-CDS has been designed to perform well with minimal customization. Out of the box, the application can perform key generation and matching of individual, entity, and address data in supported connected applications with few or no configuration changes.
Using Stand-Alone Batch Matching
EDQ-CDS is designed to process customer data from any external system or stand-alone source. By default, pre-configured batch jobs are provided that work with a set of staging tables. Reconfiguring the product to process data from other sources, such as a text file, is straightforward.
To reuse the batch data matching services provided, you must create new input and output mappings for the data interfaces. The following sections use examples that demonstrate how to do this and how to run matching using a modified copy of an existing job configuration.
Using Stand-Alone Individual Batch Matching
You can create a new stand-alone individual batch matching job using the following example steps:
-
Ensure that no jobs are currently running.
-
In the EDQ-CDS project, create a new server-side data store named File In: Individuals that points to the structured text file containing the customer data to be processed. The data store must be server-side so that it can be used within a job definition.
-
Create a new snapshot named Individuals using the File In: Individuals data store as a source.
-
Create the Input mappings as follows:
-
Right-click the Individual Candidates interface and select Mappings... to open the Mappings dialog.
-
Click Add to open the New Mappings dialog.
-
Select the Individuals snapshot as the source and click Next. The default type, Staged data, is used.
-
Map the Customer Data Attributes on the left of the dialog to the Attributes on the right as follows:
Note:
In some instances, it may be necessary to construct a process that reads from the snapshot and reshapes the data to match the interface. For details, see Converting Data to the Interface Format.
-
Click Next.
-
Name the mapping Individual Candidates and click Finish to save.
-
Click OK.
-
Create a new staged data item named Individual Matches with columns corresponding to the attributes of the Matches interface.
-
Create the Output mappings as follows:
-
Right-click the Matches interface and select Mappings... to open the Mappings dialog.
-
Click Add to open the New Mappings dialog.
-
Select the Individual Matches staged data as the target and click Next.
-
Map the Matches attributes on the left to the Individual Matches attributes on the right as required.
-
Click Next.
-
Name the mapping Individual Matches and give it a description, then click Finish.
-
Click OK to close the dialog.
-
Create a new server-side delimited text data store called File Out: Individual Matches to use as a target for the match results. Alternatively, the data can be written to a database if required.
-
Create a new export called Matches to File Out: Individual Matches that uses the Matches interface as the source to export from, and the File Out: Individual Matches data store as the target.
-
Create and configure a job to run matching as follows:
-
Create a copy of the Batch Individual Match job, rename it Batch Individual Match using Text File, and then open it.
-
Open the Individual Match job phase, and change the source of the input data by double-clicking the Individual Candidates interface and selecting the Individual Candidates mapping.
-
Click OK to apply the changes. The job configuration is modified accordingly and the old snapshot and staged data items are disconnected.
-
Delete the Individual Candidates snapshot task.
-
Drag the Individuals snapshot from the Snapshot section of the Tool Palette into the open job phase, and make sure it is connected to the Individual Candidates mapping.
-
Drag the Matches to File Out: Individual Matches export task from the Export section of the Tool Palette into the open job phase, and connect it to Match Results - Output.
-
Delete the Batch Matches export task.
-
Close the job and save the configuration changes.
Converting Data to the Interface Format
It may not always be possible to directly map the input source to the candidates interface if:
-
fields are of the wrong data type (for example, "Date of Birth" in a date field); or
-
fields need to be transformed into a compatible format or structure (for example, individual names held in a single full name field).
In such cases, run the input data through a custom EDQ process that converts it as appropriate, as in the following example steps:
-
Ensure that no jobs are currently running.
-
Create a data store and snapshot for the input data as in Step 2 and Step 3 of Using Stand-Alone Batch Matching.
-
In the EDQ-CDS project, right-click the Processes node in the Project Browser and select New Process... to open the New Process wizard.
-
Select the snapshot created in Step 2 as the data source.
-
Click Next.
-
On the last page of the wizard, rename the process Transform Individuals, then click Finish to create the process.
-
On the Process canvas, add the processors needed to transform the data to the interface format. For example, use a Convert Date to String processor to convert a date of birth held in date format to a format required by the Candidates interface (for example, yyyyMMdd, MM/dd/yyyy, yyyy-MM-dd, or dd-MMM-yy).
-
Add a Writer processor to the process canvas and connect it to the process data stream:
-
In the Writer Configuration dialog, select the Individual Candidates and map the attributes accordingly.
-
Create and configure a new job as follows:
-
Make a copy of the Batch Individual Match job, and rename it Batch Transformed Individual Match.
-
Open the new job.
-
Double-click on the Individual Match job phase.
-
Follow Step 9d to Step 10 of Using Stand-Alone Batch Matching, adding the new Transform Individuals process between the Individuals snapshot and the Input - Prepare - Individual process. The resulting job should look like the following:
-
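Before configuring the Convert Date to String processor, it can help to see what each of the example patterns produces for a sample date. The following is a hedged Python sketch only: the mapping from the Java-style patterns named in the steps above to Python strftime codes is our own assumption, and date_of_birth_to_string is a hypothetical helper, not an EDQ processor or API.

```python
from datetime import date

# Java-style patterns given as examples for the Candidates interface, mapped to
# Python strftime equivalents (an illustrative assumption, not an EDQ setting).
# %b yields an English month abbreviation under the default "C" locale.
CANDIDATE_DATE_FORMATS = {
    "yyyyMMdd": "%Y%m%d",
    "MM/dd/yyyy": "%m/%d/%Y",
    "yyyy-MM-dd": "%Y-%m-%d",
    "dd-MMM-yy": "%d-%b-%y",
}


def date_of_birth_to_string(dob: date, pattern: str = "yyyy-MM-dd") -> str:
    """Render a date of birth as text in one of the example patterns."""
    try:
        fmt = CANDIDATE_DATE_FORMATS[pattern]
    except KeyError:
        raise ValueError(f"unsupported date pattern: {pattern}") from None
    return dob.strftime(fmt)


if __name__ == "__main__":
    sample = date(1970, 3, 9)
    for pattern in CANDIDATE_DATE_FORMATS:
        print(pattern, "->", date_of_birth_to_string(sample, pattern))
```

Whichever pattern you choose, use the same one consistently across the whole input so that dates compare correctly during matching.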
Using Cleaning Services
The cleaning processes supplied with EDQ-CDS are templates only, except for the Address Cleaning process, which is fully functional and uses EDQ-AV for address verification and standardization. The Individual and Entity cleaning processes are intended to be customized to meet the data standardization requirements of each implementation.
Customizing the Cleaning Services
The examples in the following sections demonstrate modifying the cleaning services provided with EDQ-CDS.
Adjusting Matching
This section explains how you can change the EDQ matching settings.
Changing the Key Method To Use During Matching
Keys are used as the first stage of matching to pre-select similar records. This happens inside EDQ for batch matching, or in the calling application during candidate selection for real-time matching.
By default, the key methods used during matching depend on the value of the keyprofile setting. The key profile specifies which key methods are enabled, allowing EDQ-CDS to offer a wider menu of key method algorithms.
The methods for controlling which Match key methods are used differ for Batch and Real-Time processing. The following sections contain examples that show you how to modify the key methods used.
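The idea of key-based pre-selection (often called blocking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how EDQ-CDS generates keys: the two key methods, their names, and the dictionary standing in for a key profile are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Optional, Set, Tuple

Record = Dict[str, str]


# Hypothetical key methods: each derives a key from a record, or None if it cannot.
def surname_prefix_key(rec: Record) -> Optional[str]:
    surname = rec.get("surname", "").strip().upper()
    return "SP:" + surname[:4] if surname else None


def postcode_key(rec: Record) -> Optional[str]:
    postcode = rec.get("postcode", "").replace(" ", "").upper()
    return "PC:" + postcode if postcode else None


KEY_METHODS: Dict[str, Callable[[Record], Optional[str]]] = {
    "surname_prefix": surname_prefix_key,
    "postcode": postcode_key,
}


def generate_keys(rec: Record, profile: Dict[str, bool]) -> Set[str]:
    """Apply only the key methods that the (hypothetical) profile enables."""
    keys = set()
    for method, enabled in profile.items():
        if enabled:
            key = KEY_METHODS[method](rec)
            if key is not None:
                keys.add(key)
    return keys


def candidate_pairs(records: Dict[str, Record],
                    profile: Dict[str, bool]) -> Set[Tuple[str, str]]:
    """Pre-select record pairs that share at least one key."""
    buckets: Dict[str, Set[str]] = defaultdict(set)
    for record_id, rec in records.items():
        for key in generate_keys(rec, profile):
            buckets[key].add(record_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

With only surname_prefix enabled, "Smith" and "Smithson" share the key SP:SMIT and become a candidate pair; enabling postcode as well also pairs records that share a postcode. Enabling more key methods finds more potential matches at the cost of more comparisons, which is the trade-off a key profile controls.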
Changing Match Rule Enablement
Match rule enablement is externalized in this release. You can override the default enablement of the "name...address conflict" rules by adding their properties to your edq-cds.properties file and then editing the values, as in the following example:
# Disable all entity "name...address conflict" type rules.
phase.*.process.Match\ -\ Entity.[E010V]\ Script\ full\ name\ exact\;\ address\ conflict.entity_match_rules_enabled = false
phase.*.process.Match\ -\ Entity.[E020V]\ Full\ name\ exact\;\ address\ conflict.entity_match_rules_enabled = false
phase.*.process.Match\ -\ Entity.[E030V]\ Standardized\ full\ name\ exact\;\ address\ conflict.entity_match_rules_enabled = false
phase.*.process.Match\ -\ Entity.[E040V]\ Script\ full\ name\ without\ suffixes\ exact\;\ address\ conflict.entity_match_rules_enabled = false
phase.*.process.Match\ -\ Entity.[E050V]\ Full\ name\ without\ suffixes\ exact\;\ address\ conflict.entity_match_rules_enabled = false
Capitalization must be respected, and special characters such as spaces and semicolons must be escaped with a backslash, as shown above. The asterisk (*) character is a wildcard; in the phase position it matches every job phase, so these settings apply to the Match - Entity process in all phases.
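Because these keys are long and easy to mistype, a small generator can help. The hypothetical Python helper below (not part of EDQ-CDS) rebuilds the lines above from plain rule names. It assumes the phase.<phase>.process.<process>.<rule>.<option> layout seen in the example and, like the example, escapes only spaces and semicolons.

```python
def escape_key_part(text: str) -> str:
    """Escape spaces and semicolons with a backslash, mirroring the example.

    Assumption: these are the only characters in the names that need escaping;
    check any other special characters against your own properties file.
    """
    return text.replace(" ", "\\ ").replace(";", "\\;")


def rule_enablement_line(process: str, rule: str, enabled: bool,
                         option: str = "entity_match_rules_enabled",
                         phase: str = "*") -> str:
    """Build one properties line that enables or disables a match rule."""
    key = f"phase.{phase}.process.{escape_key_part(process)}.{escape_key_part(rule)}.{option}"
    return f"{key} = {str(enabled).lower()}"


ENTITY_ADDRESS_CONFLICT_RULES = [
    "[E010V] Script full name exact; address conflict",
    "[E020V] Full name exact; address conflict",
    "[E030V] Standardized full name exact; address conflict",
    "[E040V] Script full name without suffixes exact; address conflict",
    "[E050V] Full name without suffixes exact; address conflict",
]

if __name__ == "__main__":
    print('# Disable all entity "name...address conflict" type rules.')
    for rule in ENTITY_ADDRESS_CONFLICT_RULES:
        print(rule_enablement_line("Match - Entity", rule, enabled=False))
```

Passing enabled=True produces the matching lines that re-enable a rule.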
Turning off Unused Match Functionality
The value of the matchthreshold setting controls the strength of matches returned from the Matching services by filtering out results that fall below the specified threshold. Match rules with a priority score below this value are effectively redundant.
Also, the match processes output a number of additional attributes which are not used in the default configuration and can be removed without loss of functionality. These attributes may be required for use in customizations of EDQ-CDS. For more information, see Turning off Unused Match Functionality.
Disabling Rules with Lower Scores
The matchthreshold setting has been configured to have a value of 70, so all Match rules with a lower priority score will be disabled.
The following example steps show you how to disable Match rules for any Match process (for example, Match - Individual, Match - Entity or Match - Address):
-
Ensure that no jobs are currently running.
-
In the EDQ-CDS project, open the Match process.
-
Double-click the Match processor to open the Match Configuration tab.
-
Double-click the Match sub-processor icon to open the Match Configuration dialog.
-
Select the Match Rules tab and select the last Match group.
-
Clear the check box beside each Match rule with a Match Priority score lower than 70 to disable it.
-
Repeat for each Match group until all rules with a score lower than 70 have been disabled.
-
Click OK to close the dialog.
-
Close the process and save the configuration changes.
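When a match process has many groups, it can help to list the rules to clear before working through the dialog. The sketch below assumes you have copied the group names, rule names, and Match Priority scores into a Python dictionary by hand; both that structure and the rules_to_disable helper are hypothetical and are not read from EDQ.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-copied rule list: match group -> [(rule name, priority score), ...]
RuleGroups = Dict[str, List[Tuple[str, int]]]


def rules_to_disable(groups: RuleGroups, match_threshold: int) -> Dict[str, List[str]]:
    """List, per match group, the rules whose priority score is below the threshold."""
    plan: Dict[str, List[str]] = {}
    for group, rules in groups.items():
        below = [name for name, score in rules if score < match_threshold]
        if below:
            plan[group] = below
    return plan
```

A rule scoring exactly 70 stays enabled, consistent with disabling only rules scored lower than the threshold.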
Reviewing Matches in EDQ
The EDQ-CDS Matching services return only those records that match with a score equal to or greater than the matchthreshold setting, and for each such record they return only the record ID, rule name, and score. Viewing the full record details is useful during rule tuning when analyzing matches, and the Match Review application is a helpful tool for this.
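The filtering described above can be modeled in a few lines, which helps when reasoning about why an expected match did not appear in the service output. This is a hypothetical Python model, not the service's implementation; the MatchResult class and the rule names used with it are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MatchResult:
    """The three fields the Matching services return for each match."""
    record_id: str
    rule_name: str
    score: int


def returned_matches(results: Iterable[MatchResult], match_threshold: int) -> List[MatchResult]:
    """Keep only matches scoring at or above matchthreshold."""
    return [r for r in results if r.score >= match_threshold]
```

A match scoring 65 against a threshold of 70 is silently dropped, which is why reviewing full match details in EDQ is useful when tuning rules.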
Enabling Match Review in Individual Batch Matching
You can enable match review for individual batch matching as in the following example steps.
-
Ensure that no jobs are currently running.
-
In the EDQ-CDS project, open the Match - Individual process.
-
Double-click on the Match Individuals processor to open the Match Configuration dialog.
-
Click Advanced Options.
-
From the Review System list, select Match Review, and then click OK. This makes the Assign Relationship Review option active.
-
Click Assign Relationship Review.
-
In the dialog displayed, select the appropriate user or user group in the Assigned To drop-down field.
-
Click OK to close the dialog.
-
Close the process and save the configuration changes.
-
Open the Batch Individual Match job.
-
Locate the Match phase, right-click on the Match Prepare task and select Configure. The Task Configuration dialog opens.
-
Select the Process tab, and check the Enable Sort/Filter in Match? option.
-
Click OK and close the job, saving changes when prompted.
-
Run the job from Director with the appropriate run profile and no run label to regenerate the data.
Note:
In order to generate Match Review data, you must run jobs without a run label.
Matches can be reviewed as follows:
Modifying Reference Data Used in Matching
This section explains how you can modify the Reference Data used in matching to improve results, and provides examples to help you.
Stripping Words/Phrases from Name Fields
You can customize the system to strip words and phrases from names that are noise or add little information, because such content can cause potential matches to be missed.
Removing Noise from Individual Names
Name fields in customer data systems are often overfilled with additional (non-name) information, either because there are no other suitable fields available or due to errors made by Data Entry users. Common examples include "Fred SMITH (DO NOT CALL)" and "John DOE (DECEASED)". This extraneous information can be removed during name standardization when a "distilled" name is created for use in matching.
Use the following example steps to remove noise from individual names:
Note:
The Real-Time services will use the modified Reference Data sets the next time the full Real-time START ALL job (which re-snapshots the prepared Reference Data from files) is run.
To remove words and phrases from individual names in non-Latin scripts, use the Strip List – Individual Script Strip List Reference Data. This Reference Data set is used as a replacement map and should have a blank value in the second column.
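The replacement-map behavior of a strip list can be illustrated with a small Python sketch. It is not how EDQ-CDS applies its Reference Data internally: the phrase list and the distil_name function are hypothetical, and case-insensitive matching is an assumption made for the example.

```python
import re
from typing import Dict

# Hypothetical strip list used as a replacement map: a blank replacement removes the phrase.
INDIVIDUAL_STRIP_LIST: Dict[str, str] = {
    "(DO NOT CALL)": "",
    "(DECEASED)": "",
}


def distil_name(name: str, strip_list: Dict[str, str]) -> str:
    """Remove noise phrases from a name and tidy the remaining whitespace."""
    result = name
    # Apply longer phrases first so that a short entry cannot split a longer one.
    for phrase in sorted(strip_list, key=len, reverse=True):
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        result = pattern.sub(lambda _m, r=strip_list[phrase]: r, result)
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(result.split())
```

For example, "Fred SMITH (DO NOT CALL)" becomes "Fred SMITH", so it can match a clean "Fred Smith" record on name.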
Removing Noise from Entity Names
Noise words and phrases or common business words (including suffixes) in Entity names that add little value in matching can be removed during name standardization when a "distilled" name is created. An example of such a noise word is "International", which is often found in organization name fields.
Because this term occurs so frequently, it is often omitted or abbreviated when the name is entered, which may cause potential matches to be missed. It may therefore be better to remove the term and all its known variants for the purposes of matching.
Use the following example steps to remove noise from entity names:
To remove words and phrases from entity names in non-Latin scripts, use the Strip List – Entity Script Suffixes Reference Data.
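For entity names, removal usually works at the word level, so that every known variant of a noise word is dropped. The hedged Python sketch below shows why this helps: once "International" and its variants are removed, differently entered forms of the same name distil to the same string. The token list and the distil_entity_name function are hypothetical and are not the EDQ-CDS Reference Data.

```python
import re
from typing import Iterable, Set

# Hypothetical strip list: the noise word "International" plus some common variants.
ENTITY_NOISE_TOKENS: Set[str] = {"INTERNATIONAL", "INTL", "INT'L"}


def distil_entity_name(name: str, noise: Iterable[str] = ENTITY_NOISE_TOKENS) -> str:
    """Upper-case the name, split it into words, and drop noise words."""
    noise_set = set(noise)
    # ASCII-only tokenizer for brevity; accented Latin letters would need wider
    # handling, and non-Latin scripts are covered by a separate Reference Data list.
    tokens = re.findall(r"[A-Z0-9'&]+", name.upper())
    return " ".join(t for t in tokens if t not in noise_set)
```

Here "Acme International Ltd", "ACME Intl. Ltd" and "Acme Int'l Ltd" all distil to "ACME LTD".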
Changing Name Standardization
EDQ-CDS uses a name standardization technique in order to match name variants. It is supplied with a large collection of common name variants for various language domains. It is possible to customize these lists.
Note:
If a name standardization rule is changed or added, the resulting mappings may be removed during conflict resolution. For further details, see Resolving Conflicts.
Resolving Conflicts
Conflict resolution is performed to resolve issues arising when name standardization rules try to standardize names to more than one Master name. For example, if there is a rule that maps "Jon" to a Master of "John" and another that maps "Jon" to "John-Boy", there is a conflict. This conflict is resolved by assessing the importance of each Master name in the given standardization data. The best candidate is then selected as the primary Master, and other standardization maps conflicting with it are removed and quarantined.
As part of conflict resolution, each removed record is assigned one or more Reason Codes explaining why it is in conflict. These codes are displayed in the REASON column of the Server Console Results window.
The Reason Codes are as follows:
-
PIV: The Primary record of a cluster of records (for example, the best Master identified for a set of equivalences) is also present as a variant of other Masters. All the instances where this Primary name is a variant are removed.
-
PVOM: The records that are variants of the current Primary are also variants of other Masters. All the records for these variants pointing to other Masters are removed.
-
PVIM: The records that are variants of the current Primary are also Masters to other variants. All the records where this variant is a Master are removed.
-
PIVCUTOFF: The other removals take place after Primary clusters have been identified. At some point it is no longer efficient to continue identifying Primaries, so for the remaining records where the Master name also exists as a variant, all the variant versions are removed in a final cull of records that violate integrity.
Expanding on the simple example at the beginning of this section, assume the following name standardization rules:
Name | Master |
---|---|
J-MAN | JON |
JOHN | JONATHAN |
JOHNNY | JONNY |
JON | JOHN |
JON | JONATHAN |
JON | JOHN-BOY |
JONNY | JONATHAN |
JONATHAN | JONATHON |
JOHNNY | JONATHAN |
These rules contain a number of inherent conflicts. This is illustrated in the following diagram in which JONATHAN is identified as the Primary:
The arrows indicate the following:
Arrow Type | Reason for Conflict |
---|---|
(arrow image) | N/A (no conflict exists) |
(arrow image) | PIV |
(arrow image) | PVIM |
(arrow image) | PVOM |
The conflict resolution rules discard the mappings that cause conflicts, which results in the following mappings being created:
Name | Primary |
---|---|
JOHN | JONATHAN |
JON | JONATHAN |
JONNY | JONATHAN |
JOHNNY | JONATHAN |
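The single-Primary step of this example can be reproduced with a short Python sketch that applies the PIV, PVOM, and PVIM definitions above. It is a simplified illustration, not the EDQ-CDS implementation: it reads the rule table earlier in this section as (name, Master) pairs, takes JONATHAN as the Primary instead of selecting it by importance, and does not model PIVCUTOFF. A removed mapping can receive more than one code, as the definitions allow.

```python
from typing import List, Tuple

Mapping = Tuple[str, str]  # (name, Master)


def resolve_for_primary(rules: List[Mapping], primary: str
                        ) -> Tuple[List[Mapping], List[Tuple[Mapping, List[str]]]]:
    """Apply the PIV, PVOM, and PVIM checks around one already-chosen Primary."""
    variants = {name for name, master in rules if master == primary}
    kept: List[Mapping] = []
    removed: List[Tuple[Mapping, List[str]]] = []
    for name, master in rules:
        if master == primary:
            kept.append((name, master))
            continue
        reasons = []
        if name == primary:
            reasons.append("PIV")   # the Primary is also a variant of another Master
        if name in variants:
            reasons.append("PVOM")  # a variant of the Primary also points to another Master
        if master in variants:
            reasons.append("PVIM")  # a variant of the Primary is also a Master of another name
        if reasons:
            removed.append(((name, master), reasons))
        else:
            kept.append((name, master))  # unrelated to this cluster; left unchanged
    return kept, removed


RULES: List[Mapping] = [
    ("J-MAN", "JON"), ("JOHN", "JONATHAN"), ("JOHNNY", "JONNY"),
    ("JON", "JOHN"), ("JON", "JONATHAN"), ("JON", "JOHN-BOY"),
    ("JONNY", "JONATHAN"), ("JONATHAN", "JONATHON"), ("JOHNNY", "JONATHAN"),
]

if __name__ == "__main__":
    kept, removed = resolve_for_primary(RULES, "JONATHAN")
    for mapping, reasons in removed:
        print("removed", mapping, reasons)
    for mapping in kept:
        print("kept", mapping)
```

The kept mappings are the four shown in the result table above. The five removed ones are J-MAN to JON (PVIM), JOHNNY to JONNY and JON to JOHN (PVOM and PVIM), JON to JOHN-BOY (PVOM), and JONATHAN to JONATHON (PIV).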