1.3.10.38 Soundex
The Soundex processor generates a Soundex code for each value in a specified attribute. A Soundex code is an abstract key that represents similar-sounding names with the same code. Soundex is designed for family names (surnames), although it is sometimes used, with care, in other domains.
Soundex codes are useful where names that sound the same differ in spelling or transcription. Having created a Soundex code, you would often use it instead of the raw data value in a duplicate check.
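To illustrate how such a code is derived, the following is a minimal Python sketch of the widely documented American Soundex rules. It is an assumption that the processor follows this variant; the function names `soundex` and `group_by_soundex` are hypothetical and are not part of the product.

```python
from collections import defaultdict

# Digit assigned to each consonant under the standard American Soundex rules.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code (assumed American Soundex variant)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # its digit still suppresses repeats
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:              # adjacent letters with one digit count once
                code += digit
            prev = digit
        elif ch not in "HW":               # vowels (and Y) separate repeated digits;
            prev = ""                      # H and W do not
        if len(code) == 4:
            break
    return code.ljust(4, "0")              # pad short codes with zeros


def group_by_soundex(names):
    """Group values by Soundex code, as a duplicate check might do."""
    groups = defaultdict(list)
    for value in names:
        groups[soundex(value)].append(value)
    return dict(groups)
```

Grouping on the code rather than the raw value places spelling variants in the same candidate group, which is the reason for using the Soundex code in a duplicate check.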
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any String or String Array attributes from which you want to create a Soundex code. Note that if you input an Array attribute, the transformation is applied to all array elements, and an Array attribute is output. |
Options | None. |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | A new attribute containing the Soundex code is output for each input attribute. As the example below shows, it is named after the input attribute with a .Soundex suffix (for example, Surname.Soundex). |
Flags | None. |
The Soundex processor presents no summary statistics on its processing.
In the Data view, each input attribute is shown with the new Soundex attribute to its right.
Output Filters
None. All records input are output.
Example
This example uses the Soundex processor on a Surname attribute. The Surname attribute was derived from the NAME attribute in the Customers table: a Make Array from String processor split NAME using a space separator, and a Select Array Element processor then selected the second element of the resulting array as the Surname:
Surname (asc) | Surname.Soundex |
---|---|
ADAMSKI | A352 |
AHMED | A530 |
AITKEN | A325 |
ALLAN | A450 |
ALLEN | A450 |
Note that values that are likely to be variants or typos of the same name, such as ALLAN/ALLEN, produce the same Soundex code.
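The way the Surname attribute was derived for this example (split on a space, then take the second element) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical stand-ins for the two processors, the 1-based position and the dropping of empty elements are assumptions, and the sample name is invented rather than taken from the Customers table.

```python
from typing import List, Optional


def make_array_from_string(value: str, separator: str = " ") -> List[str]:
    """Split a string into elements on the separator (empty elements dropped: an assumption)."""
    return [part for part in value.split(separator) if part]


def select_array_element(values: List[str], position: int) -> Optional[str]:
    """Return the element at a 1-based position (assumed), or None if it does not exist."""
    if 1 <= position <= len(values):
        return values[position - 1]
    return None


# Invented sample value: the second element is treated as the surname.
surname = select_array_element(make_array_from_string("JOHN ALLAN"), 2)
```

Taking the second element only suits names of the form "first-name surname"; values with middle names or a single word would need different handling before a Soundex code is generated.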