1.3.10.34 Replace
The Replace processor uses a Reference Data map to transform data - for example in order to standardize it. The first column of the map is used to match values against, and the second column is used to control the replacement.
The replacement performed may be a simple whole value replacement - for example to replace the value 'Oracle Ltd' with 'Oracle Limited', or it may be a replacement of a part of the input value - for example to replace 'ltd' with 'limited' if it is found at the end of a CompanyName attribute, or to replace the String 'decsd' with 'deceased' wherever it is found.
The way the Reference Data is matched, and thus the data is replaced, is controlled using one of the following options:
- Whole value
- Contains
- Starts with
- Ends with
- Delimiter match
The matching against the Reference Data may also be case sensitive or case insensitive.
Note that when using the Contains, Starts with, or Ends with options, there may be multiple matches against the lookup column of the reference data. In this case, Replace always makes one, and only one, replacement. So, for example, when performing a 'Contains' replacement where the value 'PT' is replaced by 'PINT', the value '10PT - APTITUDE BITTER' would be transformed into '10PINT - APTITUDE BITTER' and not '10PINT - APINTITUDE BITTER'.
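The single-replacement behavior described above can be illustrated with a minimal Python sketch. This is not EDQ code: the function name `replace_once`, its parameters, and the mode names are hypothetical, and the sketch assumes that the leftmost occurrence is the one replaced, which is consistent with the 'PT' example.

```python
import re

def replace_once(value, lookup, replacement, mode="contains", case_sensitive=True):
    """Apply at most one replacement of `lookup` in `value` (illustrative only)."""
    if not lookup:
        return value  # nothing to match against
    core = re.escape(lookup)  # treat the lookup value as literal text
    patterns = {
        "whole": rf"\A{core}\Z",       # the entire value must equal the lookup
        "starts_with": rf"\A{core}",   # lookup must appear at the start
        "ends_with": rf"{core}\Z",     # lookup must appear at the end
        "contains": core,              # lookup may appear anywhere
    }
    if mode not in patterns:
        raise ValueError(f"unknown mode: {mode}")
    flags = 0 if case_sensitive else re.IGNORECASE
    # count=1 ensures one, and only one, replacement (the leftmost match).
    return re.sub(patterns[mode], lambda m: replacement, value, count=1, flags=flags)

print(replace_once("10PT - APTITUDE BITTER", "PT", "PINT"))
```

With `mode="contains"`, only the first 'PT' is replaced, so 'APTITUDE' is left unchanged.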
If you choose to use the Delimiter match option, and split up the data before matching using delimiters, any of the split values that match the lookup column of the replacement map will be replaced, even if there are many matches in the input value.
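As a rough illustration of Delimiter match, the following Python sketch splits a value on delimiter characters and replaces every token that exactly matches a lookup value. The function name `replace_delimited` and the choice to reassemble the value with its original delimiters are assumptions made for the example, not documented EDQ behavior.

```python
import re

def replace_delimited(value, replacement_map, delimiters=" "):
    """Replace every delimited token found in `replacement_map` (illustrative only)."""
    if not delimiters:
        raise ValueError("at least one delimiter character is required")
    # The capturing group makes re.split keep the delimiter runs, so the
    # value can be reassembled exactly (an assumption of this sketch).
    splitter = "([" + re.escape(delimiters) + "]+)"
    parts = re.split(splitter, value)
    # Even indices hold tokens; odd indices hold the delimiter runs.
    return "".join(
        replacement_map.get(part, part) if i % 2 == 0 else part
        for i, part in enumerate(parts)
    )

print(replace_delimited("10 PT PT APTITUDE", {"PT": "PINT"}))
```

Both standalone 'PT' tokens are replaced, while 'APTITUDE' is untouched because the whole token does not match the lookup value.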
The way the Replace processor decides how to make its replacement where there are multiple matches can be controlled using a configuration option.
By default, the map is simply checked in order, and the first match against the map from the input data is used for the replacement. So, for example, if your replacement map contains the values 'Lyn' and 'Lynda', where 'Lyn' appears first in the list, the input value 'Lynda' would undergo the replacement using the lookup value 'Lyn' in the map.
However, you can control this using the 'Match longest value' option. If you select this option, each matched reference entry will be assessed for length, and the longest match used. So, in the example above, the replacement using the lookup value 'Lynda' in the map would be performed.
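The difference between the default first-match behavior and 'Match longest value' can be sketched in Python as follows. The function name `replace_with_map` is hypothetical, the map is modeled as an ordered list of (lookup, replacement) pairs, and the replacement values 'Lynn' and 'Linda' are invented for illustration; a 'Contains' style match is assumed.

```python
def replace_with_map(value, replacement_map, match_longest=False):
    """Pick one map entry and apply a single replacement (illustrative only)."""
    # Collect every entry whose lookup value occurs in the input, in map order.
    matches = [(lookup, repl) for lookup, repl in replacement_map if lookup in value]
    if not matches:
        return value
    if match_longest:
        # Longest lookup wins; on ties, max() keeps the earliest entry.
        lookup, repl = max(matches, key=lambda pair: len(pair[0]))
    else:
        # Default: the first matching entry in map order wins.
        lookup, repl = matches[0]
    return value.replace(lookup, repl, 1)

names = [("Lyn", "Lynn"), ("Lynda", "Linda")]  # hypothetical map contents
print(replace_with_map("Lynda", names))                      # uses lookup 'Lyn'
print(replace_with_map("Lynda", names, match_longest=True))  # uses lookup 'Lynda'
```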
Use the Replace processor for standardization - for example to standardize all CompanyName values so that different suffixes that mean the same thing are represented in a standard way (for example, Ltd/Limited, Assoc/Assc, Cncl/Council, etc.).
Replacing Dates
It is possible to use Replace to replace Date values. However, for this to work, the date values in the Reference Data map must be in the standard ISO format; that is, either YYYY-MM-DD (for example, 1900-01-01), or YYYY-MM-DD HH:mm:ss (for example, 1900-01-01 00:00:00). Note that it is possible to replace a Date with a Null value - for example to remove invalid dates.
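When preparing a Reference Data map for dates, it may help to check the values against the two accepted formats beforehand. The following Python sketch is one way to do this outside EDQ; the function name `parse_reference_date` is hypothetical. Note that Python's `strptime` is slightly more lenient than a strict format check (for example, it accepts unpadded month and day digits).

```python
from datetime import datetime

# The two ISO layouts described above: date only, and date with time.
ISO_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")

def parse_reference_date(text):
    """Return a datetime if `text` fits one of the formats, else None."""
    for fmt in ISO_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next layout
    return None

for candidate in ("1900-01-01", "1900-01-01 00:00:00", "01/01/1900"):
    print(candidate, "->", parse_reference_date(candidate))
```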
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify a single attribute from which you want to replace values using a reference data map. The attribute may be a String, or a String Array. If an array is input, the replacements will be made at the array element level, and an array (with the data after the replacements have been performed) will be output. |
Options | Specify the following options: |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | The following data attributes are output: |
Flags | The following flags are output: |
The following table describes the statistics produced by the processor:
Statistic | Description |
---|---|
Transformed | The number of records where a replacement was performed. Drill down on the number to see the records. |
Untransformed | The number of records where a replacement was not performed. |
Invalid | The number of records where the replacement failed as the replacement value was invalid for the input data type. |
Note:
It is possible to use the Replace processor with attributes of any data type - Strings, Arrays, Numbers, or Dates. However, as Replace always uses the data type of the input attribute for the output attribute, there are some transformations you can choose to make that will mean the replaced value is invalid for the data type of the output attribute. For example, if you attempt to replace the Date value '2006-04-14' with 'Bad date' using a map, the value 'Bad date' is not a valid Date, and so the replacement fails. If you have any invalid replacements, you may need to convert the original attribute to a different data type before performing the replacements, or you may need to modify your Reference Data map to remove any invalid replacements.
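The three outcomes (transformed, untransformed, invalid) can be illustrated for a Date attribute with a short Python sketch. This is a simplified model, not EDQ's implementation: the function name `replace_date` is hypothetical, map keys and values are assumed to be ISO date strings (or `None` for Null), and the sketch assumes the original value is kept when a replacement is invalid.

```python
from datetime import date

def replace_date(value, replacement_map):
    """Return (output_value, status) for a date input (illustrative only)."""
    key = value.isoformat()  # for example '2006-04-14'
    if key not in replacement_map:
        return value, "untransformed"
    replacement = replacement_map[key]
    if replacement is None:
        return None, "transformed"  # replacing a Date with Null is allowed
    try:
        # The output must still be a valid Date, like the input attribute.
        return date.fromisoformat(replacement), "transformed"
    except ValueError:
        # Assumption of this sketch: the original value is left in place.
        return value, "invalid"

print(replace_date(date(2006, 4, 14), {"2006-04-14": "Bad date"}))
```

Here 'Bad date' cannot be parsed as a Date, so the record is counted as invalid rather than transformed.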
Output Filters
The following output filters are available:
- Records with transformed values
- Records with untransformed values
- Records with an invalid replacement
Example
In this example, the Replace processor is used to standardize English Counties and other similar data in attribute Address3 from the Customers table. The output attribute has been named Address3.stand.
In this case a Whole Value replacement was used. The following is an excerpt from the drill-down view of transformed records:
ADDRESS3 | ADDRESS3.stand | CU_NO |
---|---|---|
Lancs | Lancashire | 13841 |
Cambs | Cambridgeshire | 14053 |
OXON | Oxfordshire | 14068 |
Leics | Leicestershire | 14130 |
Linc | Lincolnshire | 14207 |
Beds | Bedfordshire | 14506 |