1.3.10.28 Normalize Whitespace
The Normalize Whitespace processor normalizes the whitespace in String values, so that multiple spaces between words are reduced to a single space character. It also removes leading and trailing whitespace.
Whitespace is defined in EDQ as:
- Spaces
- Non-printable characters, such as carriage returns, line feeds and tabs (and all other ASCII characters 0-31)
Normalize Whitespace is often used before parsing free text fields, to ensure that all values have regular spacing. It is also useful after other transformations that may leave extra spaces. For example, stripping words or numbers from a text field may leave additional spaces between the remaining words.
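The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not EDQ's implementation. It assumes that whitespace means exactly the space character plus ASCII characters 0-31, that every run of such characters (including a single tab or line break) becomes one space, and that other characters, such as the non-breaking space U+00A0, are left unchanged. The function name `normalize_whitespace` and the sample strings are hypothetical.

```python
import re

# Assumption: whitespace = ASCII 0-31 (tabs, line feeds, carriage returns and
# other control characters) plus the space character (ASCII 32).
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")


def normalize_whitespace(value: str) -> str:
    """Hypothetical sketch: collapse whitespace runs and trim both ends."""
    # Replace every run of whitespace characters with a single space...
    collapsed = _WHITESPACE_RUN.sub(" ", value)
    # ...then drop the single space that may remain at either end.
    # strip(" ") is used instead of strip() so that non-ASCII whitespace
    # such as U+00A0 is not removed, matching the assumed definition.
    return collapsed.strip(" ")


# Leading spaces, a pair of tabs and a CR/LF pair all collapse or disappear.
print(normalize_whitespace("  Unit 1\t\tBarnard Road\r\n"))  # Unit 1 Barnard Road

# Stripping digits (a hypothetical earlier transformation) leaves a double space.
stripped = re.sub(r"[0-9]", "", "Flat 12 Medway House")
print(repr(stripped))                  # 'Flat  Medway House'
print(normalize_whitespace(stripped))  # Flat Medway House
```

The second pair of calls mirrors the use case above: a removal step leaves two adjacent spaces, and normalization restores regular spacing.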
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify any String or String Array attributes whose whitespace you want to normalize. Number and Date attributes are not valid inputs. If you input an Array attribute, the transformation is applied to every array element, and an Array attribute is output. |
Options | None. |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | The following data attribute is output for each input attribute: [Attribute name].WhitespaceNormalized (for example, Address1.WhitespaceNormalized), which holds the input value with its whitespace normalized. |
Flags | None. |
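The Inputs row states that an Array input is processed element by element and produces an Array output. The following minimal Python sketch illustrates that rule. It models a String Array as a Python list, uses the whitespace definition given earlier (ASCII 0-31 plus the space character), and assumes the output naming `<input>.WhitespaceNormalized` seen in the example. The helper `normalize_attribute` and the attribute name `Tokens` are hypothetical.

```python
import re
from typing import List, Tuple, Union

# Assumption: whitespace is ASCII 0-31 plus the space character (32).
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")

# An input value is either a String or a String Array (modelled as a list).
Value = Union[str, List[str]]


def normalize_whitespace(value: str) -> str:
    """Collapse each whitespace run to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip(" ")


def normalize_attribute(name: str, value: Value) -> Tuple[str, Value]:
    """Hypothetical helper returning (output attribute name, output value)."""
    # Assumed naming, taken from the example: <input>.WhitespaceNormalized
    output_name = f"{name}.WhitespaceNormalized"
    if isinstance(value, list):
        # String Array input: normalize every element and output an array.
        return output_name, [normalize_whitespace(v) for v in value]
    # String input: output a single normalized String.
    return output_name, normalize_whitespace(value)


print(normalize_attribute("Address1", "Unit 1  , Barnard Road"))
# ('Address1.WhitespaceNormalized', 'Unit 1 , Barnard Road')
print(normalize_attribute("Tokens", ["  Medway ", "House\t\tEast"]))
# ('Tokens.WhitespaceNormalized', ['Medway', 'House East'])
```

Keeping the per-value function separate from the array dispatch means String and String Array inputs share one definition of whitespace.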
The Normalize Whitespace transformer presents no summary statistics on its processing.
In the Data view, each input attribute is shown with its derived, whitespace-normalized attribute to its right.
Output Filters
None.
Example
In this example, the Normalize Whitespace processor is used to normalize the spaces between words in an attribute containing the first line of an address:
Address1 | Address1.WhitespaceNormalized |
---|---|
Medway House[space][space][space], Bridge Street | Medway House[space], Bridge Street |
Monarch Mill[space][space], Jones Street | Monarch Mill[space], Jones Street |
Unit 1[space][space], Barnard Road | Unit 1[space], Barnard Road |
Alston Street[space][space][space][space], | Alston Street[space], |
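To make the `[space]` notation in the table concrete, the following sketch expands each marker to a real space character. It then checks that a normalizer built on the assumed whitespace definition (ASCII 0-31 plus the space character) reproduces every row. This is an illustration in Python, not EDQ's own code, and `normalize_whitespace` is a hypothetical name.

```python
import re

# Assumption: whitespace is ASCII 0-31 plus the space character (32).
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")


def normalize_whitespace(value: str) -> str:
    """Collapse each whitespace run to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip(" ")


# (Address1, Address1.WhitespaceNormalized) pairs copied from the table.
rows = [
    ("Medway House[space][space][space], Bridge Street", "Medway House[space], Bridge Street"),
    ("Monarch Mill[space][space], Jones Street", "Monarch Mill[space], Jones Street"),
    ("Unit 1[space][space], Barnard Road", "Unit 1[space], Barnard Road"),
    ("Alston Street[space][space][space][space],", "Alston Street[space],"),
]

for address1, expected in rows:
    # Expand the [space] markers into real space characters before comparing.
    actual = normalize_whitespace(address1.replace("[space]", " "))
    assert actual == expected.replace("[space]", " "), (address1, actual)
    print(repr(actual))
```

One space before each comma survives because, after collapsing, it is a single space between two non-whitespace characters. Only runs of spaces and leading or trailing whitespace are affected.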