1.3.10.33 RegEx Split
The RegEx Split processor provides a way to split up the data in an attribute into an array, using a regular expression to define where the splits should occur.
Use RegEx Split when splitting on fixed delimiters is not flexible enough. For example, you may want to split the data wherever any one of a set of characters occurs, or wherever a variable-length run of those characters occurs.
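As a rough illustration of the kind of split described above, the following Python sketch splits a value wherever a run of one or more separator characters occurs. It uses Python's standard `re` module rather than the processor itself; the separator set and the function name `split_on_separators` are assumptions chosen for the example.

```python
import re

# Assumed separator set for illustration: semicolon, comma, or pipe.
# The "+" makes a run of any length count as a single split point.
SEPARATORS = re.compile(r"[;,|]+")


def split_on_separators(value):
    """Split value wherever one or more separator characters occur."""
    return SEPARATORS.split(value)


print(split_on_separators("red;green,,blue||yellow"))
# ['red', 'green', 'blue', 'yellow']
```

A plain delimiter split would need one pass per separator and would produce empty elements for the doubled characters; the single pattern handles both cases at once.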
Regular Expressions
Regular expressions are a standard technique for expressing patterns and manipulating strings, and are very powerful once mastered.
Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.
There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify one or more String or String Array attributes. |
Options | Specify the following options: |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | The following data attributes are output: |
Flags | The following flags are output: |
The following table describes the statistics produced by the processor:
Statistic | Description |
---|---|
Success | The number of records which were split using the regular expression. |
Failure | The number of records which were not split using the regular expression. |
Output Filters
The following output filters are available:
- Records with a successful split
- Records with an unsuccessful split
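The Success and Failure statistics and the two output filters can be pictured with a small Python sketch. This is not the processor's implementation: it assumes that a split counts as successful when the regular expression matches at least once (so the value yields more than one element), and the function name `partition_by_split` is hypothetical.

```python
import re


def partition_by_split(values, pattern):
    """Separate values into those the pattern split and those it did not."""
    regex = re.compile(pattern)
    split_ok, split_failed = [], []
    for value in values:
        parts = regex.split(value)
        # Assumption: more than one element means the pattern matched,
        # which is treated here as a "successful split".
        if len(parts) > 1:
            split_ok.append(parts)
        else:
            split_failed.append(value)
    return split_ok, split_failed


ok, failed = partition_by_split(["a-b-c", "abc"], r"-")
print(len(ok), len(failed))  # counts analogous to Success and Failure
```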
Example
In this example, RegEx Split is used to split the data in a Notes attribute on an Employees table on either side of a person's initials (a sequence of 2 or 3 uppercase characters).
- Regular expression: ([A-Z]{2,3})
- Results (successful splits):
Notes | RegExSplit |
---|---|
started 14/10/1995 JBM ref557 | {started 14/10/1995 }{ ref557} |
started 15/5/95 JBM ref557 | {started 15/5/95 }{ ref557} |
start date 15/6/1998 HM etn247 | {start date 15/6/1998 }{ etn247} |
started 2/1/2004 RLJ ref-1842 | {started 2/1/2004 }{ ref-1842} |
started 8/10/2000 JBM ref557 | {started 8/10/2000 }{ ref557} |
started 10/6/2001 JBM ref557 | {started 10/6/2001 }{ ref557} |
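For readers who want to try the same pattern outside the processor, here is a minimal Python sketch that reproduces the splits in the table. One difference is an assumption worth noting: Python's `re.split` returns the text of any capturing group as extra elements, so the sketch uses a non-capturing group `(?:...)` to drop the initials, as the results above do. The function name `split_notes` is hypothetical.

```python
import re

# Same pattern as the example, but with a non-capturing group so that
# Python's re.split does not return the matched initials as elements.
INITIALS = re.compile(r"(?:[A-Z]{2,3})")


def split_notes(note):
    """Split a note on either side of a 2 or 3 letter uppercase sequence."""
    return INITIALS.split(note)


for note in ["started 14/10/1995 JBM ref557", "start date 15/6/1998 HM etn247"]:
    print(split_notes(note))
# ['started 14/10/1995 ', ' ref557']
# ['start date 15/6/1998 ', ' etn247']
```

The leading and trailing spaces around the initials are kept in the output elements, matching the spacing shown inside the braces in the table.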