Split Records from Array
The Split Records from Array processor allows you to create many records from a single record, by splitting out new records for each element in an input array.
Use Split Records from Array where data that should be represented in many records has been wrongly captured in a single record; that is, to normalize data that has been wrongly denormalized.
Often, the denormalized data that you need to split out needs to be pre-processed, before using this processor. For example, in the following Orders table, multiple orders (with multiple order numbers and product descriptions) have been wrongly entered into a single record, using free text fields:
Order_ID | Order_Number | Product_Desc |
---|---|---|
O574112 |
2788143 / 2788144 |
Home PC Package / Color Printer |
In this case, the Order_Number
and Product_Desc
attributes both need simple pre-processing using the Make Array from String processor to create arrays using the / character as a separator. The arrays can then be input into Split Records from Array to split out the records as follows:
Order_ID | Order_Number.normalized | Product_Desc.normalized |
---|---|---|
O574112 |
2788143 |
Home PC Package |
O574112 |
2788144 |
Color Printer |
Note above that many array attributes may be input to this processor. In this case, the number of output records for each input record will correspond to the number of elements in the array attribute with the largest number of elements. Data from each attribute that is not input is simply copied to all of the output records created from each input record. For example, if you split the following record, inputting Title.array and FirstName.array
:
Cust_ID | Title.array | FirstName.array | Surname |
---|---|---|---|
13451 |
{Mr}{Mrs} |
{John}{Dorothy}{James} |
Smith |
The output records will be as follows:
Cust_ID | Title.array | FirstName.array | Surname |
---|---|---|---|
13451 |
Mr |
John |
Smith |
13451 |
Mrs |
Dorothy |
Smith |
13451 |
James |
Smith |
Note that Title.array.normalized
is Null for the last record, as there is no array element in Title.array
that corresponds with the third element in FirstName.array
.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify one or more Array attributes that you want to use to split records. Each element of the input array will be output as a single value. |
Options |
None. |
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
The following data attributes are output:
|
Flags |
The following flags are output:
|
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Input records |
The number of records that were input (that is, before splitting). |
Output records |
The number of records that were output (that is, after splitting). Drill-down to see all output records. |
Split % |
The percentage of input records that were split into multiple output records. Drill-down to see the output records where a split occurred (that is, records where one of the input array attributes contained more than one element). |
Output Filters
None.
Example
In this example, a data set of People's Names is being prepared for matching. The data contains a number of aliases and alternative spellings for people's names. These are pre-processed into a single Aliases.Array
attribute, and then split out using Split Records from Array so that each name can be matched separately.
Aliases.Array | Aliases.Array.Split |
---|---|
{Jose Angel Veron}{Jose Veron} |
Jose Veron |
{Jose Angel Veron}{Jose Veron} |
Jose Angel Veron |
{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi} |
Namik Zouahi |
{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi} |
Namiq Zouahi |
{Namik Zouahi}{Namiq Zouahi}{Namig Zouahi} |
Namig Zouahi |
{Christine Moss}{Christine Lee}{Christine Graham} |
Christine Moss |
{Christine Moss}{Christine Lee}{Christine Graham} |
Christine Lee |
{Christine Moss}{Christine Lee}{Christine Graham} |
Christine Graham |