Extract Attributes
The Extract Attributes processor takes a single string containing any text as input, and outputs distinct pieces of information found in that string. It uses reference data (regular expressions, literal strings, or both) to identify attributes within the string. For example, it could be used to extract specific data items such as part number, quantity, and color from a product description text field.
It outputs the information as a correlated pair of arrays, one containing the attribute labels and the other containing their values.
It also outputs Remaining Input: the input text with all extracted values stripped out, leaving only the text where no matches were found and no extractions were made.
The match condition applied to the reference data affects the way that values are extracted. For example, if you want to extract Business Suffixes from a Company Name attribute, you may want to extract them only if the input value ends with a value in the list.
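The ends-with case can be illustrated outside the product. The following Python sketch is not the processor's implementation; the `BUSINESS_SUFFIXES` list and the `extract_suffix` function are hypothetical names introduced only for this illustration. It assumes a small literal suffix list and extracts a suffix only when it appears at the very end of the company name.

```python
import re

# Hypothetical reference list of business suffixes (an assumption for this
# sketch, not reference data shipped with the processor).
BUSINESS_SUFFIXES = ["Ltd", "Inc", "LLC", "Corp"]


def extract_suffix(company_name, suffixes=BUSINESS_SUFFIXES):
    """Extract a suffix only when the value ends with it.

    Returns (extracted_value, remaining_input). extracted_value is None
    when no listed suffix is found at the end of the string.
    """
    for suffix in suffixes:
        # \b stops "Inc" from matching inside a longer word such as "Zinc";
        # \.? tolerates a trailing period; $ anchors the match to the end.
        pattern = re.compile(r"\b" + re.escape(suffix) + r"\.?\s*$", re.IGNORECASE)
        match = pattern.search(company_name)
        if match:
            remaining = company_name[:match.start()].rstrip(" ,")
            return match.group().strip(), remaining
    return None, company_name
```

With this sketch, `extract_suffix("Acme Widgets Ltd")` extracts the suffix and leaves the rest of the name, while `extract_suffix("Inc Data Systems")` extracts nothing because the suffix does not appear at the end of the value.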
Configuration | Description |
---|---|
Inputs | The string to extract the attributes from. |
Options | Specify the following options: |
Outputs | Number of records with extraction performed and extraction not performed. |
Data Attributes | The following data attributes are output: |
Flags | The following flags are output: |
In this example, a string is input, and the result attributes and their values are output.
Input String | Result Attribute/Result Value |
---|---|
TEAO HP = 1/4 1725RPM 115V 48YZ YOKE MTR | attributearray = {"Definition", "Brand"}; valuearray = {"HP = 1/4", "TEAO"}; remaininginput = 1725RPM 115V 48YZ YOKE MTR |
Pencils #2HB Nontoxic Lead 12 / Box Wood | attributearray = {"Graphite Grade", "Grouping", "Stationary Type"}; valuearray = {"#2HB", "12 / BOX", "Pencils"}; remaininginput = Nontoxic Lead Wood |
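To make the relationship between the correlated arrays and the remaining input concrete, the first row of the example can be approximated in Python. This is a minimal sketch and not the processor's actual algorithm: the `REFERENCE_DATA` patterns and the `extract_attributes` function are assumptions made for illustration, and the real reference data behind the example is not shown in this topic.

```python
import re

# Hypothetical reference data: (attribute label, regular expression) pairs.
# The patterns are illustrative guesses, not the product's reference data.
REFERENCE_DATA = [
    ("Definition", r"HP\s*=\s*\d+/\d+"),
    ("Brand", r"\bTEAO\b"),
]


def extract_attributes(text, reference_data=REFERENCE_DATA):
    """Return (attribute_array, value_array, remaining_input).

    The two arrays are correlated by position: value_array[i] is the
    value extracted for the label in attribute_array[i].
    """
    attribute_array, value_array = [], []
    remaining = text
    for label, pattern in reference_data:
        match = re.search(pattern, remaining)
        if match:
            attribute_array.append(label)
            value_array.append(match.group())
            # Strip the extracted value out of the remaining text.
            remaining = remaining[:match.start()] + " " + remaining[match.end():]
    # Collapse the whitespace left behind by the removed values.
    remaining_input = " ".join(remaining.split())
    return attribute_array, value_array, remaining_input


attributes, values, remaining_input = extract_attributes(
    "TEAO HP = 1/4 1725RPM 115V 48YZ YOKE MTR"
)
```

Under these assumed patterns, `attributes` and `values` hold the labels and extracted values in matching positions, and `remaining_input` holds whatever text no pattern consumed. An input with no matches yields two empty arrays and the original text as the remaining input.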