1.3.5.1 Process Product Data
The Process Product Data processor connects to an instance of Oracle Enterprise Data Quality for Product Data (EDQ-P) and uses a production Data Service Application (DSA) to process product data using semantic rules; for example, to enhance and add structure to unstructured product data.
Note:
The processor will only appear if the EDQ server is configured to connect to an EDQ-P instance using an edqp.properties file. This file must be created in the oedq_local_home/edqp folder with the following settings:
- server = [name or IP address of the EDQ-P server]
- port = [the HTTP port of the EDQ-P server. This will be 2229 in a default installation]
- batchsize = [number of records to submit to EDQ-P at a time; defaults to 1000]

A batchsize greater than 1000 may cause an Out of Memory error.
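As an illustration of the settings above, the following Python sketch parses the text of an edqp.properties file, applies the documented batchsize default, and flags a batchsize above 1000. It is not part of EDQ: the function names, the host name `edqp.example.com`, and the choice to treat server and port as required are assumptions made for the example. The `batches` helper only illustrates what submitting records "a batchsize at a time" means, not how EDQ implements it.

```python
def parse_properties(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_edqp_settings(props):
    """Return (settings, warnings) for the three documented keys."""
    # Assumption: server and port must be present; only batchsize has a default.
    missing = [key for key in ("server", "port") if key not in props]
    if missing:
        raise ValueError("missing setting(s): " + ", ".join(missing))
    settings = {
        "server": props["server"],
        "port": int(props["port"]),
        # batchsize defaults to 1000 when it is not set.
        "batchsize": int(props.get("batchsize", 1000)),
    }
    warnings = []
    if settings["batchsize"] > 1000:
        warnings.append("batchsize greater than 1000 may cause an Out of Memory error")
    return settings, warnings


def batches(records, batchsize):
    """Yield consecutive slices of at most batchsize records."""
    for start in range(0, len(records), batchsize):
        yield records[start:start + batchsize]


# Hypothetical file content; the host name is a placeholder.
SAMPLE = """\
server = edqp.example.com
port = 2229
batchsize = 500
"""

settings, warnings = check_edqp_settings(parse_properties(SAMPLE))
print(settings, warnings)
```

A check like this could be run before restarting the EDQ server, so that a typo in the file is caught before the processor fails to appear.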
The Process Product Data processor allows EDQ-P to be used within an EDQ process to parse and match product data with a DSA.
The following table describes the configuration options:
Note:
This processor always appears with a re-run marker, indicating that it will be completely re-executed each time the process is run, regardless of whether its configuration has changed. This also means that any processors downstream of it must be rerun. This is because changes made outside of the OEDQ application could lead to different results on subsequent executions.
Configuration | Description |
---|---|
Inputs | The inputs to the processor should correspond to the expected inputs of the selected DSA. |
Options | Specify the following options: |
Outputs | The output attributes from the processor are determined by the selected DSA and Output step in the Options tab. The set of attributes will correspond to the configuration of the output step of the DSA in OEDQ-P. |
Flags | The following flag is output: |
Note:
The processor is suitable for record-by-record processing through EDQ-P; for example, for parsing product descriptions using a DSA. For EDQ-P operations that need to work across a record set, such as matching, Oracle recommends calling an EDQ-P job using an EDQ External Task, and sharing data using either files or a staged data area in a database. As EDQ is by its nature multi-threaded, the processor assumes that the DSA it uses can scale horizontally by calling multiple instances of an EDQ-P job (one per thread).
The Process Product Data processor presents no summary statistics on its processing.
In the Data view, each input attribute is shown with the output attributes to the right.
Output Filters
The following are output filters:
- Returned: records that were returned from the selected DSA and output step.
- Not Returned: records that were input to, but not returned from, the selected DSA and output step.
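The two filters can be pictured as a simple partition of the input records. The Python sketch below is an illustration only, under the assumption that the DSA output can be modeled as a mapping from input id to output attributes, with no entry for a record that was not returned. The function name and the record layout are hypothetical and are not an EDQ API.

```python
def split_by_returned(inputs, dsa_output):
    """Partition input records into (returned, not_returned).

    inputs: list of dicts, each with an "id" key.
    dsa_output: dict mapping an input id to that record's output attributes;
    an id with no entry models a record that the DSA did not return.
    """
    returned, not_returned = [], []
    for record in inputs:
        output = dsa_output.get(record["id"])
        if output is None:
            not_returned.append(record)
        else:
            # Show the input attributes with the output attributes alongside.
            returned.append({**record, **output})
    return returned, not_returned


inputs = [
    {"id": 5001, "description": "RESP ARY 5% 16 PIN 10OHM"},
    {"id": 5002, "description": "!gz9m;;) v!#Q 8jmASKqtfA7"},
]
dsa_output = {
    5001: {"edqp.Id": 5001, "edqp.Description": "Resistor 10 Ohm 5% 16 Pin Array"},
}

returned, not_returned = split_by_returned(inputs, dsa_output)
print(len(returned), len(not_returned))
```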
Example
In this example, an OEDQ-P DSA is used to parse and enhance unstructured product descriptions relating to Electrical Resistors.
id | description | edqp.Id | edqp.Description |
---|---|---|---|
5001 | RESP ARY 5% 16 PIN 10OHM | 5001 | Resistor 10 Ohm 5% 16 Pin Array |
5002 | !gz9m;;) v!#Q 8jmASKqtfA7 | | |
5003 | mfax 75 ohm 1/4 w resp 20% | 5003 | Resistor 75 Ohm 20% 0.25 Watt Array |
5004 | array 16 pin 85 ohm 5% resp | 5004 | Resistor 85 Ohm 5% 16 Pin Array |
5005 | array 16 pin 62 Ohm 5% RESP | 5005 | Resistor 62 Ohm 5% 16 Pin Array |
5006 | array 16 pin 62 Ohm 5% RESP | 5006 | Resistor 62 Ohm 5% 16 Pin Array |
5007 | 1% 1/10 W THN CH2.21 OHM R... | 5007 | Resistor 2.21 Ohm 1% 0.1 Watt T... |