D Component Knowledge Modules
This appendix provides information about the knowledge modules for the Flatten and Jagged components.
This appendix includes the following sections:
D.1 XKM Oracle Flatten
Note:
The Flatten component is supported only with Spark 1.3.
The following table describes the options for XKM Oracle Flatten.
Table D-1 XKM Oracle Flatten
Option | Description |
---|---|
NESTED_TABLE_ALIAS | Alias used for nested table expression. Default is NST. |
DEFAULT_EXPRESSION | Default expression for null nested table objects. For example, rating_table(obj_rating('-1', 'Unknown')). |
D.2 XKM Oracle Flatten XML
Un-nest the complex data in an XML file according to the given options.
The following table describes the options for XKM Oracle Flatten XML.
Table D-2 XKM Oracle Flatten XML
Option | Description |
---|---|
XML_XPATH | Specify the XML path for the XMLTABLE function. For example, '/ratings/rating'. |
XML_IS_ATTRIBUTE | Set to True when data is stored as attribute values of the record tag. For example, <row attribute1="..." /> |
XML_TABLE_ALIAS | Alias used for the XMLTABLE expression. Default is XMLT. |
DEFAULT_EXPRESSION | Default expression for null XMLTYPE objects. For example, <row><attribute1/></row>. This is used to return a row with default values for each null XMLTYPE object. |
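
To illustrate how XML_XPATH, XML_IS_ATTRIBUTE, and DEFAULT_EXPRESSION interact, the following is a minimal Python sketch based on the standard library module xml.etree.ElementTree. It is not the code the KM generates, which relies on the Oracle XMLTABLE function. The function name flatten_xml, its parameters, the sample documents, and the column names id and name are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text, record_path, columns, is_attribute=False, default_xml=None):
    """Return one dict per record element found under record_path.

    record_path  -- path to the record tag, relative to the root element
                    (ElementTree rejects absolute paths on elements, so
                    '/ratings/rating' becomes 'rating' when <ratings> is the root)
    is_attribute -- read values from attributes instead of child elements
    default_xml  -- document substituted for a null (None) input
                    (illustrative stand-in for DEFAULT_EXPRESSION)
    """
    if xml_text is None:
        if default_xml is None:
            return []  # assumption: a null document without a default yields no rows
        xml_text = default_xml
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall(record_path):
        if is_attribute:
            row = {col: record.get(col) for col in columns}
        else:
            row = {col: record.findtext(col) for col in columns}
        rows.append(row)
    return rows


# Hypothetical sample documents.
ELEMENT_DOC = (
    "<ratings>"
    "<rating><id>1</id><name>Good</name></rating>"
    "<rating><id>2</id><name>Poor</name></rating>"
    "</ratings>"
)
ATTRIBUTE_DOC = '<ratings><rating id="1" name="Good"/></ratings>'
DEFAULT_DOC = "<ratings><rating><id>-1</id><name>Unknown</name></rating></ratings>"

element_rows = flatten_xml(ELEMENT_DOC, "rating", ["id", "name"])
attribute_rows = flatten_xml(ATTRIBUTE_DOC, "rating", ["id", "name"], is_attribute=True)
default_rows = flatten_xml(None, "rating", ["id", "name"], default_xml=DEFAULT_DOC)
```

In this sketch, element_rows holds one row per rating element, attribute_rows reads the same columns from attributes, and default_rows shows a null document replaced by a single row of default values.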
D.3 XKM Spark Flatten
Un-nest the complex data according to the given options.
The following table describes the options for XKM Spark Flatten.
Table D-3 XKM Spark Flatten
Option | Description |
---|---|
Default Expression | Default expression for null nested table objects. For example, rating_table(obj_rating('-1', 'Unknown')). This is used to return a row with default values for each null nested table object. |
CACHE_DATA | When set to TRUE, persist the results with Spark default storage level. Default is FALSE. |
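
The un-nesting behavior and the role of the default expression can be sketched in plain Python. This is only a conceptual model, not Spark code and not what the KM generates. The function name flatten, the field names, and the sample records are hypothetical, and dropping a record whose nested collection is null when no default is given is an assumption of the sketch. CACHE_DATA is not modeled.

```python
def flatten(records, nested_field, default=None):
    """Un-nest records: emit one output row per element of the nested collection.

    records      -- iterable of dicts, each holding a list (or None) under nested_field
    nested_field -- name of the nested collection attribute
    default      -- list of nested elements substituted when the collection is
                    null (None); plays the role of the default expression
    """
    rows = []
    for record in records:
        nested = record.get(nested_field)
        if nested is None:
            if default is None:
                continue  # assumption: null collection and no default gives no row
            nested = default
        # Copy the parent attributes onto every un-nested element.
        parent = {k: v for k, v in record.items() if k != nested_field}
        for element in nested:
            rows.append({**parent, **element})
    return rows


# Hypothetical input: the second record has a null nested collection.
movies = [
    {"title": "A", "ratings": [{"code": "1", "label": "Good"},
                               {"code": "2", "label": "Poor"}]},
    {"title": "B", "ratings": None},
]

flat = flatten(movies, "ratings", default=[{"code": "-1", "label": "Unknown"}])
```

With the default supplied, the record with the null collection still contributes one row carrying the default values, which mirrors the purpose described for the Default Expression option.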
D.4 XKM Jagged
Jagged component KMs process unstructured data using meta pivoting. Source data, represented in a free key-value format, is transformed into more structured entities so that it can be loaded into database tables or file structures.
The Jagged component has one input group and one or more output groups, depending on the configuration of the component. The input group is connected to a source component that has a key-value or id-key-value structure. The output groups are connected to the target components, where data is stored in a more structured way: keys become column names and values are stored as table rows.
The Jagged KM parses the source data and looks for keys that match the output group attributes. Once the relevant keys are identified, the corresponding data is stored in a row. For a key-value source, each incoming record is delimited by a key marked as the End of Data Indicator. For an id-key-value source, incoming records are delimited by a new value of the sequence defined as the id.
Target records can be consolidated by removing duplicates based on the Unique Index attribute property. Some attributes can be labeled as required, meaning that no new record is stored if any of the required keys is missing. Default values can be defined for some missing keys.
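
The meta pivoting described above can be sketched for a key-value source and a single output group as follows. This is a simplified illustration in plain Python, not the KM implementation. The function name pivot_key_values, its parameters, and the sample data are hypothetical. The sketch also assumes that required keys are checked before defaults are applied and that a trailing record without an End of Data Indicator is discarded.

```python
def pivot_key_values(pairs, attributes, end_key, required=(), defaults=None, unique_on=None):
    """Pivot a stream of (key, value) pairs into rows for one output group.

    pairs      -- iterable of (key, value) tuples in source order
    attributes -- keys that become columns of the output group
    end_key    -- key acting as the End of Data Indicator for a record
    required   -- keys that must be present, otherwise the record is dropped
    defaults   -- values substituted for missing keys
    unique_on  -- attributes forming a unique index; later duplicates are dropped
    """
    defaults = defaults or {}
    rows, seen, current = [], set(), {}
    for key, value in pairs:
        if key in attributes:
            current[key] = value  # keep only keys matching output attributes
        if key == end_key:
            record, current = current, {}  # close the record, start a new one
            if any(k not in record for k in required):
                continue  # a required key is missing: store nothing
            row = {a: record.get(a, defaults.get(a)) for a in attributes}
            if unique_on:
                index = tuple(row[a] for a in unique_on)
                if index in seen:
                    continue  # duplicate on the unique index: skip
                seen.add(index)
            rows.append(row)
    return rows


# Hypothetical source: "end" marks the end of each incoming record.
source = [
    ("name", "Ann"), ("city", "Oslo"), ("end", ""),
    ("city", "Rome"), ("end", ""),                    # required key "name" missing
    ("name", "Bob"), ("end", ""),                     # "city" takes its default
    ("name", "Ann"), ("city", "Oslo"), ("end", ""),   # duplicate on "name"
]

rows = pivot_key_values(
    source,
    attributes=["name", "city"],
    end_key="end",
    required=["name"],
    defaults={"city": "Unknown"},
    unique_on=["name"],
)
```

Here the four incoming records produce two rows: the record without a name is dropped, the missing city is filled with its default, and the repeated name is removed as a duplicate.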
The following table describes the options for XKM Jagged.
Table D-4 XKM Jagged
Option | Description |
---|---|
TMP_DIR | Directory for temporary files. |
FIELD_DELIMITER | Field delimiter for temporary files. |
DELETE_TEMPORARY_OBJECTS | Delete temporary objects at end of mapping. |