Component Knowledge Modules

D Component Knowledge Modules

This appendix provides information about the knowledge modules for the Flatten and Jagged components.

This appendix includes the following sections:

XKM Oracle Flatten
XKM Oracle Flatten XML
XKM Spark Flatten
XKM Jagged

D.1 XKM Oracle Flatten

Un-nest the complex data according to the given options.

Note:

The Flatten component is supported only with Spark 1.3.

The following table describes the options for XKM Oracle Flatten.

Table D-1 XKM Oracle Flatten

Option	Description
NESTED_TABLE_ALIAS	Alias used for nested table expression. Default is NST.
DEFAULT_EXPRESSION	Default expression for null nested table objects. For example, rating_table(obj_rating('-1', 'Unknown')).
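
As an illustration only, the un-nesting behavior and the role of DEFAULT_EXPRESSION can be sketched in plain Python. This is not the code the KM generates. The function name `flatten`, the field names `code` and `descr`, and the sample rows are hypothetical; only the default values '-1' and 'Unknown' are taken from the example in the table above.

```python
def flatten(rows, nested_key, default_nested=None):
    """Un-nest rows[nested_key]: each nested element yields one output row."""
    out = []
    for row in rows:
        # Columns of the parent row are repeated on every un-nested row.
        parent = {k: v for k, v in row.items() if k != nested_key}
        nested = row.get(nested_key)
        if nested is None:
            # Plays the role of DEFAULT_EXPRESSION: substitute a default
            # collection when the nested table is null (assumed semantics).
            nested = default_nested if default_nested is not None else []
        for item in nested:
            out.append({**parent, **item})
    return out


# Hypothetical sample data: one row with two nested ratings, one with none.
movies = [
    {"title": "A", "ratings": [{"code": "1", "descr": "Good"},
                               {"code": "2", "descr": "Fair"}]},
    {"title": "B", "ratings": None},
]
default_ratings = [{"code": "-1", "descr": "Unknown"}]

flat = flatten(movies, "ratings", default_ratings)
for r in flat:
    print(r)
```

In this sketch, a row whose nested collection is null disappears from the result when no default is supplied, and produces one row with the default values when it is.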

D.2 XKM Oracle Flatten XML

Un-nest the complex data in an XML file according to the given options.

The following table describes the options for XKM Oracle Flatten XML.

Table D-2 XKM Oracle Flatten XML

Option	Description
XML_XPATH	Specify XML path for XMLTABLE function. For example, '/ratings/rating'.
XML_IS_ATTRIBUTE	Set to True when data is stored as attribute values of the record tag. For example, <row attribute1="..." />.
XML_TABLE_ALIAS	Alias used for XMLTABLE expression. Default is XMLT.
DEFAULT_EXPRESSION	Default expression for null XMLTYPE objects. For example, <row><attribute1/></row>. This is used to return a row with default values for each null XMLTYPE object.
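
The three ideas behind these options (a row path, attribute versus element storage, and a default row for a null document) can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch under stated assumptions, not the XMLTABLE expression the KM produces. The function `flatten_xml`, its parameters, and the column names `code` and `descr` are hypothetical. ElementTree supports only a subset of XPath and does not accept absolute paths on an element, so the path '/ratings/rating' from the table is written here as './rating' relative to the root element.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text, row_path, columns, is_attribute=False, default_row=None):
    """Return one dict per element matched by row_path."""
    if xml_text is None:
        # Plays the role of DEFAULT_EXPRESSION: one row of default values
        # for a null XML document (assumed semantics).
        return [dict(default_row)] if default_row is not None else []
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(row_path):
        if is_attribute:
            # Values stored as attributes of the record tag.
            rows.append({c: elem.get(c) for c in columns})
        else:
            # Values stored as child elements of the record tag.
            rows.append({c: elem.findtext(c) for c in columns})
    return rows


# Hypothetical sample documents.
element_doc = ("<ratings><rating><code>1</code><descr>Good</descr></rating>"
               "<rating><code>2</code><descr>Fair</descr></rating></ratings>")
attribute_doc = '<ratings><rating code="1" descr="Good"/></ratings>'
cols = ["code", "descr"]

print(flatten_xml(element_doc, "./rating", cols))
print(flatten_xml(attribute_doc, "./rating", cols, is_attribute=True))
print(flatten_xml(None, "./rating", cols,
                  default_row={"code": "-1", "descr": "Unknown"}))
```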

D.3 XKM Spark Flatten

Un-nest the complex data according to the given options.

The following table describes the options for XKM Spark Flatten.

Table D-3 XKM Spark Flatten

Option	Description
Default Expression	Default expression for null nested table objects. For example, rating_table(obj_rating('-1', 'Unknown')). This is used to return a row with default values for each null nested table object.
CACHE_DATA	When set to TRUE, persists the results with the Spark default storage level. Default is FALSE.

D.4 XKM Jagged

Jagged component KMs process unstructured data using meta pivoting. Source data, represented in a free key-value format, is transformed into more structured entities so that it can be loaded into database tables or file structures. The Jagged component has one input group and one or more output groups, depending on the configuration of the component. The input group is connected to a source component that has a key-value or id-key-value structure. The output groups are connected to the target components, where data is stored in a more structured way: keys become column names and values are stored as table rows.

The Jagged KM parses the source data and looks for keys that match the output group attributes. Once the relevant keys are identified, the corresponding data is stored in a row. For a key-value source, each incoming record is delimited by a key marked as the End of Data Indicator. For an id-key-value source, incoming records are delimited by a new value of the sequence defined as the id.

Target records can be consolidated by removing duplicates based on the Unique Index attribute property. Some attributes can be labeled as required, meaning that no new record is stored if any of the required keys is missing. Default values can be defined for some missing keys.
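
The meta pivoting described in this section can be sketched in plain Python. This is a minimal sketch of the described behavior, not the KM's implementation. All function names, parameter names, and sample data are hypothetical. Three details are assumptions: required keys are checked before defaults are applied, the first record wins when duplicates are removed, and trailing key-value pairs that are not closed by an End of Data Indicator are discarded.

```python
def records_by_end_key(pairs, end_key):
    """Key-value source: a record ends at the End of Data Indicator key."""
    current = {}
    for key, value in pairs:
        current[key] = value
        if key == end_key:
            yield current
            current = {}
    # Assumption: pairs after the last end_key are dropped.


def records_by_id(triples):
    """Id-key-value source: a record ends when the id value changes."""
    current, current_id = {}, None
    for rec_id, key, value in triples:
        if current and rec_id != current_id:
            yield current
            current = {}
        current_id = rec_id
        current[key] = value
    if current:
        yield current


def build_rows(records, attributes, required=(), defaults=None, unique=()):
    """Turn records into rows whose columns are the output group attributes."""
    defaults = defaults or {}
    rows, seen = [], set()
    for rec in records:
        # Skip the record if any required key is missing from the source.
        if any(a not in rec for a in required):
            continue
        # Keys that match no attribute are ignored; missing keys get defaults.
        row = {a: rec.get(a, defaults.get(a)) for a in attributes}
        if unique:
            ident = tuple(row[a] for a in unique)
            if ident in seen:
                continue  # duplicate on the unique index
            seen.add(ident)
        rows.append(row)
    return rows


# Hypothetical key-value source; "end" is the End of Data Indicator.
pairs = [
    ("name", "Ann"), ("city", "Oslo"), ("noise", "x"), ("end", "1"),
    ("city", "Rome"), ("end", "1"),                    # required name missing
    ("name", "Cy"), ("end", "1"),                      # city takes the default
    ("name", "Ann"), ("city", "Bergen"), ("end", "1"), # duplicate name
]
kv_rows = build_rows(records_by_end_key(pairs, "end"), ["name", "city"],
                     required=("name",), defaults={"city": "Unknown"},
                     unique=("name",))

# Hypothetical id-key-value source.
triples = [(1, "name", "Ann"), (1, "city", "Oslo"), (2, "name", "Bob")]
id_rows = build_rows(records_by_id(triples), ["name", "city"],
                     defaults={"city": "Unknown"})

print(kv_rows)
print(id_rows)
```

Separating record delimiting from row building keeps the two source layouts interchangeable: only the generator that groups incoming keys into records differs.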

The following table describes the options for XKM Jagged.

Table D-4 XKM Jagged

Option	Description
TMP_DIR	Directory for temporary files.
FIELD_DELIMITER	Field delimiter for temporary files.
DELETE_TEMPORARY_OBJECTS	Delete temporary objects at end of mapping.