A Hive Knowledge Modules
This appendix provides information about the Hive knowledge modules.
This appendix includes the following sections:
A.1 LKM SQL to Hive SQOOP
This KM integrates data from a JDBC data source into Hive.
It executes the following steps:
- Create a Hive staging table.
- Create a SQOOP configuration file, which contains the upstream query.
- Execute SQOOP to extract the source data and import it into Hive.
- Drop the Hive staging table.
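The SQOOP_PARALLELISM, SPLIT_BY, and BOUNDARY_QUERY options described below work together to partition the extraction. The following SQL is a conceptual sketch only; the table and column names are hypothetical and the exact statements SQOOP generates may differ:

```sql
-- Boundary query: find the split column's range (here SPLIT_BY = EMPNO).
SELECT MIN(EMPNO), MAX(EMPNO) FROM EMPLOYEE;

-- With SQOOP_PARALLELISM = 2, the range (say 1000..2000) is divided into
-- two chunks and each mapper extracts one of them:
SELECT * FROM EMPLOYEE WHERE EMPNO >= 1000 AND EMPNO < 1500;   -- mapper 1
SELECT * FROM EMPLOYEE WHERE EMPNO >= 1500 AND EMPNO <= 2000;  -- mapper 2
```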
This is a direct load LKM and will ignore any IKM set on the target.
The following table describes the options for LKM SQL to Hive SQOOP.
Table A-1 LKM SQL to Hive SQOOP
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true. |
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
SPLIT_BY |
Target column name for splitting the source data. Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM. To achieve equally sized data chunks the split column should contain homogeneously distributed values. For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index. |
BOUNDARY_QUERY |
Query to retrieve min/max values for calculating data chunks using the SPLIT_BY column. For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column are retrieved (KM option SPLIT_BY). In certain situations these may not be the best boundaries, or computing them this way may not be the most performant option. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range will be divided into SQOOP_PARALLELISM chunks for parallel extraction. Example of hard-coded ranges for an Oracle source: SELECT 1000, 2000 FROM DUAL. For preserving context independence, regular table names should be inserted through odiRef.getObjectName calls. For example: SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%> |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.2 LKM SQL to File SQOOP Direct
This KM extracts data from a JDBC data source into an HDFS file.
It executes the following steps:
- Create a SQOOP configuration file, which contains the upstream query.
- Execute SQOOP to extract the source data and store it as an HDFS file.
This is a direct load LKM and must be used without any IKM.
Note:
The entire target directory will be removed before extraction.
The following table describes the options for LKM SQL to File SQOOP Direct.
Table A-2 LKM SQL to File SQOOP Direct
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true. |
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
SPLIT_BY |
Target column name for splitting the source data. Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM. To achieve equally sized data chunks the split column should contain homogeneously distributed values. For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index. |
BOUNDARY_QUERY |
Query to retrieve min/max values for calculating data chunks using the SPLIT_BY column. For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column are retrieved (KM option SPLIT_BY). In certain situations these may not be the best boundaries, or computing them this way may not be the most performant option. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range will be divided into SQOOP_PARALLELISM chunks for parallel extraction. Example of hard-coded ranges for an Oracle source: SELECT 1000, 2000 FROM DUAL. For preserving context independence, regular table names should be inserted through odiRef.getObjectName calls. For example: SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%> |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.3 LKM SQL to HBase SQOOP Direct
This KM extracts data from a JDBC data source and imports the data into HBase.
It executes the following steps:
- Create a SQOOP configuration file, which contains the upstream query.
- Execute SQOOP to extract the source data and import it into HBase.
This is a direct load LKM and must be used without any IKM.
The following table describes the options for LKM SQL to HBase SQOOP Direct.
Table A-3 LKM SQL to HBase SQOOP Direct
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table? Check this option, to create the target table. |
TRUNCATE |
Replace existing target data? Set this option to true, to replace any existing target table content with the new data. |
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true. |
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
SPLIT_BY |
Target column name for splitting the source data. Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM. To achieve equally sized data chunks the split column should contain homogeneously distributed values. For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index. |
BOUNDARY_QUERY |
Query to retrieve min/max values for calculating data chunks using the SPLIT_BY column. For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column are retrieved (KM option SPLIT_BY). In certain situations these may not be the best boundaries, or computing them this way may not be the most performant option. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range will be divided into SQOOP_PARALLELISM chunks for parallel extraction. Example of hard-coded ranges for an Oracle source: SELECT 1000, 2000 FROM DUAL. For preserving context independence, regular table names should be inserted through odiRef.getObjectName calls. For example: SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%> |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.4 LKM File to SQL SQOOP
This KM integrates data from HDFS files into a JDBC target.
It executes the following steps:
- Create a SQOOP configuration file.
- Load data using SQOOP into a work table on the RDBMS.
- Drop the work table.
The following table describes the options for LKM File to SQL SQOOP.
Table A-4 LKM File to SQL SQOOP
Option | Description |
---|---|
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
WORK_TABLE_OPTIONS |
Work table options. Use this option to override the standard technology-specific work table options. When left blank, the following default values are used: Oracle: NOLOGGING; DB2 UDB: NOT LOGGED INITIALLY; Teradata: no fallback, no before journal, no after journal. |
TERADATA_WORK_TABLE_TYPE |
Teradata work table type. Use SET or MULTISET table for work table. |
TERADATA_OUTPUT_METHOD |
Teradata Load Method. Specifies the way the Teradata Connector will load the data. Valid values are:
Please see Cloudera's Teradata Connectors User Guide for more details. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.5 LKM Hive to SQL SQOOP
This KM integrates data from Hive into a JDBC target.
It executes the following steps:
- Unload data into HDFS.
- Create a SQOOP configuration file.
- Load data using SQOOP into a work table on the RDBMS.
- Drop the work table.
The following table describes the options for LKM Hive to SQL SQOOP.
Table A-5 LKM Hive to SQL SQOOP
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
WORK_TABLE_OPTIONS |
Work table options. Use this option to override the standard technology-specific work table options. When left blank, the following default values are used: Oracle: NOLOGGING; DB2 UDB: NOT LOGGED INITIALLY; Teradata: no fallback, no before journal, no after journal. |
TERADATA_WORK_TABLE_TYPE |
Teradata work table type. Use SET or MULTISET table for work table. |
TERADATA_OUTPUT_METHOD |
Teradata Load Method. Specifies the way the Teradata Connector will load the data. Valid values are:
Please see Cloudera's Teradata Connectors User Guide for more details. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.6 LKM HBase to SQL SQOOP
This KM integrates data from HBase into a JDBC target.
It executes the following steps:
- Create a SQOOP configuration file.
- Create a Hive table definition for the HBase table.
- Unload data from Hive (HBase) using SQOOP into a work table on the RDBMS.
- Drop the work table.
The following table describes the options for LKM HBase to SQL SQOOP.
Table A-6 LKM HBase to SQL SQOOP
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true. |
HIVE_STAGING_LSCHEMA |
Logical schema name for Hive-HBase-SerDe table. The unloading of HBase data is done through Hive. This KM option defines the Hive database that will be used for creating the Hive HBase-SerDe table for unloading the HBase data. |
SQOOP_PARALLELISM |
Number of SQOOP parallel mappers. Specifies the degree of parallelism, that is, the number of mapper processes used for extraction. When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined. |
WORK_TABLE_OPTIONS |
Work table options. Use this option to override the standard technology-specific work table options. When left blank, the following default values are used: Oracle: NOLOGGING; DB2 UDB: NOT LOGGED INITIALLY; Teradata: no fallback, no before journal, no after journal. |
TERADATA_WORK_TABLE_TYPE |
Teradata work table type. Use SET or MULTISET table for work table. |
TERADATA_OUTPUT_METHOD |
Teradata Load Method. Specifies the way the Teradata Connector will load the data. Valid values are:
Please see Cloudera's Teradata Connectors User Guide for more details. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
USE_GENERIC_JDBC_CONNECTOR |
Use SQOOP's generic JDBC connector? For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution. |
EXTRA_HADOOP_CONF_PROPERTIES |
Optional generic Hadoop properties. Extra optional properties for SQOOP file: section Hadoop properties. |
EXTRA_SQOOP_CONF_PROPERTIES |
Optional SQOOP properties. Extra optional properties for SQOOP file: section SQOOP properties. |
EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES |
Optional SQOOP connector properties. Extra optional properties for SQOOP file: section SQOOP connector properties. |
A.7 LKM HDFS File to Hive Load Data
This KM loads data from an HDFS file into Hive. The file can be in JSON, Avro, Parquet, or delimited format, including complex data.
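As an illustration of staging a file that contains complex data, the following HiveQL sketch defines a Parquet-backed table with an array column and declares the file to it; the table and path names are hypothetical, and the actual DDL is generated by the KM:

```sql
-- Hypothetical staging table for a Parquet file with a complex (ARRAY) column.
CREATE TABLE orders_stage (
  order_id   INT,
  customer   STRING,
  line_items ARRAY<STRING>
)
STORED AS PARQUET;

-- LOAD DATA only moves the file; it must already be in Parquet format.
LOAD DATA INPATH '/user/odi/in/orders.parquet' INTO TABLE orders_stage;
```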
Table A-7 LKM HDFS File to Hive Load Data
Option | Description |
---|---|
STOP_ON_FILE_NOT_FOUND |
This checkbox option defines whether the KM should stop, if no input file is found. |
OVERRIDE_ROW_FORMAT |
This option allows you to override the entire Hive row format definition of the staging table or the target table. |
DELETE_TEMPORARY_OBJECTS |
Set this option to No, to retain the temporary objects (tables, files and scripts) post integration. |
A.8 LKM HDFS File to Hive Load Data (Direct)
This KM loads data from an HDFS file directly into the Hive target table, bypassing the staging table for better performance.
Table A-8 LKM HDFS to Hive Load Data (Direct)
Option | Description |
---|---|
STOP_ON_FILE_NOT_FOUND |
This checkbox option defines whether the KM should stop, if no input file is found. |
OVERRIDE_ROW_FORMAT |
This option allows you to override the entire Hive row format definition of the staging table or the target table. |
DELETE_TEMPORARY_OBJECTS |
Set this option to No, to retain the temporary objects (tables, files and scripts) post integration. |
CREATE_TARG_TABLE |
Create target table? Check this option, to create the target table. |
TRUNCATE |
Replace existing target data? Set this option to true, to replace any existing target table content with the new data. |
A.9 IKM Hive Append
This KM integrates data into a Hive target table in append or replace (truncate) mode.
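In HiveQL terms, append and replace modes correspond to INSERT INTO and INSERT OVERWRITE. The statements below are a generic sketch with hypothetical table names, not the exact code the KM generates:

```sql
-- Append mode: new rows are added to the existing target data.
INSERT INTO TABLE sales_target
SELECT sale_id, product, amount FROM sales_staging;

-- Replace (truncate) mode: existing target data is overwritten.
INSERT OVERWRITE TABLE sales_target
SELECT sale_id, product, amount FROM sales_staging;
```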
The following table describes the options for IKM Hive Append.
Table A-9 IKM Hive Append
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Check this option if you wish to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, if you wish to replace the target table content with the new data. |
Note:
If there is a column containing a Complex Type in the target Hive table, this must not be left unmapped. Hive does not allow setting null values to complex columns.
A.10 IKM Hive Incremental Update
This IKM integrates data incrementally into a Hive target table. The KM should be assigned to the Hive target node.
The target data store's integration type must be defined as Incremental Update for this KM to appear in the list of available KMs for assignment.
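A generic way to express an incremental update in HiveQL is a MERGE statement, which requires an ACID (transactional) target table. The following is an illustration with hypothetical names, not necessarily the statement this KM generates:

```sql
-- Requires a transactional (ACID) Hive target table.
MERGE INTO customers_target t
USING customers_updates u
ON t.cust_id = u.cust_id
WHEN MATCHED THEN UPDATE SET cust_name = u.cust_name, cust_city = u.cust_city
WHEN NOT MATCHED THEN INSERT VALUES (u.cust_id, u.cust_name, u.cust_city);
```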
Table A-10 IKM Hive Incremental Update
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Select this option to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, to replace the target table content with the new data. |
A.11 LKM File to Hive LOAD DATA
Integration from a flat file staging area to Hive using Hive's LOAD DATA command.
This KM executes the following steps:
- Create a flow table in Hive.
- Declare data files to Hive (LOAD DATA command).
- Load data from the Hive staging table into the target table.
The KM can handle filename wildcards (*, ?).
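The following HiveQL sketch illustrates the staging flow described above, using hypothetical table and path names (the flow table DDL and column list are normally derived from the mapping):

```sql
-- Step 1: flow (staging) table in Hive.
CREATE TABLE weblog_stage (host STRING, request STRING, status INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Step 2: declare the data file to Hive (LOCAL if the file is outside HDFS).
LOAD DATA LOCAL INPATH '/tmp/odi/webshop_001.log' INTO TABLE weblog_stage;

-- Step 3: load from the staging table into the target table.
INSERT INTO TABLE weblog_target
SELECT host, request, status FROM weblog_stage;
```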
The following table describes the options for LKM File to Hive LOAD DATA.
Table A-11 LKM File to Hive LOAD DATA
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
EXTERNAL_TABLE |
Preserve file in original location? Defines whether to declare the target/staging table as externally managed. Default: false. For non-external tables Hive manages all data files. That is, it will *move* any data files into <hive.metastore.warehouse.dir>/<table_name>. For external tables Hive does not move or delete any files. It will load data from the location given by the ODI schema. If EXTERNAL_TABLE is set to true: All files in the directory given by the physical data schema will be loaded, so any filename or wildcard information from the source data store's resource name will be ignored. The directory structure and file names must follow Hive's directory organization for tables, for example, for partitioning and clustering. The directory and its files must reside in HDFS. No Hive LOAD DATA statements are submitted, and thus loading files into a specific partition (using a target-side expression) is not possible. |
FILE_IS_LOCAL |
Is this a local file? Defines whether the source file is to be considered local (= outside of the current Hadoop cluster). Default: true. If FILE_IS_LOCAL is set to true, the data file(s) are copied into the Hadoop cluster first. If FILE_IS_LOCAL is set to false, the data file(s) are moved into the Hadoop cluster and therefore will no longer be available at their source location. If the source file is already in HDFS, FILE_IS_LOCAL=false results in just a file rename and is therefore a very fast operation. This option only applies if EXTERNAL_TABLE is set to false. |
STOP_ON_FILE_NOT_FOUND |
Stop if no input file was found? This checkbox option defines whether the KM should stop, if no input file has been found. |
OVERRIDE_ROW_FORMAT |
Custom row format clause. This option allows you to override the entire Hive row format definition of the staging table (in case USE_STAGE_TABLE is set to true) or the target table (in case USE_STAGE_TABLE is set to false). It contains the text to be used for the row format definition. Example for reading Apache Combined WebLog files: ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' <EOL>WITH SERDEPROPERTIES ( <EOL> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")" |
A.12 LKM File to Hive LOAD DATA Direct
Direct integration from a flat file into Hive without any staging using Hive's LOAD DATA command.
This is a direct load LKM and must be used without any IKM.
The KM can handle filename wildcards (*, ?).
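Conceptually, the direct load corresponds to a single LOAD DATA statement against the target table; with TRUNCATE set to true, the OVERWRITE clause replaces the existing data. A sketch with hypothetical names:

```sql
-- Direct load, replacing existing target data (TRUNCATE = true).
LOAD DATA INPATH '/user/odi/in/webshop_001.log' OVERWRITE INTO TABLE weblog_target;
```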
The following table describes the options for LKM File to Hive LOAD DATA Direct.
Table A-12 LKM File to Hive LOAD DATA Direct
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Check this option if you wish to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, if you wish to replace the target table content with the new data. |
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
EXTERNAL_TABLE |
Preserve file in original location? Defines whether to declare the target/staging table as externally managed. Default: false. For non-external tables Hive manages all data files. That is, it will *move* any data files into <hive.metastore.warehouse.dir>/<table_name>. For external tables Hive does not move or delete any files. It will load data from the location given by the ODI schema. If EXTERNAL_TABLE is set to true: All files in the directory given by the physical data schema will be loaded, so any filename or wildcard information from the source data store's resource name will be ignored. The directory structure and file names must follow Hive's directory organization for tables, for example, for partitioning and clustering. The directory and its files must reside in HDFS. No Hive LOAD DATA statements are submitted, and thus loading files into a specific partition (using a target-side expression) is not possible. |
FILE_IS_LOCAL |
Is this a local file? Defines whether the source file is to be considered local (= outside of the current Hadoop cluster). Default: true. If FILE_IS_LOCAL is set to true, the data file(s) are copied into the Hadoop cluster first. If FILE_IS_LOCAL is set to false, the data file(s) are moved into the Hadoop cluster and therefore will no longer be available at their source location. If the source file is already in HDFS, FILE_IS_LOCAL=false results in just a file rename and is therefore a very fast operation. This option only applies if EXTERNAL_TABLE is set to false. |
STOP_ON_FILE_NOT_FOUND |
Stop if no input file was found? This checkbox option defines whether the KM should stop, if no input file has been found. |
OVERRIDE_ROW_FORMAT |
Custom row format clause. This option allows you to override the entire Hive row format definition of the staging table (in case USE_STAGE_TABLE is set to true) or the target table (in case USE_STAGE_TABLE is set to false). It contains the text to be used for the row format definition. Example for reading Apache Combined WebLog files: ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' <EOL>WITH SERDEPROPERTIES ( <EOL> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")" |
A.13 LKM HBase to Hive HBASE-SERDE
This LKM provides read access to an HBase table from Hive.
This is achieved by defining a temporary load table in Hive that represents all relevant columns of the HBase source table.
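The temporary load table is a Hive table backed by the HBase storage handler. The following DDL is a minimal sketch, assuming a source HBase table named CUSTOMERS with a single column family cf; all names are hypothetical:

```sql
CREATE EXTERNAL TABLE customers_hbase_src (
  key  STRING,
  name STRING,
  city STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name,cf:city")
TBLPROPERTIES ("hbase.table.name" = "CUSTOMERS");
```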
A.14 LKM Hive to HBase Incremental Update HBASE-SERDE Direct
This LKM loads data from Hive into HBase. It supports inserting new rows as well as updating existing data.
This is a direct load LKM and must be used without any IKM.
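Writing through an HBase-backed Hive table (as sketched in the previous section) gives upsert semantics, because inserting a row whose key already exists overwrites that row's cells in HBase. A hypothetical illustration:

```sql
-- Rows with a new key are inserted; rows with an existing key update that row.
INSERT INTO TABLE customers_hbase_tgt
SELECT cust_id, cust_name, cust_city
FROM customers_hive_src;
```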
The following table describes the options for LKM Hive to HBase Incremental Update HBASE-SERDE Direct.
Table A-13 LKM Hive to HBase Incremental Update HBASE-SERDE Direct
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Check this option to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, to replace the target table content with the new data. |
HBASE_WAL |
Disable Write-Ahead-Log. HBase uses a Write-Ahead-Log to protect against data loss. For better performance, WAL can be disabled. This setting applies to all Hive commands executed later in this session. |
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
A.15 LKM Hive to File Direct
This LKM unloads data from Hive into flat files.
This is a direct load LKM and must be used without any IKM.
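Conceptually, unloading Hive data to files corresponds to an INSERT OVERWRITE DIRECTORY statement; the LOCAL keyword matches the FILE_IS_LOCAL option. A sketch with hypothetical names:

```sql
-- Unload to an HDFS directory as delimited text (FILE_IS_LOCAL = false).
INSERT OVERWRITE DIRECTORY '/user/odi/export/customers'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT cust_id, cust_name, cust_city FROM customers;
```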
The following table describes the options for LKM Hive to File Direct.
Table A-14 LKM Hive to File Direct
Option | Description |
---|---|
FILE_IS_LOCAL |
Is this a local file? Defines whether the target file is to be considered local (outside of the current Hadoop cluster). |
STORED_AS |
File format. Defines whether the target file is to be stored as plain text file (TEXTFILE) or compressed (SEQUENCEFILE). |
A.16 XKM Hive Sort
This XKM sorts data using an expression.
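Hive distinguishes between a global ORDER BY and a per-reducer SORT BY (optionally combined with DISTRIBUTE BY). The sketch below illustrates the difference; which clause is generated depends on the SORT_MODE option:

```sql
-- Global order: a single reducer produces one totally ordered output.
SELECT * FROM sales ORDER BY sale_date;

-- Per-reducer order: rows are distributed by product, then sorted within each reducer.
SELECT * FROM sales DISTRIBUTE BY product SORT BY sale_date;
```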
The following table describes the options for XKM Hive Sort.
Table A-15 XKM Hive Sort
Option | Description |
---|---|
SORT_MODE |
Select the mode the SORT operator will generate code for. |
A.17 LKM File to Oracle OLH-OSCH
This KM integrates data from an HDFS file into an Oracle staging table using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).
The KM can handle filename wildcards (*, ?).
The following table describes the options for LKM File to Oracle OLH-OSCH.
Table A-16 LKM File to Oracle OLH-OSCH
Option | Description |
---|---|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
OLH_OUTPUT_MODE |
How to transfer data into Oracle? This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.
|
REJECT_LIMIT |
Max number of errors for OLH/EXTTAB. Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors; an integer value (for example, 10 to allow 10 rejections). This value is used in OLH job definitions as well as in external table definitions. |
EXT_TAB_DIR_LOCATION |
Directory for ext tab data files. File system path of the external table. Note:
|
WORK_TABLE_OPTIONS |
Option for flow table creation. Use this option to specify the attributes for the integration table at create time; these attributes are used for increasing performance. This option is set to NOLOGGING by default and may be left empty. |
OVERRIDE_INPUTFORMAT |
Class name of InputFormat. By default the InputFormat class is derived from the source Data Store/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression. |
EXTRA_OLH_CONF_PROPERTIES |
Optional extra OLH properties. Allows adding extra parameters to OLH. For example, for changing the default OLH date format: <property> <name>oracle.hadoop.loader.defaultDateFormat</name> <value>yyyy-MM-dd HH:mm:ss</value> </property> Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details), the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. Example (loading Apache WebLog file format): When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details: <property> <name>oracle.hadoop.loader.input.regexPattern</name> <value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value> <description>RegEx for Apache WebLog format</description> </property> |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
A.18 LKM File to Oracle OLH-OSCH Direct
This KM integrates data from an HDFS file into an Oracle target using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).
The KM can handle filename wildcards (*, ?).
This is a direct load LKM (no staging) and must be used without any IKM.
The following table describes the options for LKM File to Oracle OLH-OSCH Direct.
Table A-17 LKM File to Oracle OLH-OSCH Direct
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Check this option to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, to replace the target table content with the new data. |
DELETE_ALL |
Delete all rows. Set this option to true, to delete all rows from the target table before loading the new data. |
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
OLH_OUTPUT_MODE |
How to transfer data into Oracle? This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.
|
REJECT_LIMIT |
Max number of errors for OLH/EXTTAB. Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors; an integer value (for example, 10 to allow 10 rejections). This value is used in OLH job definitions as well as in external table definitions. |
EXT_TAB_DIR_LOCATION |
Directory for ext tab data files. File system path of the external table. Note:
|
WORK_TABLE_OPTIONS |
Option for flow table creation. Use this option to specify the attributes for the integration table at create time; these attributes are used for increasing performance. This option is set to NOLOGGING by default and may be left empty. |
OVERRIDE_INPUTFORMAT |
Class name of InputFormat. By default the InputFormat class is derived from the source Data Store/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression. |
EXTRA_OLH_CONF_PROPERTIES |
Optional extra OLH properties. Allows adding extra parameters to OLH. For example, for changing the default OLH date format: <property> <name>oracle.hadoop.loader.defaultDateFormat</name> <value>yyyy-MM-dd HH:mm:ss</value> </property> Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details), the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. Example (loading Apache WebLog file format): When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details: <property> <name>oracle.hadoop.loader.input.regexPattern</name> <value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value> <description>RegEx for Apache WebLog format</description> </property> |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
A.19 LKM Hive to Oracle OLH-OSCH
This KM integrates data from a Hive query into an Oracle staging table using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).
The following table describes the options for LKM Hive to Oracle OLH-OSCH.
Table A-18 LKM Hive to Oracle OLH-OSCH
Option | Description |
---|---|
USE_HIVE_STAGING_TABLE |
Use intermediate Hive staging table? By default the Hive source data is materialized in a Hive staging table before extraction by OLH. If USE_HIVE_STAGING_TABLE is set to false, OLH directly accesses the Hive source data. USE_HIVE_STAGING_TABLE=0 is only possible if all of these conditions are true:
|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
OLH_OUTPUT_MODE |
How to transfer data into Oracle? This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.
|
REJECT_LIMIT |
Max number of errors for OLH/EXTTAB. Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors; an integer value (for example, 10 to allow 10 rejections). This value is used in OLH job definitions as well as in external table definitions. |
EXT_TAB_DIR_LOCATION |
Directory for ext tab data files. File system path of the external table. Note:
|
WORK_TABLE_OPTIONS |
Option for flow table creation. Use this option to specify the attributes for the integration table at create time; these attributes are used for increasing performance. This option is set to NOLOGGING by default and may be left empty. |
OVERRIDE_INPUTFORMAT |
Class name of InputFormat. By default the InputFormat class is derived from the source Data Store/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression. |
EXTRA_OLH_CONF_PROPERTIES |
Optional extra OLH properties. Allows adding extra parameters to OLH. For example, for changing the default OLH date format: <property> <name>oracle.hadoop.loader.defaultDateFormat</name> <value>yyyy-MM-dd HH:mm:ss</value> </property> Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details), the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. Example (loading Apache WebLog file format): When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details: <property> <name>oracle.hadoop.loader.input.regexPattern</name> <value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value> <description>RegEx for Apache WebLog format</description> </property> |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
A.20 LKM Hive to Oracle OLH-OSCH Direct
This KM integrates data from a Hive query into an Oracle target using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).
This is a direct load LKM and must be used without any IKM.
The following table describes the options for LKM Hive to Oracle OLH-OSCH Direct.
Table A-19 LKM Hive to Oracle OLH-OSCH Direct
Option | Description |
---|---|
CREATE_TARG_TABLE |
Create target table. Check this option to create the target table. |
TRUNCATE |
Replace all target table data. Set this option to true, to replace the target table content with the new data. |
DELETE_ALL |
Delete all rows. Set this option to true, to delete all rows from the target table before loading the new data. |
USE_HIVE_STAGING_TABLE |
Use intermediate Hive staging table? By default the Hive source data is materialized in a Hive staging table before extraction by OLH. If USE_HIVE_STAGING_TABLE is set to false, OLH directly accesses the Hive source data. USE_HIVE_STAGING_TABLE=0 is only possible if all of these conditions are true:
|
DELETE_TEMPORARY_OBJECTS |
Delete temporary objects at end of mapping. Set this option to NO, to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. |
OLH_OUTPUT_MODE |
How to transfer data into Oracle? This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.
|
REJECT_LIMIT |
Max number of errors for OLH/EXTTAB. Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors; an integer value (for example, 10 to allow 10 rejections). This value is used in OLH job definitions as well as in external table definitions. |
EXT_TAB_DIR_LOCATION |
Directory for ext tab data files. File system path of the external table. Note:
|
WORK_TABLE_OPTIONS |
Option for flow table creation. Use this option to specify the attributes for the integration table at create time; these attributes are used for increasing performance. This option is set to NOLOGGING by default and may be left empty. |
OVERRIDE_INPUTFORMAT |
Class name of InputFormat. By default the InputFormat class is derived from the source Data Store/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression. |
EXTRA_OLH_CONF_PROPERTIES |
Optional extra OLH properties. Allows adding extra parameters to OLH. For example, for changing the default OLH date format: <property> <name>oracle.hadoop.loader.defaultDateFormat</name> <value>yyyy-MM-dd HH:mm:ss</value> </property> Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details), the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>. Cannot be used with OLH_OUTPUT_MODE=OSCH. Example (loading Apache WebLog file format): When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details: <property> <name>oracle.hadoop.loader.input.regexPattern</name> <value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value> <description>RegEx for Apache WebLog format</description> </property> |
MAPRED_OUTPUT_BASE_DIR |
MapReduce Output Directory. This option specifies an HDFS directory where SQOOP will create subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
TEMP_DIR |
Local directory for temporary files. Directory used for storing temporary files like the sqoop script, stdout and stderr redirects. Leave blank to use the system's default temp dir (<?=System.getProperty("java.io.tmpdir")?>). |
A.21 RKM Hive
RKM Hive reverses these metadata elements:
- Hive tables and views as data stores.
Specify the reverse mask in the Mask field, and then select the tables and views to reverse. The Mask field in the Reverse Engineer tab filters reverse-engineered objects based on their names. The Mask field cannot be empty and must contain at least the percent sign (%).
- Hive columns as attributes with their data types.
- Information about buckets, partitioning, clusters, and sort columns is set in the respective flex fields in the data store or column metadata.
A.22 RKM HBase
RKM HBase reverses these metadata elements:
- HBase tables as data stores.
Specify the reverse mask in the Mask field, and then select the tables to reverse. The Mask field in the Reverse Engineer tab filters reverse-engineered objects based on their names. The Mask field cannot be empty and must contain at least the percent sign (%).
- HBase columns as attributes with their data types.
- HBase unique row key as an attribute called key.
Note:
This RKM uses the oracle.odi.km logger for logging. You can enable logging by changing the log level for the oracle.odi.km logger to TRACE:16 in ODI-logging-config.xml, as shown below:
<logger name="oracle.odi.km" level="TRACE:16" useParentHandlers="true"/>
<logger name="oracle.odi.studio.message.logger.proxy" level="TRACE:16" useParentHandlers="false"/>
For more information about logging configuration in ODI, see the Runtime Logging for ODI components section in Administering Oracle Data Integrator.
The following table describes the options for RKM HBase.
Table A-20 RKM HBase Options
Option | Description |
---|---|
|
Specifies the maximum number of rows to be scanned during reversing of a table. The default value is |
|
Specifies the key of the row to start the scan on. By default the scan will start on the first row. The row key is specified as a Java expression returning an instance of |
|
Specifies the key of the row to stop the scan on. By default the scan will run to the last row of the table or up to Only applies if |
|
Restricts the scan to column families whose names match this pattern. SQL-LIKE wildcards percentage ( |
A.23 IKM File to Hive (Deprecated)
Note: This KM is deprecated and only used for backward compatibility.
IKM File to Hive (Load Data) supports:
- One or more input files. To load multiple source files, enter an asterisk or a question mark as a wildcard character in the resource name of the file data store (for example, webshop_*.log).
- File formats:
  - Fixed length
  - Delimited
  - Customized format
- Loading options:
  - Immediate or deferred loading
  - Overwrite or append
  - Hive external tables
The following table describes the options for IKM File to Hive (Load Data). See the knowledge module for additional details.
Table A-21 IKM File to Hive Options
Option | Description |
---|---|
|
Check this option, if you wish to create the target table. In case |
|
Set this option to true, if you wish to replace the target table/partition content with the new data. Otherwise the new data will be appended to the target table. If |
|
Defines whether the source file is to be considered local (outside of the current Hadoop cluster). If this option is set to This option only applies, if |
|
Defines whether to declare the target/staging table as externally managed. For non-external tables Hive manages all data files. That is, it will move any data files into If this option is set to
|
|
Defines whether an intermediate staging table will be created. A Hive staging table is required if:
In case none of the above is |
|
Removes temporary objects, such as tables, files, and scripts after integration. Set this option to |
|
Defines whether the file(s), which have been declared to the staging table should be loaded into the target table now or during a later execution. Permitted values are This option only applies if The typical use case for this option is when there are multiple files and each of them requires data redistribution/sorting and the files are gathered by calling the interface several times. For example, the interface is used in a package, which retrieves (many small) files from different locations and the location, stored in an Oracle Data Integrator variable, is to be used in a target partition column. In this case the first interface execution will have |
|
Allows to override the entire Hive row format definition of the staging table (in case
The list of columns in the source data store must match the list of input groups in the regular expression (same number of columns and appropriate data types). If |
|
Defines whether the KM should stop, if input file is not found. |
|
Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.
|
A.24 LKM HBase to Hive (HBase-SerDe) [Deprecated]
Note: This KM is deprecated and only used for backward compatibility.
LKM HBase to Hive (HBase-SerDe) supports:
- A single source HBase table.
The following table describes the options for LKM HBase to Hive (HBase-SerDe). See the knowledge module for additional details.
Table A-22 LKM HBase to Hive (HBase-SerDe) Options
Option | Description |
---|---|
|
Deletes temporary objects such as tables, files, and scripts post data integration. Set this option to |
A.25 IKM Hive to HBase Incremental Update (HBase-SerDe) [Deprecated]
Note: This KM is deprecated and only used for backward compatibility.
IKM Hive to HBase Incremental Update (HBase-SerDe) supports:
- Filters, Joins, Datasets, Transformations and Aggregations in Hive
- Inline views generated by IKM Hive Transform
- Inline views generated by IKM Hive Control Append
The following table describes the options for IKM Hive to HBase Incremental Update (HBase-SerDe). See the knowledge module for additional details.
Table A-23 IKM Hive to HBase Incremental Update (HBase-SerDe) Options
Option | Description |
---|---|
|
Creates the HBase target table. |
|
Replaces the target table content with the new data. If this option is set to |
|
Deletes temporary objects such as tables, files, and scripts post data integration. Set this option to |
|
Enables or disables the Write-Ahead-Log (WAL) that HBase uses to protect against data loss. For better performance, WAL can be disabled. |
A.26 IKM SQL to Hive-HBase-File (SQOOP) [Deprecated]
Note: This KM is deprecated and only used for backward compatibility.
IKM SQL to Hive-HBase-File (SQOOP) supports:
- Mappings on staging
- Joins on staging
- Filter expressions on staging
- Datasets
- Lookups
- Derived tables
The following table describes the options for IKM SQL to Hive-HBase-File (SQOOP). See the knowledge module for additional details.
Table A-24 IKM SQL to Hive-HBase-File (SQOOP) Options
Option | Description |
---|---|
|
Creates the target table. This option is applicable only if the target is Hive or HBase. |
|
Replaces any existing target table content with the new data. For Hive and HBase targets, the target data is truncated. For File targets, the target directory is removed. For File targets, this option must be set to |
|
Specifies the degree of parallelism. More precisely the number of mapper processes used for extraction. If |
|
Specifies the target column to be used for splitting the source data into n chunks for parallel extraction, where n is |
|
For splitting the source data into chunks for parallel extraction the minimum and maximum value of the split column is retrieved (KM option
For preserving context independence, regular table names should be inserted through
|
|
Specifies the directory used for storing temporary files, such as sqoop script, stdout and stderr redirects. Leave this option blank to use system's default temp directory:
|
|
Specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table will be created here to hold the temporary data. |
|
Deletes temporary objects such as tables, files, and scripts after data integration. Set this option to |
|
Loads data into the Hive work table before loading into the Hive target table. Set this option to Setting this option to
Setting this option to This option is applicable only if the target technology is Hive. |
|
Specifies whether to use the generic JDBC connector if a connector for the target technology is not available. For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector can be used. |
|
Optional generic Hadoop properties. |
|
Optional SQOOP properties. |
|
Optional SQOOP connector properties. |
A.27 IKM Hive Control Append (Deprecated)
Note: This KM is deprecated and only used for backward compatibility.
This knowledge module validates and controls the data, and integrates it into a Hive target table in truncate/insert (append) mode. Invalid data is isolated in an error table and can be recycled. IKM Hive Control Append supports inline view mappings that use either this knowledge module or IKM Hive Transform.
The following table describes the options for IKM Hive Control Append.
Table A-25 IKM Hive Control Append Options
Option | Description |
---|---|
|
Activates flow control. |
|
Recycles data rejected from a previous control. |
|
Controls the target table after having inserted or updated target data. |
|
Creates the target table. |
|
Replaces the target table content with the new data. Setting this option to |
|
Removes the temporary objects, such as tables, files, and scripts after data integration. Set this option to |
|
Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.
|
A.28 CKM Hive
This knowledge module checks data integrity for Hive tables. It verifies the validity of the constraints of a Hive data store and diverts the invalid records to an error table. You can use CKM Hive for static control and flow control. You must also define these constraints on the stored data.
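As an illustration of how a constraint check can be expressed, the query below selects rows violating a mandatory (NOT NULL) column so that they can be diverted to an error table; the names are hypothetical and the actual statements are generated by the KM:

```sql
-- Rows violating the mandatory CUST_NAME constraint, flagged for the error table.
SELECT cust_id, 'CUST_NAME is mandatory' AS error_message
FROM customers
WHERE cust_name IS NULL;
```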
The following table describes the options for this check knowledge module.
Table A-26 CKM Hive Options
Option | Description |
---|---|
|
Drops error table before execution. When this option is set to |
|
Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.
|
A.29 IKM Hive Transform (Deprecated)
Note: This KM is deprecated and only used for backward compatibility.
This knowledge module performs transformations. It uses a shell script to transform the data, and then integrates it into a Hive target table using replace mode. The knowledge module supports inline view mappings and can be used as an inline-view for IKM Hive Control Append.
The transformation script must read the input columns in the order defined by the source data store. Only mapped source columns are streamed into the transformations. The transformation script must provide the output columns in the order defined by the target data store.
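The streaming behavior described above corresponds to Hive's TRANSFORM clause, which pipes tab-separated rows through an external script via stdin/stdout. A minimal sketch, assuming a hypothetical script transform.sh registered with ADD FILE:

```sql
-- Hypothetical: transform.sh reads tab-separated input rows on stdin and
-- writes tab-separated output rows on stdout, in target column order.
ADD FILE /tmp/odi/transform.sh;

SELECT TRANSFORM (cust_id, cust_name)
USING 'transform.sh'
AS (cust_id, cust_name_clean)
FROM customers_src;
```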
The following table describes the options for this integration knowledge module.
Table A-27 IKM Hive Transform Options
Option | Description |
---|---|
|
Creates the target table. |
|
Removes the temporary objects, such as tables, files, and scripts post data integration. Set this option to |
|
Defines the file name of the transformation script. This transformation script is used to transform the input data into the output structure. Both local and HDFS paths are supported, for example: Local script location: HDFS script location: Ensure that the following requirements are met:
When the KM option
|
|
Defines the transformation script content. This transformation script is then used to transform the input data into the output structure. If left blank, the file given in Script example (1-to-1 transformation): All mapped source columns are spooled as tab separated data into this script through stdin. This unix script then transforms the data and writes out the data as tab separated data on stdout. The script must provide as many output columns as there are target columns. |
|
Unix/HDFS file permissions for script file in octal notation with leading zero. For example, full permissions for owner and group: 0770. Warning: Using wider permissions like 0777 poses a security risk. See also KM option description for |
|
Provides an optional, comma-separated list of source column names, which enables the knowledge module to distribute the data before the transformation script is applied. |
|
Provides an optional, comma-separated list of source column names, which enables the knowledge module to sort the data before the transformation script is applied. |
|
Provides an optional, comma-separated list of target column names, which enables the knowledge module to distribute the data after the transformation script is applied. |
|
Provides an optional, comma-separated list of target column names, which enables the knowledge module to sort the data after the transformation script is applied. |
A.30 IKM File-Hive to Oracle (OLH-OSCH) [Deprecated]
Note: This KM is deprecated and only used for backward compatibility.
IKM File-Hive to Oracle (OLH-OSCH) integrates data from an HDFS file or Hive source into an Oracle database target using Oracle Loader for Hadoop. Using the mapping configuration and the selected options, the knowledge module generates an appropriate Oracle Database target instance. Hive and Hadoop versions must follow the Oracle Loader for Hadoop requirements.
See Also:
- Oracle Loader for Hadoop Setup in Oracle Big Data Connectors User's Guide for the required versions of Hadoop and Hive.
- Configuring the Oracle Data Integrator Agent to Execute Hadoop Jobs for required environment variable settings.
The following table describes the options for this integration knowledge module.
Table A-28 IKM File - Hive to Oracle (OLH-OSCH)
Option | Description |
---|---|
|
Specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.
|
|
Specifies the maximum number of errors for Oracle Loader for Hadoop and external table. Examples: |
|
Creates the target table. |
|
Replaces the target table content with the new data. |
|
Deletes all the data in target table. |
|
Materializes Hive source data before extraction by Oracle Loader for Hadoop. If this option is set to
Setting this option to This option is applicable only if the source technology is Hive. |
|
Uses an intermediate Oracle database staging table. The extracted data is made available to Oracle by an external table. If
Setting this option to |
|
Specifies the file system path of the external table. Please note the following:
|
|
Specifies the directory used for storing temporary files, such as sqoop script, stdout and stderr redirects. Leave this option blank to use system's default temp directory:
|
|
Specifies an HDFS directory, where the Oracle Loader for Hadoop job will create subdirectories for temporary files/datapump output files. |
|
Specifies the attributes for the integration table at create time and used for increasing performance. This option is set by default to |
|
Removes temporary objects, such as tables, files, and scripts post data integration. Set this option to |
|
By default the InputFormat class is derived from the source Data Store/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Cannot be used with Example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression. |
|
Particularly when using custom InputFormats (see KM option Example, (loading apache weblog file format): When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:
|
A.31 IKM File-Hive to SQL (SQOOP) [Deprecated]
Note: This KM is deprecated and only used for backward compatibility.
IKM File-Hive to SQL (SQOOP) supports:
- Filters, Joins, Datasets, Transformations and Aggregations in Hive
- Inline views generated by IKM Hive Control Append
- Inline views generated by IKM Hive Transform
- Hive-HBase source tables using LKM HBase to Hive (HBase SerDe)
- File source data (delimited file format only)
The following table describes the options for this integration knowledge module.
Table A-29 IKM File-Hive to SQL (SQOOP)