9 Additional Tools Installed
9.1 Copy to Hadoop and OHSH
Copy to Hadoop
Copy to Hadoop makes it easy to identify and copy data from an Oracle Database table to the Apache Hadoop Distributed File System (HDFS) and create an Apache Hive external table over that data. In HDFS, the data can be saved in Oracle Data Pump files (which store data as Oracle data types). This provides an exact copy of the data in the database, without the potential for data type conversion errors. You can then query the data via Hive (and Hadoop technologies that access data through Hive) and via Oracle Big Data SQL. Oracle Data Pump data can be converted to Parquet or ORC using Hive.
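For example, once Copy to Hadoop has created a Hive external table over the Data Pump files, a single Hive statement can rewrite the data in a columnar format. The following is a minimal sketch; the table names sales_dp_ext and sales_parquet are placeholders:
   hive> CREATE TABLE sales_parquet STORED AS PARQUET AS SELECT * FROM sales_dp_ext;
Use STORED AS ORC instead of STORED AS PARQUET to produce ORC files.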
Oracle Shell for Hadoop Loaders (OHSH)
OHSH is an intuitive command line tool to move data between Hadoop and Oracle Database. It provides a set of declarative commands you can use to copy contents from an Oracle Database table to a Hive table.
OHSH is bundled with both Oracle Big Data SQL and Oracle Big Data Connectors. With Oracle Big Data SQL, OHSH works with Copy to Hadoop. With Oracle Big Data Connectors, OHSH provides an interface to copy contents from Hadoop and Hive to Oracle tables using Oracle Loader for Hadoop (OLH) and Oracle SQL Connector for Hadoop Distributed File System (OSCH).
Although OHSH can serve as a front end for OLH and OSCH as well as for Copy to Hadoop, OLH and OSCH are part of the Oracle Big Data Connectors product and are not installed by Oracle Big Data SQL.
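For illustration, an OHSH session that copies an Oracle table to Hive might look like the sketch below. The resource name ora0, the table name movie_sessions, and the connection details are placeholders; confirm the exact command syntax against the OHSH documentation:
   $ ohsh
   ohsh> create oracle jdbc resource ora0 connectid="<host>:<port>/<service>"
   ohsh> create hive table hive0:movie_sessions from oracle table ora0:movie_sessions using directcopy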
Installation and Configuration of Copy to Hadoop and OHSH
Although Copy to Hadoop and OHSH can be run from a Hadoop cluster node or an Oracle Database node, the recommended practice is to run them from an edge node.
Oracle Big Data SQL installs the kits for Copy to Hadoop and Oracle Shell for Hadoop Loaders on the Hadoop cluster management server and on any Oracle Database server node where you install Oracle Big Data SQL.
You can also manually copy the kits for both tools from the Hadoop cluster management server to an edge node and configure them for use from the edge node. The kits are stored at /opt/oracle/orahivedp-<version>. See /opt/oracle/ohsh-<version>/doc/OHSH_EDGE_NODE_README.txt on the Hadoop cluster management server for instructions.
Note:
In the Oracle Big Data SQL installation, the setup of Copy to Hadoop is fully automated except for the setting of the CP2HADOOP_HOME variable listed in the table below. You must set this variable manually.
You must complete the configuration of Oracle Shell for Hadoop Loaders manually, as described in this section.
Environment Variables for OHSH
On the Hadoop cluster management server, the OHSH configuration file is /opt/oracle/ohsh-<version>/bin/ohsh_config.sh. On an Oracle Database node, it is $ORACLE_HOME/ohsh/bin/ohsh_config.sh. This file contains the environment variables required by OHSH. The set of variables needed differs for installations on a Hadoop cluster, an Oracle Database node, and an edge node. The table below shows which variables you need to set for each environment. Some of the required variables are set for you automatically by the Oracle Big Data SQL installer.
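For reference, a fully configured ohsh_config.sh might contain entries like the following sketch. All of these variable names appear elsewhere in this section; the host name, port, and paths are placeholders:
   export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
   export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
   export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
   export TNS_ADMIN="<tns_admin_location>"
   export WALLET_LOCATION="<wallet_location>"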
9.1.1 Completing the OHSH Configuration on Oracle Database Nodes
The Oracle Big Data SQL installation automatically copies the OHSH installation kit to each node where Oracle Big Data SQL is installed. There are some manual steps you must perform to complete the installation.
Note:
The README file at $ORACLE_HOME/ohsh/doc/OHSH_DB_NODE_README.txt
on the Hadoop cluster management server where you installed Oracle Big Data SQL describes the full set of steps ordinarily needed to enable OHSH on an Oracle Database node. However, when OHSH is installed by Oracle Big Data SQL, some of the setup is done for you by the installer. To complete the configuration in this context, use this shortened procedure instead of the full procedure documented in the README.
1. Log on to the Oracle Database server as the oracle user.
2. Set up OHSH to work with Copy to Hadoop.
   a. Create a staging directory in the file system on the Oracle Database node:
      $ mkdir <OS_STAGE_DIR>
   b. Create an Oracle directory object on this staging directory and grant read/write privileges to the Oracle Database user (oracle or other). In this example, ORACLE_STAGE_DIR refers to the name of the directory object in Oracle Database. OS_STAGE_DIR refers to the path on the server.
      SQL> CREATE DIRECTORY <ORACLE_STAGE_DIR> AS '<OS_STAGE_DIR>';
      SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_STAGE_DIR> TO <ORACLE_USER>;
      Note: Later, if you employ the "stage" method to copy Oracle tables, use the OHSH set locationdirectory command to point to the Oracle directory object name (the name of the directory you created in this step). An example session appears after this procedure.
   c. If you are connecting to a CDH-based Hadoop system (Oracle Big Data Appliance or other), perform the substeps below to enable the OHSH "FUSE" method, in which you copy files to Hive tables over a mountable HDFS (FUSE) directory. Note that HDP does not support FUSE, so you cannot create a mount point to HDFS on HDP using these steps.
      - On the Oracle Database host, create a FUSE mount to the root of the HDFS system on the Hadoop cluster. Refer to Cloudera's documentation on installing FUSE. You may find current instructions at Configuring Mountable HDFS on Cloudera's website.
      - Create the Oracle directory on the FUSE mount and grant privileges. In this example, ORACLE_FUSE_DIR refers to the name of the directory object in Oracle Database. OS_FUSE_MOUNT refers to the path to the mount point on the server.
        SQL> CREATE DIRECTORY <ORACLE_FUSE_DIR> AS '<OS_FUSE_MOUNT>';
        SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_FUSE_DIR> TO <ORACLE_USER>;
        Note that when you later use OHSH to copy Oracle tables to Hive using the "FUSE" method, use the OHSH set locationdirectory command to point to the same <ORACLE_FUSE_DIR>.
3. Edit $ORACLE_HOME/ohsh/bin/ohsh_config.sh to configure the home directories of dependencies. When OHSH is installed by Oracle Big Data SQL, most of the required variables are already set for you. Set HS2_HOST_PORT and (if needed) HIVE_SESS_VAR_LIST as follows.
      export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
   Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:
   - To connect to HiveServer2 with Kerberos authentication:
        export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
     The Hive principal is specified by the configuration property hive.server2.authentication.kerberos.principal in hive-site.xml.
   - To connect to HiveServer2 running in HTTP mode:
        export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"
   - To connect to HiveServer2 when SSL is enabled:
        export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"
4. If you want to configure OHSH to use Oracle Wallet, set the environment variables TNS_ADMIN and WALLET_LOCATION to the directory paths where TNS and the Oracle Wallet are stored.
      export WALLET_LOCATION="<wallet_location>"
      export TNS_ADMIN="<tns_admin_location>"
   Note: Later, when you start an OHSH session, if TNS_ADMIN and WALLET_LOCATION are different on the Hadoop cluster, set hadooptnsadmin and hadoopwalletlocation:
      set hadooptnsadmin <tns_admin_hadoop_node>
      set hadoopwalletlocation <wallet_location_hadoop_node>
5. Clear the HADOOP_CLASSPATH environment variable before invoking Oracle Shell for Hadoop Loaders.
6. Start an OHSH session and check for resources. This check indicates whether the configuration is correct.
      $ ohsh
      ohsh> show resources
   You should see the resources hadoop0, hive0, and bash0.
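As noted in step 2, a session that uses the "stage" copy method might look like the following sketch. The resource name ora0, the table name sales, and the connection details are placeholders, and ORACLE_STAGE_DIR is the directory object created above; confirm the exact syntax against the OHSH documentation:
   $ ohsh
   ohsh> set locationdirectory ORACLE_STAGE_DIR
   ohsh> create oracle jdbc resource ora0 connectid="<host>:<port>/<service>"
   ohsh> create hive table hive0:sales from oracle table ora0:sales using stage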
9.1.2 Completing the OHSH Configuration on the Hadoop Cluster
On the Hadoop cluster management server, Oracle Big Data SQL installs OHSH from an RPM to /opt/oracle/ohsh. Perform the steps below as root in order to complete the Hadoop side of the configuration.
Note:
The installation instructions at $ORACLE_HOME/ohsh/doc/OHSH_HADOOP_NODE_README.txt
on the Hadoop cluster management server contain extra information that is not applicable to the installation of Copy to Hadoop and OHSH in the context of the Oracle Big Data SQL installation. Use the instructions below instead.
1. If the Hadoop system is not an Oracle Big Data Appliance, download the SQL*Plus Instant Client and Oracle Instant Client for Linux from the Oracle Technology Network. Select the client versions that match the version of the Oracle Database.
   Extract both packages into the same directory.
   Add the directory to PATH and LD_LIBRARY_PATH.
. -
Create a local directory for TNS and from
TNS_ADMIN
on the Oracle Database host, copy thetnsnames.ora
andsqlnet.ora
file into this directory.If you create an Oracle Wallet for the Oracle Database host, then also copy over the Oracle Wallet file from the database server. Also check thatsqlnet.ora
points to the correct Oracle Wallet location.WALLET_LOCATION= (SOURCE=(METHOD=FILE)(METHOD_DATA= (DIRECTORY=<your wallet directory>))) sqlnet.wallet_override=true
3. As root, edit /opt/oracle/ohsh/bin/ohsh_config.sh. Set the following environment variables.
      export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
      export HS2_HOST_PORT=<HiveServer2 hostname>:<Port number>
   Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:
   - To connect to HiveServer2 with Kerberos authentication:
        export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
   - To connect to HiveServer2 running in HTTP mode:
        export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"
   - To connect to HiveServer2 when SSL is enabled:
        export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"
4. If TNS and Oracle Wallet are enabled, also add the following variables to the file and export them:
      export WALLET_LOCATION="<your wallet directory>"
      export TNS_ADMIN="<your TNS directory>"
5. Add /opt/oracle/ohsh-<version>/bin and $HADOOP_HOME/bin to the PATH variable:
      PATH=/opt/oracle/ohsh-<version>/bin:$PATH
      PATH=$HADOOP_HOME/bin:$PATH
6. Clear the HADOOP_CLASSPATH environment variable before invoking Oracle Shell for Hadoop Loaders.
7. Start an OHSH session and check for resources. This check indicates whether the configuration is correct.
      $ ohsh
      ohsh> show resources
   You should see the resources hadoop0, hive0, and bash0.
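Before starting OHSH, you can optionally confirm that the Instant Client and TNS configuration are working by connecting to the database directly. A quick check, assuming a TNS alias <tns_alias> defined in the tnsnames.ora file you copied in step 2:
   $ sqlplus <oracle_user>@<tns_alias>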
9.1.3 Getting Started Using Copy to Hadoop and OHSH
See $ORACLE_HOME/ohsh/doc and $ORACLE_HOME/ohsh/examples for examples that show how to run the OHSH load methods.
See the Oracle Big Data SQL User's Guide for more information on Copy to Hadoop and Oracle Shell for Hadoop Loaders.
9.1.4 Configuring OHSH to Enable Job Monitoring
When OHSH jobs are executed, status and other information about each job is recorded in a back-end database. To access this information from the OHSH command line, you must first configure the connection to that database.
Configuration Steps
Configure the following properties in conf/smartloader-conf.xml to enable a database instance where job history is stored.
- oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.driver
  Specifies the JDBC driver for the supported back-end database type. Currently, MYSQL and ORACLE are valid values. If this property is not specified, the job history commands fail.
Additional properties must be set. These differ depending on which database type is defined as the back-end database:
- If jdbc.driver=ORACLE:
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleConnectId
    A TNS entry name defined in the tnsnames.ora file.
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleWalletDir
    The OS directory containing the Oracle Wallet used to connect to an Oracle Database schema through JDBC.
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleTnsDir
    The file path to a directory on the node where OHSH runs. This directory contains SQL*Net configuration files such as sqlnet.ora and tnsnames.ora. Typically, this is ${ORACLE_HOME}/network/admin.
    Note: If you are running OHSH from a Hadoop client and want to use Oracle Wallet, copy tnsnames.ora and the wallet files to any directory on the Hadoop client.
If
jdbc.driver
=MYSQL
:-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlConnectUrl
The URL used to make a JDBC connection to the MySQL database
-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlUser
MySQL user of job history schema
-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlPW
Password of the MySQL user account.
-
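For example, a minimal sketch of conf/smartloader-conf.xml configured for an Oracle back end might look like the following, assuming the file uses the standard Hadoop configuration XML format; the TNS entry name and directory paths are placeholders:
   <configuration>
     <!-- Store OHSH job history in an Oracle Database schema. -->
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.driver</name>
       <value>ORACLE</value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleConnectId</name>
       <value><tns_entry_name></value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleWalletDir</name>
       <value><wallet_directory></value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleTnsDir</name>
       <value><tns_directory></value>
     </property>
   </configuration>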
Commands for Monitoring OHSH Jobs
After this configuration is complete, you can execute the following OHSH commands:
- ohsh> show job <job_id>
  Shows detailed information about the job specified by ID.
- ohsh> show job summary <job_id>
  Shows the performance of the completed job specified by ID.
- ohsh> show job abstract <job_id>
  Provides a functional description of the job.
- ohsh> show jobs [failed|running|completed|finished] [extended] [<integer>]
  Shows the last n jobs with a particular job status.
  - The first parameter specifies the job status. If the status is not specified, all jobs are shown, regardless of status.
  - The second parameter specifies whether to show details.
  - The third parameter specifies that the last n jobs with the specified status should be shown. If n is not specified, all jobs with that status are shown.
- ohsh> truncate jobs [<integer>]
  Removes the last n jobs from the database. If the integer is not specified, the command removes all jobs.
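For example, to show details of the last five failed jobs:
   ohsh> show jobs failed extended 5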
9.2 Oracle SQL Access to Kafka
If you work with Apache Kafka clusters as well as Oracle Database, Oracle SQL Access to Kafka (OSAK) can give you access to Kafka brokers through Oracle SQL. You can then query data in the Kafka records and also join the Kafka data with data from Oracle Database tables.
OSAK is optional. The OSAK kit is copied to the Oracle Database system when you run Oracle Big Data SQL's database-side installer, bds-database-install.sh.
If you want to use OSAK, complete the setup as described in Installing and Configuring Oracle SQL Access to Kafka.
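For illustration only, once OSAK is configured and has created views over Kafka topics, Kafka data can be joined with database tables in ordinary SQL. The view name ORDERS_KAFKA_VW and the column names below are hypothetical; the actual view names and setup are covered in the guide referenced above:
   SQL> SELECT k.order_id, c.customer_name
        FROM   ORDERS_KAFKA_VW k
        JOIN   customers c ON c.customer_id = k.customer_id;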