9 Additional Tools Installed
9.1 Copy to Hadoop and OHSH
Copy to Hadoop
Copy to Hadoop makes it easy to identify and copy data from an Oracle Database table to the Apache Hadoop Distributed File System (HDFS) and create an Apache Hive external table over that data. In HDFS, the data can be saved in Oracle Data Pump files (which store data as Oracle data types). This provides an exact copy of the data in the database, without the potential for data type conversion errors. You can then query the data via Hive (and Hadoop technologies that access data through Hive) and via Oracle Big Data SQL. Oracle Data Pump data can be converted to Parquet or ORC using Hive.
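For example, once Copy to Hadoop has created a Hive external table over the Data Pump files, a single Hive statement can rewrite the data in a columnar format. The following is a minimal sketch; the table names sales_dp_ext and sales_parquet are placeholders:
   hive> CREATE TABLE sales_parquet STORED AS PARQUET AS SELECT * FROM sales_dp_ext;
Use STORED AS ORC instead of STORED AS PARQUET to produce ORC files.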
Oracle Shell for Hadoop Loaders (OHSH)
OHSH is an intuitive command line tool to move data between Hadoop and Oracle Database. It provides a set of declarative commands you can use to copy contents from an Oracle Database table to a Hive table.
OHSH is bundled with both Oracle Big Data SQL and Oracle Big Data Connectors. With Oracle Big Data SQL, OHSH works with Copy to Hadoop. With Oracle Big Data Connectors, OHSH provides an interface to copy contents from Hadoop and Hive to Oracle tables using Oracle Loader for Hadoop (OLH) and Oracle SQL Connector for Hadoop Distributed File System (OSCH).
Although OHSH can serve as a front end for OLH and OSCH as well as for Copy to Hadoop, OLH and OSCH are part of the Oracle Big Data Connectors product and are not installed by Oracle Big Data SQL.
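For illustration, an OHSH session that copies an Oracle table to Hive might look like the sketch below. The resource name ora0, the table name movie_sessions, and the connection details are placeholders; confirm the exact command syntax against the OHSH documentation:
   $ ohsh
   ohsh> create oracle jdbc resource ora0 connectid="<host>:<port>/<service>"
   ohsh> create hive table hive0:movie_sessions from oracle table ora0:movie_sessions using directcopy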
Installation and Configuration of Copy to Hadoop and OHSH
Although Copy to Hadoop and OHSH can be run from a Hadoop cluster node or an Oracle Database node, the recommended practice is to run them from an edge node.
Oracle Big Data SQL installs the kits for Copy to Hadoop and Oracle Shell for Hadoop Loaders on the Hadoop cluster management server and on any Oracle Database server node where you install Oracle Big Data SQL.
You can also manually copy the kits for both tools from the Hadoop cluster management server to an edge node and configure them for use from the edge node. The kits are stored at /opt/oracle/orahivedp-<version>. See /opt/oracle/ohsh-<version>/doc/OHSH_EDGE_NODE_README.txt on the Hadoop cluster management server for instructions.
Note:
In the Oracle Big Data SQL installation, the setup of Copy to Hadoop is fully automated except for the setting of the CP2HADOOP_HOME variable listed in the table below. You must set this variable manually.
You must complete the configuration of Oracle Shell for Hadoop Loaders manually, as described in this section.
Environment Variables for OHSH
On the Hadoop cluster management server, the OHSH configuration file is /opt/oracle/ohsh-<version>/bin/ohsh_config.sh. On an Oracle Database node, it is $ORACLE_HOME/ohsh/bin/ohsh_config.sh. This file contains the environment variables required by OHSH. The set of variables needed differs for installations on a Hadoop cluster, an Oracle Database node, and an edge node. The table below shows which variables you need to set for each environment. Some of the required variables are set for you automatically by the Oracle Big Data SQL installer.
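For reference, a fully configured ohsh_config.sh might contain entries like the following sketch. All of these variable names appear elsewhere in this section; the host name, port, and paths are placeholders:
   export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
   export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
   export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
   export TNS_ADMIN="<tns_admin_location>"
   export WALLET_LOCATION="<wallet_location>"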
9.1.1 Completing the OHSH Configuration on Oracle Database Nodes
The Oracle Big Data SQL installation automatically copies the OHSH installation kit to each node where Oracle Big Data SQL is installed. There are some manual steps you must perform to complete the installation.
Note:
The README file at $ORACLE_HOME/ohsh/doc/OHSH_DB_NODE_README.txt
on the Hadoop cluster management server where you installed Oracle Big Data SQL describes the full set of steps ordinarily needed to enable OHSH on an Oracle Database node. However, when OHSH is installed by Oracle Big Data SQL, some of the setup is done for you by the installer. To complete the configuration in this context, use this shortened procedure instead of the full procedure documented in the README.
1. Log on to the Oracle Database server as the oracle user.
2. Set up OHSH to work with Copy to Hadoop.
   a. Create a staging directory in the file system on the Oracle Database node:
      $ mkdir <OS_STAGE_DIR>
   b. Create an Oracle directory object on this staging directory and grant read/write privileges to the Oracle Database user (oracle or other). In this example, ORACLE_STAGE_DIR refers to the name of the directory object in Oracle Database. OS_STAGE_DIR refers to the path on the server.
      SQL> CREATE DIRECTORY <ORACLE_STAGE_DIR> AS '<OS_STAGE_DIR>';
      SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_STAGE_DIR> TO <ORACLE_USER>;
      Note: Later, if you employ the "stage" method to copy Oracle tables, use the OHSH set locationdirectory command to point to the Oracle directory object name (the name of the directory you created in this step). An example session appears after this procedure.
   c. If you are connecting to a CDH-based Hadoop system (Oracle Big Data Appliance or other), perform the substeps below to enable the OHSH "FUSE" method, in which you copy files to Hive tables over a mountable HDFS (FUSE) directory. Note that HDP does not support FUSE, so you cannot create a mount point to HDFS on HDP using these steps.
      - On the Oracle Database host, create a FUSE mount to the root of the HDFS system on the Hadoop cluster. Refer to Cloudera's documentation on installing FUSE. You may find current instructions at Configuring Mountable HDFS on Cloudera's website.
      - Create the Oracle directory on the FUSE mount and grant privileges. In this example, ORACLE_FUSE_DIR refers to the name of the directory object in Oracle Database. OS_FUSE_MOUNT refers to the path to the mount point on the server.
        SQL> CREATE DIRECTORY <ORACLE_FUSE_DIR> AS '<OS_FUSE_MOUNT>';
        SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_FUSE_DIR> TO <ORACLE_USER>;
        Note that when you later use OHSH to copy Oracle tables to Hive using the "FUSE" method, use the OHSH set locationdirectory command to point to the same <ORACLE_FUSE_DIR>.
3. Edit $ORACLE_HOME/ohsh/bin/ohsh_config.sh to configure the home directories of dependencies. When OHSH is installed by Oracle Big Data SQL, most of the required variables are already set for you. Set HS2_HOST_PORT and (if needed) HIVE_SESS_VAR_LIST as follows.
      export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
   Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:
   - To connect to HiveServer2 with Kerberos authentication:
        export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
     The Hive principal is specified by the configuration property hive.server2.authentication.kerberos.principal in hive-site.xml.
   - To connect to HiveServer2 running in HTTP mode:
        export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"
   - To connect to HiveServer2 when SSL is enabled:
        export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"
4. If you want to configure OHSH to use Oracle Wallet, set the environment variables TNS_ADMIN and WALLET_LOCATION to the directory paths where TNS and the Oracle Wallet are stored.
      export WALLET_LOCATION="<wallet_location>"
      export TNS_ADMIN="<tns_admin_location>"
   Note: Later, when you start an OHSH session, if TNS_ADMIN and WALLET_LOCATION are different on the Hadoop cluster, set hadooptnsadmin and hadoopwalletlocation:
      set hadooptnsadmin <tns_admin_hadoop_node>
      set hadoopwalletlocation <wallet_location_hadoop_node>
5. Clear the HADOOP_CLASSPATH environment variable before invoking Oracle Shell for Hadoop Loaders.
6. Start an OHSH session and check for resources. This check indicates whether the configuration is correct.
      $ ohsh
      ohsh> show resources
   You should see the resources hadoop0, hive0, and bash0.
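As noted in step 2, a session that uses the "stage" copy method might look like the following sketch. The resource name ora0, the table name sales, and the connection details are placeholders, and ORACLE_STAGE_DIR is the directory object created above; confirm the exact syntax against the OHSH documentation:
   $ ohsh
   ohsh> set locationdirectory ORACLE_STAGE_DIR
   ohsh> create oracle jdbc resource ora0 connectid="<host>:<port>/<service>"
   ohsh> create hive table hive0:sales from oracle table ora0:sales using stage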
9.1.2 Completing the OHSH Configuration on the Hadoop Cluster
On the Hadoop cluster management server, Oracle Big Data SQL installs OHSH from an RPM to /opt/oracle/ohsh. Perform the steps below as root in order to complete the Hadoop side of the configuration.
Note:
The installation instructions at $ORACLE_HOME/ohsh/doc/OHSH_HADOOP_NODE_README.txt
on the Hadoop cluster management server contain extra information that is not applicable to the installation of Copy to Hadoop and OHSH in the context of the Oracle Big Data SQL installation. Use the instructions below instead.
1. If the Hadoop system is not an Oracle Big Data Appliance, download the SQL*Plus Instant Client and Oracle Instant Client for Linux from the Oracle Technology Network. Select the client versions that match the version of the Oracle Database.
   Extract both packages into the same directory.
   Add the directory to PATH and LD_LIBRARY_PATH.
. -
Create a local directory for TNS and from
TNS_ADMIN
on the Oracle Database host, copy thetnsnames.ora
andsqlnet.ora
file into this directory.If you create an Oracle Wallet for the Oracle Database host, then also copy over the Oracle Wallet file from the database server. Also check thatsqlnet.ora
points to the correct Oracle Wallet location.WALLET_LOCATION= (SOURCE=(METHOD=FILE)(METHOD_DATA= (DIRECTORY=<your wallet directory>))) sqlnet.wallet_override=true
3. As root, edit /opt/oracle/ohsh/bin/ohsh_config.sh. Set the following environment variables.
      export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
      export HS2_HOST_PORT=<HiveServer2 hostname>:<Port number>
   Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:
   - To connect to HiveServer2 with Kerberos authentication:
        export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"
   - To connect to HiveServer2 running in HTTP mode:
        export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"
   - To connect to HiveServer2 when SSL is enabled:
        export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"
4. If TNS and Oracle Wallet are enabled, also add the following variables to the file and export them:
      export WALLET_LOCATION="<your wallet directory>"
      export TNS_ADMIN="<your TNS directory>"
5. Add /opt/oracle/ohsh-<version>/bin and $HADOOP_HOME/bin to the PATH variable:
      PATH=/opt/oracle/ohsh-<version>/bin:$PATH
      PATH=$HADOOP_HOME/bin:$PATH
6. Clear the HADOOP_CLASSPATH environment variable before invoking Oracle Shell for Hadoop Loaders.
7. Start an OHSH session and check for resources. This check indicates whether the configuration is correct.
      $ ohsh
      ohsh> show resources
   You should see the resources hadoop0, hive0, and bash0.
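Before starting OHSH, you can optionally confirm that the Instant Client and TNS configuration are working by connecting to the database directly. A quick check, assuming a TNS alias <tns_alias> defined in the tnsnames.ora file you copied in step 2:
   $ sqlplus <oracle_user>@<tns_alias>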
9.1.3 Getting Started Using Copy to Hadoop and OHSH
See $ORACLE_HOME/ohsh/doc and $ORACLE_HOME/ohsh/examples for examples that show how to run the OHSH load methods.
See the Oracle Big Data SQL User's Guide for more information on Copy to Hadoop and Oracle Shell for Hadoop Loaders.
9.1.4 Configuring OHSH to Enable Job Monitoring
When OHSH jobs are executed, status and other information about each job is recorded in a back-end database. To access this information from the OHSH command line, you must first configure the connection to that database.
Configuration Steps
Configure the following properties in conf/smartloader-conf.xml to enable a database instance where job history is stored.
- oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.driver
  Specifies the JDBC driver for the supported back-end database type. Currently, MYSQL and ORACLE are valid values. If this property is not specified, the job history commands fail.
Additional properties must be set. These differ depending on which database type is defined as the back-end database:
- If jdbc.driver=ORACLE:
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleConnectId
    A TNS entry name defined in the tnsnames.ora file.
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleWalletDir
    The OS directory containing the Oracle Wallet used to connect to an Oracle Database schema through JDBC.
  - oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleTnsDir
    The file path to a directory on the node where OHSH runs. This directory contains SQL*Net configuration files such as sqlnet.ora and tnsnames.ora. Typically, this is ${ORACLE_HOME}/network/admin.
    Note: If you are running OHSH from a Hadoop client and want to use Oracle Wallet, copy tnsnames.ora and the wallet files to any directory on the Hadoop client.
If
jdbc.driver
=MYSQL
:-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlConnectUrl
The URL used to make a JDBC connection to the MySQL database
-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlUser
MySQL user of job history schema
-
oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.mysqlPW
Password of the MySQL user account.
-
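For example, a minimal sketch of conf/smartloader-conf.xml configured for an Oracle back end might look like the following, assuming the file uses the standard Hadoop configuration XML format; the TNS entry name and directory paths are placeholders:
   <configuration>
     <!-- Store OHSH job history in an Oracle Database schema. -->
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.driver</name>
       <value>ORACLE</value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleConnectId</name>
       <value><tns_entry_name></value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleWalletDir</name>
       <value><wallet_directory></value>
     </property>
     <property>
       <name>oracle.hadoop.smartloader.diagnostics.jobhistory.jdbc.oracleTnsDir</name>
       <value><tns_directory></value>
     </property>
   </configuration>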
Commands for Monitoring OHSH Jobs
After this configuration is complete, you can execute the following OHSH commands:
- ohsh> show job <job_id>
  Shows detailed information about the job specified by ID.
- ohsh> show job summary <job_id>
  Shows the performance of the completed job specified by ID.
- ohsh> show job abstract <job_id>
  Provides a functional description of the job.
- ohsh> show jobs [failed|running|completed|finished] [extended] [<integer>]
  Shows the last n jobs with a particular job status.
  - The first parameter specifies the job status. If the status is not specified, all jobs are shown, regardless of status.
  - The second parameter specifies whether to show details.
  - The third parameter specifies that the last n jobs with the specified status should be shown. If n is not specified, all jobs with that status are shown.
- ohsh> truncate jobs [<integer>]
  Removes the last n jobs from the database. If the integer is not specified, the command removes all jobs.
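For example, to show details of the last five failed jobs:
   ohsh> show jobs failed extended 5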
9.2 Oracle SQL Access to Kafka
If you work with Apache Kafka clusters as well as Oracle Database, Oracle SQL Access to Kafka (OSAK) can give you access to Kafka brokers through Oracle SQL. You can then query data in the Kafka records and also join the Kafka data with data from Oracle Database tables.
OSAK is optional. The OSAK kit is copied to the Oracle Database system when you run Oracle Big Data SQL's database-side installer, bds-database-install.sh.
If you want to use OSAK, complete the setup as described in Installing and Configuring Oracle SQL Access to Kafka.
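For illustration only, once OSAK is configured and has created views over Kafka topics, Kafka data can be joined with database tables in ordinary SQL. The view name ORDERS_KAFKA_VW and the column names below are hypothetical; the actual view names and setup are covered in the guide referenced above:
   SQL> SELECT k.order_id, c.customer_name
        FROM   ORDERS_KAFKA_VW k
        JOIN   customers c ON c.customer_id = k.customer_id;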