2 Installing or Upgrading the Hadoop Side of Oracle Big Data SQL
After downloading the Oracle Big Data SQL deployment bundle and extracting the files, the next step is to configure the installer and then run the installation.
The installation of Oracle Big Data SQL on the Hadoop cluster is deployed using the services provided by the cluster management server (Cloudera Manager or Ambari). The Jaguar install
command uses the management server API to register the BDS service and start the deployment task. From there, the management server controls the process that deploys the software to the nodes of the cluster and installs it.
The Hadoop-side installation also generates the deployment bundle for the database side of the installation.
If a previous version of Oracle Big Data SQL is already installed, Oracle Big Data SQL upgrades the installation to Release 4.1.2.
If you have used previous Oracle Big Data SQL releases, note that the set of Jaguar configuration parameters available in this release has changed.
2.1 About Support for Multiple Database Versions (19c, 18c, 12.2, and 12.1)
Oracle Big Data SQL now supports Oracle Database 19c and also provides backward compatibility for Oracle Database 18c, 12.2, and 12.1.
You can use Oracle Big Data SQL 4.1.2 with any Oracle Database from release 12.1 to 19c. The database-related feature set available to you in Big Data SQL is determined by the Oracle Database version where it is installed. Each release of the database provides some advantages for Big Data SQL that its predecessors do not.
- Oracle Database 19c (first supported in Big Data SQL 4.1) provides the
ability to create hybrid partitioned tables that can include data in CSV or Parquet
files, and other formats accessible to tools in Spark, Hadoop, and other big data
technologies. See Hybrid Partitioned Tables in
the Oracle Database VLDB and Partitioning Guide.
Another Oracle Database 19c new feature that is useful to Big Data SQL is In-Memory External Tables.
In addition, an installation of Big Data SQL on a 19c database system has all of the functionality available to Big Data SQL on 18c databases.
- With Oracle Database 18c (which is supported by Oracle Big Data SQL 4.0 and later), you can access object stores in the cloud through the ORACLE_BIGDATA driver. 18c also enables Big Data SQL to perform aggregation offload, in which processing of aggregations in queries against data in Hadoop is pushed down to the Hadoop cluster.
- Oracle Database 12.1 and 12.2 are fully supported in this release. However, Big Data SQL installations on these databases do not enable you to leverage the newer capabilities that are available with 18c and 19c. With 12.1 and 12.2, Big Data SQL functionality is equivalent to Big Data SQL 3.2.1.1.
This backward compatibility enables you to install and administer release 4.1.2 in a mixed environment that includes Oracle Database 19c, 18c, and 12c.
See Also:
The Jaguar Configuration Parameter and Command Reference in this chapter shows you how to configure support for Oracle Database versions when you install the Hadoop side of Big Data SQL.

2.2 Before You Start the Hadoop-Side Installation
Check that all DataNodes of the cluster meet the prerequisites.
2.2.1 Check Hadoop-Side Prerequisites
You can run bds_node_check.sh on all cluster DataNodes prior to installing Oracle Big Data SQL. This is a quick way to check whether each node meets the installation criteria, and it shows you exactly what needs to be updated.
Running bds_node_check.sh is not required, but it is recommended. The Jaguar installer runs the same pre-checks internally, but when Jaguar runs them it also starts and stops the cluster management server, and the installation stops in place whenever it encounters a node that does not meet the prerequisites. Each time this happens, you must fix the readiness errors on that node in order to continue. Running bds_node_check.sh as a first step contributes to a smoother installation.
You can use this same script to check for the prerequisites when you add new nodes to the cluster.
Deploying and Running bds_node_check.sh
The script checks the local node where it is run. It does not check all nodes in the cluster.
- Find the script on the cluster management server in the install directory created when you executed ./BDSJaguar-4.1.2.run.
$ ls <Big Data SQL Install Directory>
BDSJaguar  bds_node_check.sh
$ cd <Big Data SQL Install Directory>
- Use your preferred method to copy the script to a node that you want to check.
$ scp bds_node_check.sh oracle@<node_IP_address>:/opt/tmp
- Log on to the node and run the script.
$ ./bds_node_check.sh
Checking for Missing Prerequisites in the bds_node_check.sh Output
The report returned by bds_node_check.sh inspects the node both for Jaguar installer prerequisites and for the prerequisites that support communications with Query Server on its edge node. If you do not intend to install Query Server, you can ignore that subset of the prerequisites.
bds_node_check.sh: BDS version 4.1.2 (c) 2020 Oracle Corporation
bds_node_check.sh:
bds_node_check.sh: Starting pre-requirements checks for BDS Jaguar
bds_node_check.sh: Total memory 64240 >= 40960 correct
bds_node_check.sh: vm_overcommit_memory=0 correct
bds_node_check.sh: shmmax=4398046511104, shmall=1073741824, PAGE_SIZE=4096 correct
bds_node_check.sh: shmmax=4398046511104 >= total_memory=67360522240 + 1024 correct
bds_node_check.sh: swappiness=10 correct
bds_node_check.sh: Total cores 32 >= 8 correct
bds_node_check.sh: Size of socket buffer rmem_default 4194304 >= 4194304 correct
bds_node_check.sh: Size of socket buffer rmem_max 8388608 >= 4194304 correct
bds_node_check.sh: Size of socket buffer wmem_default 4194304 >= 4194304 correct
bds_node_check.sh: Size of socket buffer wmem_max 8388608 >= 4194304 correct
bds_node_check.sh: dmidecode installed
bds_node_check.sh: net-snmp installed
bds_node_check.sh: net-snmp-utils installed
bds_node_check.sh: perl-XML-SAX installed
bds_node_check.sh: perl-XML-LibXML installed
bds_node_check.sh: perl-libwww-perl installed
bds_node_check.sh: perl-libxml-perl installed
bds_node_check.sh: libaio installed
bds_node_check.sh: glibc installed
bds_node_check.sh: libgcc installed
bds_node_check.sh: libstdc++ installed
bds_node_check.sh: libuuid installed
bds_node_check.sh: perl-Time-HiRes installed
bds_node_check.sh: perl-libs installed
bds_node_check.sh: perl-Env installed
bds_node_check.sh: libcgroup-tools installed
bds_node_check.sh: rpm found
bds_node_check.sh: scp found
bds_node_check.sh: curl found
bds_node_check.sh: unzip found
bds_node_check.sh: zip found
bds_node_check.sh: tar found
bds_node_check.sh: uname found
bds_node_check.sh: perl found
bds_node_check.sh: cgget found
bds_node_check.sh:
bds_node_check.sh: Optionally, if this node will be running the Jaguar installer,
bds_node_check.sh: it must have at least python version 2.7.5
bds_node_check.sh: with cryptography module available
bds_node_check.sh: Testing with /usr/bin/python
bds_node_check.sh: Python version 2.7.5, correct
bds_node_check.sh: Python cryptography module available, correct
bds_node_check.sh:
bds_node_check.sh: All pre-requirements were met for BDS Jaguar
bds_node_check.sh:
bds_node_check.sh: Starting pre-requirements checks for BDS Query Server
bds_node_check.sh: Open files 131072 >= 131072 correct
bds_node_check.sh: expect installed
bds_node_check.sh: procmail not installed
bds_node_check.sh: oracle-database-preinstall-19c not installed
bds_node_check.sh: rpm found
bds_node_check.sh: scp found
bds_node_check.sh: curl found
bds_node_check.sh: unzip found
bds_node_check.sh: zip found
bds_node_check.sh: tar found
bds_node_check.sh: uname found
bds_node_check.sh: perl found
bds_node_check.sh: cgget found
bds_node_check.sh: No database instances running on this node, correct
bds_node_check.sh: /etc/oracle/olr.loc file does not exist
bds_node_check.sh: /etc/oracle/ocr.loc file does not exist
bds_node_check.sh:
bds_node_check.sh: 2 error(s) found for BDS Query Server pre-requirements
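When you run the script across many DataNodes and save the output, it can help to summarize the results. The helper below is not part of the product; it is a sketch that totals the failures reported in a saved bds_node_check.sh log by summing the "N error(s) found" section counts shown above.

```shell
# Sketch (not part of Big Data SQL): sum the "N error(s) found" totals
# reported in a saved bds_node_check.sh log file.
count_errors() {
  # Extract every "N error(s) found" match, then sum the leading numbers.
  grep -o '[0-9][0-9]* error(s) found' "$1" | awk '{s += $1} END {print s + 0}'
}

# Example usage: count_errors check_node03.log
```

A node whose log reports zero errors for the Jaguar section meets the installation prerequisites; Query Server errors matter only if you plan to install Query Server.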
2.2.2 Check Memory Requirements
Support for Oracle Database 12.1 may require additional memory.
Oracle Big Data SQL provides backward compatibility with Oracle Database 12.1 and 12.2. However, compatibility with Oracle Database 12.1 incurs an additional cost in memory. Oracle Database 12.1 support requires that the Hadoop nodes run an older offload server (in addition to the offload server normally present). The overhead of running this additional offload server is a resource expense that you can avoid if you do not need to support Oracle Database 12.1.
If You Need to Support Oracle Database 12.1 for this Cluster:
Be sure that the DataNodes in the Hadoop cluster have enough memory. For an installation that supports full database compatibility (19c, 18c, 12.2, and 12.1), the minimum memory requirement is 64 GB per Hadoop node.
Also check to be sure that the memory cgroup upper limit is set to allow Oracle Big Data SQL to consume this much memory.
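As a quick sanity check, you can read a node's total physical memory from /proc/meminfo and compare it against the 64 GB minimum. This is an illustrative sketch only; bds_node_check.sh performs the authoritative verification.

```shell
# Illustrative check only: bds_node_check.sh performs the authoritative test.
# /proc/meminfo reports sizes in kB.
required_kb=$((64 * 1024 * 1024))   # 64 GB expressed in kB
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
total_kb=${total_kb:-0}
if [ "$total_kb" -ge "$required_kb" ]; then
  echo "memory OK (${total_kb} kB)"
else
  echo "memory below the 64 GB minimum (${total_kb} kB)"
fi
```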
If You do not Need to Support Oracle Database 12.1 for this Cluster:
Be sure to choose the right setting for the database_compatibility value in the Jaguar configuration file (bds-config.json or other). The options for this parameter are: "12.1", "12.2", "18", "19", and "full".
It is important to note that both "12.1" and "full" trigger the startup of the additional offload server needed to support 12.1.
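For example, a cluster section along these lines (the cluster name is a placeholder) limits support to 12.2 and later, so the extra 12.1 offload server is never started:

```json
{
  "cluster": {
    "name": "mycluster",
    "database_compatibility": [ "12.2", "18", "19" ]
  }
}
```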
See Also:
The Jaguar Configuration Parameter and Command Reference describes the database_compatibility parameter.
2.2.3 Plan the Configuration and Edit the Jaguar Configuration File
Before you start, consider the questions below.
Answering these questions will help clarify how you should edit the Jaguar configuration file. (See the Jaguar Configuration Parameter and Command Reference in this chapter.)
- Do you plan to access data in object stores (S3, Azure, or Oracle Object
Store)?
If so, then in the Jaguar configuration file you need to enable this access and also define some proxy settings that are specific to object store access.
- Do you want to install the optional Query Server?
If so, several things are required before you run Jaguar in order to install Oracle Big Data SQL:
- Identify a cluster edge node to host the optional Query Server.
A dedicated node is strongly recommended. Query Server is resource-intensive and cannot run on a node hosting either the DataNode or BDSSERVER role.
- Download and unzip the Query Server bundle and then execute the run file.
The Query Server is in a separate deployment bundle from the Jaguar installer (BDSExtra-4.1.2-QueryServer.zip). Before running Jaguar, download and unzip this bundle from https://edelivery.oracle.com/. Then execute the Query Server run file. Also, in the configuration file submitted to Jaguar, define the two related parameters in the edgedb section of the file, node and enabled:
"edgedb": { "node" : "<some server>.<some domain>.com", "enabled" : "true" }
- Does your Hadoop environment need to connect to more than one Oracle Database version?
By default, Big Data SQL allows connections from 12.1, 12.2, 18c, and 19c databases. However, the offloaders that support these connections are resource-intensive, particularly in memory consumption on the nodes of the cluster. If you do not need to support all of these releases, you can save resources by turning off either the 12.1 or the 12.2 offloader. (Note that the 12.2 offloader actually supports 12.2, 18c, and 19c.) You set this in the configuration passed to the Jaguar installer. For example, you can enter this string to allow connections from 12.1 databases only:
"database_compatibility" : [ "12.1" ]
If you specify "12.2", "18c", or "19c", the 12.1 offloader is not enabled:
"database_compatibility" : [ "12.2" ]
- Do you want to enable Database Authentication in order to validate connection requests from Oracle Database to Big Data SQL server processes on the Hadoop DataNodes?
Database Authentication in the network connection between the Oracle Database and Hadoop is set to “true” in the configuration by default. You have the option to disable it by setting database_auth_enabled to “false”:
"database_auth_enabled" : "false",
- Do you want to use the Multi-User Authorization feature?
Multi-User Authorization enables you to grant users other than oracle permission to run SQL queries against the Hadoop cluster. Multi-User Authorization can be used in conjunction with Sentry's role-based access control to provide improved control over user access. The first step in setting up Multi-User Authorization is to set these parameters in the security section of the configuration file:
"impersonation_enabled" : "true", "impersonation_usehosts" : "true", "impersonation_blacklist" : "hdfs,hive"
Note that you can add any account to the blacklist.
- Are the Hadoop cluster and the Oracle Database system going to communicate over Ethernet or over InfiniBand? Also, do the Hadoop nodes have more than one network interface?
See the use_infiniband and selection_subnet parameters. (The selection_subnet parameter does not apply to Oracle Big Data Appliance.)
"use_infiniband" : "false", "selection_subnet" : "5.32.128.10/21"
By default, use_infiniband is set to false. Ethernet is the default protocol.
-
Are you going to pre-download the Hadoop and Hive client tarballs and set up a local repository or directory where the installer can acquire them, or will you allow Jaguar to download them directly from the Cloudera or Hortonworks repository on the Internet (the default behavior)?
For Cloudera releases prior to 6.0, you can use the url or dir parameters in BDS-config.json (the Jaguar installer's configuration file) to specify an arbitrary download location. If Internet access is via proxies, you can also set the http_proxy and https_proxy parameters in BDS-config.json.
Note: On Big Data Appliance only, if you use the built-in Mammoth or bdacli utilities to install Big Data SQL, the clients are automatically installed for you. However, if you use Jaguar to install Big Data SQL on Big Data Appliance (as on other supported Hadoop platforms), you do have to provide the path in BDS-config.json if the location is other than the public repository.
For Big Data SQL on Cloudera 6.x, the default is also automatic download of the clients from the public repository. However, in these environments you cannot specify a different repository in the Jaguar configuration file. Instead, the CLI for the installer on the database side provides the --alternate-repo parameter. Use this parameter to pass the client download location to the installer. See --alternate-repo in the Command Line Parameter Reference for bds-database-install.sh.
-
If the network is Kerberos-secured, do you want the installer to set up automatic Kerberos ticket renewal for the Kerberos principal on the Hadoop side and the Oracle Database side?
See the parameters in the kerberos section:
"principal" : "<oracle or other>/mycluster@MY.<DOMAIN>.COM", "keytab" : "/home/oracle/security/oracle.keytab", "hdfs-principal" : "hdfs/mycluster@MY.<DOMAIN>.COM", "hdfs-keytab" : "/home/hdfs/security/hdfs.keytab"
The Kerberos principal and keytab identified here are used on the Hadoop side. They are also copied into the database-side installation bundle. You can use either the same principal or a different principal on the database side. See --alternate-principal in the Command Line Parameter Reference for bds-database-install.sh.
-
Do you want the Oracle Big Data SQL install process to automatically restart services that are in a stale state?
By default, stale services are restarted automatically. If you want to suppress this, you can set the restart_stale parameter in the configuration file to “false”
. -
Is the Hadoop cluster using the default REST API port for CDH or Ambari?
If not, set the ports parameter.
- Are the HDFS or Hive daemons in the Hadoop cluster owned by non-default
groups and/or users?
By default, HDFS daemons are owned by the hdfs user in the hdfs group, and Hive daemons are owned by the hive user in the hive group. If these defaults have been changed, use the parameters in the hadoop_ids section of the configuration file to identify the current groups and users for these daemons: hdfs_user, hdfs_group, hive_user, hive_group.
Note:
Setting these parameters in the configuration file does not complete the setup for some features. For example, to enable Database Authentication, you must also pass a special --requestdb parameter to the Jaguar utility in order to identify the target database or databases. There are also steps required to generate and install the security key used by this feature. To enable Multi-User Authorization, you start by setting the Hadoop Impersonation parameters in the configuration file, but you also need to set up the authorization rules. The steps to complete these setups are provided where needed as you work through the instructions in this guide.
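Pulling the questions above together, a configuration might combine several of these sections. The following sketch uses placeholder names (the cluster name and the edge node hostname are examples, not defaults) together with parameter values discussed above; adjust it for your site before passing it to Jaguar:

```json
{
  "cluster": {
    "name": "mycluster",
    "database_compatibility": [ "12.2", "18", "19" ]
  },
  "edgedb": {
    "node": "edge01.example.com",
    "enabled": "true"
  },
  "security": {
    "database_auth_enabled": "true",
    "impersonation_enabled": "true",
    "impersonation_usehosts": "true",
    "impersonation_blacklist": "hdfs,hive"
  },
  "network": {
    "use_infiniband": "false"
  }
}
```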
2.3 About the Jaguar Utility
Jaguar is a multifunction command line utility that you use to perform all Oracle Big Data SQL operations on the Hadoop cluster.
Jaguar currently supports these operations:
-
install
-
Deploys Oracle Big Data SQL binaries to each cluster node that is provisioned with the DataNode service.
-
Configures Linux and network settings for bd_cell (the Oracle Big Data SQL service) on each of these nodes.
-
Generates the bundle that installs Oracle Big Data SQL on the Oracle Database side. It uses the parameter values that you set in the configuration file in order to configure the Oracle Database connection to the cluster.
-
-
reconfigure
Modifies the current configuration of the installation (according to the settings in the configuration file provided).
-
databasereq
Generates a request key file that contains one segment of the GUID-key pair used in Database Authentication. (The
databasereq
operation performs this function only. Forinstall
andreconfigure
, request key generation is an option that can be included as part of the larger operation.) -
databaseack
Performs the last step in Database Authentication setup: installs the GUID-key pair on all Hadoop DataNodes in a cluster in order to allow queries from the Oracle Database that provided it.
- sync_principals
Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users for Query Server.
-
uninstall
Uninstalls Oracle Big Data SQL from all DataNodes of the Hadoop cluster.
See Also:
Jaguar Operations in the next section provides details and examples.

2.3.1 Jaguar Configuration Parameter and Command Reference
This section describes the parameters within the Jaguar configuration file as well as Jaguar command line parameters.
Configuration Parameters
The table below describes all parameters available for use in bds-config.json or your own configuration file. Only the cluster name parameter is always required. Others are required under certain conditions stated in the description.
Note:
When editing the configuration file, be sure to maintain the JSON format. Square brackets are required around lists, even in the case of a list with a single item.

Table 2-1 Configuration Parameters in bds-config.json (or in Customer-Created Configuration Files)
Section | Parameter | Type | Description |
---|---|---|---|
cluster |
name |
String |
The name of the cluster. For CDH clusters (Oracle Big Data Appliance or other), this name can be either the physical cluster name or the display name. The installer searches first by physical name and then by display name. The |
cluster |
database_compatibility | string | Select which Oracle Database versions must be supported.
Possible values: "12.1", "12.2", "18", "19", and "full". For example, either of these settings enables support for Oracle Database 12.1, 12.2, 18c, and 19c.
Either of the following settings enables support for Oracle Database 12.2 and 18c, but disables support for Oracle Database 12.1. By disabling support for Oracle Database 12.1 if it is not needed, you conserve some system resources, particularly memory.
The default is
|
api |
hostname |
String | Visible hostname for the cluster management server. In some scenarios, the visible hostname for Cloudera Manager or Ambari is not the same as the current hostname, for example, in High Availability environments.
Default: the local hostname. |
api |
skip_health_check |
Boolean | If "true", the cluster health check is skipped.
The cluster health check verifies that HDFS, Hive and Yarn services are running with good health and are not stale. Additionally, for CDH clusters, management services should be running with good health and not stale. Default: |
api |
port |
Integer |
Cloudera Manager or Ambari REST API port. By default, on CDH clusters this port is 7183 for secured and 7180 for unsecured access. For Ambari, it is 8443 for secured and 8080 for unsecured. Optional. |
api |
restart_stale |
Boolean |
If If Optional. The default is |
edgedb |
enabled |
Boolean | Determines whether the Query Server functionality is enabled.
Default: |
edgedb |
node |
String |
Hostname of the node where the Query Server database will be running (if enabled). Note: Because Query Server is resource-intensive, it is highly recommended that you install the database on a dedicated node. Query Server cannot run on a node that is running either the DataNode role or the BDSSERVER role. |
object_store_support |
enabled |
boolean | If "true" , Oracle Wallet is set up both in the cluster and on the database system in order to allow access to Object Store.
Default: |
object_store_support |
cell_http_proxy |
string |
If object store access support is enabled, this parameter is required for access
to an object store from the Hadoop cluster side, even for empty
values. Follows same rules as the Linux http_proxy variable. For
example: http://myproxy.<domain>.com:80 .
No default value.
|
object_store_support |
cell_no_proxy |
string |
Like cell_http_proxy , supports access to object stores and is
also required if this access is enabled, even for empty values.
Follows same syntax rules as the Linux no_proxy environment
variable. For example:
localhost,127.0.0.1,.<domain>.com .
No default value.
|
object_store_support |
database_http_proxy |
string |
Same description as cell_http_proxy , except that this parameter supports object store access from the database side, not the Hadoop side.
|
object_store_support |
database_no_proxy |
string |
Same description as cell_no_proxy , except that this parameter supports object store access from the database side, not the Hadoop side.
|
network |
|
String |
Specify the proxy settings to enable download of the Hadoop client tarballs and cluster settings files. If both of these strings are empty, the OS environment proxy settings are used. By default, both strings are empty. Using these two parameters in the configuration file is optional. If they are needed, you could instead set them externally as in Not applicable to Oracle Big Data Appliance |
network |
extra_nodes |
List |
List additional nodes where the BDSAgent should be installed. The BDSAgent and BDSServer roles are installed on all DataNode instances. In addition, BDSAgent is installed on cluster nodes running HiveServer2 and HiveMetaStore instances. All remaining nodes are automatically excluded unless you add them here. Default: empty |
network |
excluded_nodes |
List | Nodes that are not hosting the DataNode role can be excluded by listing them within this parameter. |
security |
impersonation_enabled |
Boolean |
If Default value: Note: For CDH clusters, if the Sentry service is running, this setting is overridden and impersonation is enabled regardless of the value of this parameter. |
kerberos |
principal |
String |
The fully-qualified Kerberos principal name for a
user. Before 4.1 the principal had to be The principal has three parts:
The Oracle Big Data SQL installation uses the
Kerberos
Required for secured clusters. Note: Later, when you perform the database-side
installation of Big Data SQL, review the description of the
|
kerberos |
db-service-principal |
String |
Specifies a principal on the KDC server for use by Query Server (and only Query Server). It is not used for authentication against an external Oracle Database. Both The qualifier for the principal name must match the fully qualified domain name of the node where the Query Server will be running. Required for secured clusters. |
kerberos |
db-service-keytab |
String |
Fully-qualified location of the keytab file for the principal specified with Be sure to store the keytab in a location that is accessible to the Jaguar installer. |
kerberos |
sync_principals |
Boolean |
The sync_principals parameter specifies whether or not Jaguar automatically gets a list of principals from a KDC running on a cluster node and then uses the list to create externally-identified database users for Query Server. If set to true, then an automatic synchronization with Kerberos principals occurs during Jaguar install and reconfigure operations. The user can also call this synchronization at any time by invoking the sync_principals operation of Jaguar on the command line. Default: |
kerberos |
hdfs-keytab |
String |
Fully-qualified path to the principal keytab file. A keytab file is created for each principal on the KDC server. It must exist in a location accessible to the Jaguar installer. Required for secured clusters. |
kerberos |
keytab |
String |
Fully-qualified location for the principal’s keytab file name. Copy the keytab file to a location accessible to the Jaguar installer and set the path as the value of this parameter. |
kerberos |
hdfs-principal |
String |
Fully-qualified Kerberos principal name for the " The Required for secured clusters. |
repositories |
dir |
List |
List of directories where the Hadoop clients for deployment on the database side are located. These directories can be on the local file system or on NFS. Directories are searched in the order listed. By default, the list is empty. If the Optional. Not applicable to Oracle Big Data Appliance, which already includes the required clients. Important: The |
repositories |
url |
List |
This is the list of URLs where the Hadoop client tarballs for deployment on the database side are located. If your data center already has repositories set up for access via HTTP, then you may prefer to maintain the Hadoop tarballs in that repository and use the url parameter for Oracle Big Data SQL installations. The URLs can be to the localhost, an internal network, or a site on the Internet (if the node has Internet access). The URLs are tried in the order listed. Note that internal proxy values and/or OS environment proxy settings must be set to allow this access if needed. If access to all listed repositories fails and/or Internet access is blocked, the database installation bundle is not created and a warning message is displayed. After correcting any problems and providing access to a repository, you can re-run the installer using the Not applicable to Big Data Appliance, where the tarballs are stored in a local repository in the cluster and the location is automatically added to the configuration file. |
network |
use_infiniband |
Boolean |
If Used for Oracle Big Data Appliance clusters only. Default value: |
network |
selection_subnet |
String |
If Hadoop cluster nodes have several network interfaces, you can use If the Hadoop cluster nodes have only one network interface, this parameter is ignored. The default value depends upon these conditions:
Note for Oracle Big Data Appliance Users: It's possible to configure several networks on an Oracle Big Data Appliance. If multiple networks exist, then this parameter must be set in order to select a specific network. |
security | database_auth_enabled |
Boolean |
If If Default value: |
security | impersonation_blacklist |
String |
The Hadoop proxy users blacklisted for impersonation. This parameter is used only if Hadoop impersonation is enabled. Since this is a required setting on the Oracle Database side, it is provided with a default value of |
security | impersonation_usehosts |
Boolean |
If If Default value: |
memory | min_hard_limit |
Integer |
The minimum amount of memory reserved for Big Data SQL, in megabytes. This parameter is used on CDH clusters (Oracle Big Data Appliance and others). It is not used on HDP clusters. By default, the value is 32768 MB (32 GB). If you set the
|
memory | max_percentage |
Integer |
On CDH clusters (Oracle Big Data Appliance and others) this parameter specifies the percentage of memory on each node to reserve for Big Data SQL. If the YARN ResourceManager is enabled for the node, the percentage is based on the total amount of memory used by the NodeManager. Otherwise, it is a percentage of physical memory. This parameter is ignored on HDP clusters. |
hadoop_ids | hdfs_user |
String | The operating system user that runs the Hadoop HDFS daemons.
Default value: hdfs.
Note: By default, Jaguar assumes the default Hive and HDFS usernames and groups. If you used different names in your Hadoop installation, use the hadoop_ids parameters to identify them. |
hadoop_ids | hdfs_group |
String | The operating system group that runs the Hadoop HDFS daemons.
Default value: hdfs.
|
hadoop_ids | hive_user |
String | The operating system user that runs the Hadoop Hive daemons.
Default value: hive. |
hadoop_ids | hive_group |
String | The operating system group that runs the Hadoop Hive daemons.
Default value: hive. |
Note:
After Oracle Big Data SQL is installed on the Hadoop cluster management server, you can find configuration file examples that demonstrate various parameter combinations in the <Big Data SQL Install directory>/BDSjaguar directory:
example-bda-config.json
example-cdh-config.json
example-kerberos-config.json
example-localrepos-config.json
example-subnetwork-config.json
example-unsecure-config.json
You can see all possible parameter options in use in example-cdh-config.json.
See Also:
See the Appendix Downloading the Correct Versions of the Hadoop, Hive, and HBase Clients for a Local Repository for suggestions that can help with the setup of client tarball downloads.

Jaguar Operations
The table below lists the full set of operations performed by the Jaguar utility on the Hadoop side of the Oracle Big Data SQL installation.
The general syntax for Jaguar commands is as follows. The --requestdb
parameter does not apply to all Jaguar commands.
# ./jaguar {--requestdb <comma-separated database names> | NULL } <action> { bds-config.json | <myfilename>.json | NULL }
Examples:
# ./jaguar install
# ./jaguar install bds-config.json
# ./jaguar install mycustomconfig.json
# ./jaguar --requestdb orcl,testdb,proddb install
# ./jaguar --requestdb orcl install
# ./jaguar sync_principals
You can use the default bds-config.json or your own configuration file, or omit the configuration file argument (which defaults to bds-config.json).
About --requestdb:
The --requestdb parameter is required for the databasereq command, optional for install and reconfigure, and not applicable to other Jaguar commands. The parameter must be passed to one of these operations in order to enable Database Authentication in the connection between a Hadoop cluster and a database. Unless you prefer to disable Database Authentication, it is recommended that you include --requestdb with the initial install operation. Otherwise, you will need to perform an additional step later in order to generate the request key.
This parameter is functional only when Database Authentication (database_auth_enabled) is set to “true” in the configuration. (This setting is a configuration default and does not need to be explicitly set in the configuration file.)
Jaguar needs the database names in order to generate a unique .reqkey (request key) file for each database. When database_auth_enabled is set to “true” at installation time, the --requestdb parameter is still optional. Post-installation, you have the same option to send the request key using the reconfigure and databasereq operations.
Database Authentication is not implemented until you do all of the following:
- Ensure that database_auth_enabled is either absent from the configuration file or is set to "true". (It is "true" by default.)
- Include --requestdb in a Jaguar command:
  - Run the Jaguar install or reconfigure operation and install the updated database-side installation bundle, or
  - Run Jaguar databasereq to generate a request key from the existing database-side installation.
- Copy the generated ZIP file that contains the .ackkey file from the database-side installation directory to /opt/oracle/DM/databases/conf on the Hadoop cluster management server.
- Run the Jaguar databaseack command as described in the table below.
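The requirements above can be sketched as a single command sequence on the cluster management server. This is a hedged sketch, not a definitive procedure: the database name orcl, the host dbhost, and the /tmp staging path are illustrative assumptions; only the /opt/oracle/DM/databases/conf drop directory comes from this guide.

```shell
# Hedged sketch of the Database Authentication handshake. The database name
# "orcl", the host "dbhost", and the /tmp staging path are illustrative
# assumptions; the .ackkey drop directory below is documented in this guide.
ACKKEY_DROP_DIR="/opt/oracle/DM/databases/conf"

# 1. Generate a request key, either at install time:
#      ./jaguar --requestdb orcl install
#    or later, from the existing installation:
#      ./jaguar --requestdb orcl databasereq
# 2. Install the database-side bundle (with the .reqkey file) on the database
#    system; that installation produces a ZIP containing an .ackkey file.
# 3. Copy the .ackkey ZIP back to the cluster management server, for example:
#      scp oracle@dbhost:/tmp/bds-*.zip "${ACKKEY_DROP_DIR}"
# 4. Complete the handshake on the Hadoop side:
#      ./jaguar databaseack
echo "ackkey drop directory: ${ACKKEY_DROP_DIR}"
```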
The table below shows the available Jaguar commands.

Table 2-2 Jaguar Operations

| Jaguar Operation | Supports --requestdb? | Usage |
|---|---|---|
| install | Y | Installs Oracle Big Data SQL on the Hadoop cluster identified in the configuration file and creates an installation bundle for the database side, based on the parameters in the configuration file (or default values for parameters not explicitly assigned a value there). May be run with the --requestdb <comma-separated database list> option. On Oracle Big Data Appliance clusters running Oracle Linux 6 and Oracle Linux 7, scl is not needed in order to call the correct Python version for Jaguar. |
| reconfigure | Y | Modifies the current installation by applying changes you have made to the configuration file (bds-config.json or another configuration file). |
| databasereq | Y | Use this command to create the .reqkey (request key) file for a database from the existing installation, as an alternative to passing --requestdb to install or reconfigure. |
| databaseack | N | The "Database Acknowledge" process provides confirmation to the Oracle Big Data SQL installation on the Hadoop cluster that security features you enabled in the configuration file have been successfully implemented in the database-side installation. It then completes implementation of the selected security features on the Hadoop cluster side. Copy the ZIP archive that contains the .ackkey file from the database-side installation back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server, then run databaseack. |
| sync_principals | N/A | Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users in Query Server. You can do the same by including the similarly-named sync_principals parameter in a Jaguar configuration file during a Jaguar install or reconfigure operation. |
| --object-store-http-proxy | N/A | Specifies a different proxy for Object Store access than the one set in the configuration file. |
| --object-store-no-proxy | N/A | Sets a no-proxy value, overriding the no_proxy value that may be set in the configuration file. |
| uninstall | N/A | Uninstalls Oracle Big Data SQL from the Hadoop cluster. |
Note:
When Oracle Big Data SQL is uninstalled on the Hadoop side, any queries against Hadoop data that are in process on the database side will fail. It is strongly recommended that you uninstall Oracle Big Data SQL from all database systems shortly after uninstalling the Hadoop component of the software.

See Also:
Uninstalling Oracle Big Data SQL.

2.4 Steps for Installing on the Hadoop Cluster
After you have set up the Jaguar configuration file according to your requirements, follow these steps to run the Jaguar installer, which installs Oracle Big Data SQL on the Hadoop cluster and also generates a database-side installation bundle that you deploy to the Oracle Database system. In these steps, bds-config.json is the configuration filename passed to Jaguar. This is the default; any filename is accepted, so you can create separate configuration files for installation on different clusters and save them under different names.
Note:
Jaguar requires Python 2.7 to 3.0. Versions greater than 3.0 are not supported by Oracle Big Data SQL at this time. If necessary, you can add a Jaguar-compatible version of Python as a secondary installation. Revisit the prerequisites section in the Introduction for details. If you are using Oracle Big Data Appliance, do not overwrite the Mammoth-installed Python release.

1. Log on to the cluster management server node as root and cd to the directory where you extracted the downloaded Oracle Big Data SQL installation bundle.

2. Cd to the BDSJaguar subdirectory under the path where you unzipped the bundle.

   # cd <Big Data SQL Install Directory>/BDSJaguar

3. Edit the file bds-config.json.

   { "cluster": { "name": "<Your cluster name>" } }
Add the parameters that you want to use in this installation.
See Also:
The cluster name is the only required parameter, but it is required only in environments where the configuration management service must manage more than one cluster. See the Jaguar Configuration Parameter and Command Reference for a description of all available parameters. You can see an example of a bds-config.json file populated with all available parameters in bds-config.json Configuration Examples.

4. In the BDSJaguar directory, run the Jaguar install operation. Pass the install parameter and the configuration file name (bds-config.json is the implicit default) as arguments to the Jaguar command. You may or may not need to include the --requestdb option.

   [root@myclusteradminserver:BDSjaguar] # ./jaguar install <config file name>

   Note:
   By default, Database Authentication is set to true unless you set database_auth_enabled to "false" in the configuration file. If you enable Database Authentication, then either as part of the install operation or later, generate a "request key." This is half of a GUID/key pair used in the authentication process. To generate this key, include the --requestdb parameter in the Jaguar install command line:

   [root@myclusteradminserver:BDSjaguar] # ./jaguar --requestdb mydb install

   If the install was run with database_auth_enabled set to "true", you can use the Jaguar databasereq command to generate the key after the database-side installation. Several other Jaguar commands can also generate the request key if you pass them the --requestdb parameter.

   Jaguar prompts for the cluster management service administrator credentials and then installs Oracle Big Data SQL throughout the Hadoop cluster. It also generates the database-side installation bundle in the db-bundles subdirectory. The following message is returned if the installation completed without error.

   BigDataSQL: INSTALL workflow completed.
5. Check for the existence of the database-side installation bundle:

   # ls <Big Data SQL Install Directory>/BDSJaguar/db-bundles
   bds-4.1.2-db-<cluster>-<yymmdd.hhmi>.zip

   This bundle sets up Oracle Big Data SQL connectivity between an Oracle database and the specific cluster defined in the bds-config.json (or other) configuration file. It contains all required packages and settings files except for an optional database request key file.

6. If you included --requestdb in the install command, the installation also generates one or more database request key files under the dbkeys subdirectory. Check that this key exists:

   # ls <Big Data SQL Install Directory>/BDSJaguar/dbkeys
   cluster1db.reqkey
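As a quick sanity check, the two artifacts above can be counted with a short, hedged shell snippet. The default /opt/oracle extraction path here is an assumption; point INSTALL_DIR at the directory where you actually unzipped the bundle.

```shell
# Count the artifacts produced by the Hadoop-side install. The default
# INSTALL_DIR below is an assumption; substitute your own extraction path.
INSTALL_DIR="${INSTALL_DIR:-/opt/oracle}"
BUNDLES=$(ls "${INSTALL_DIR}/BDSJaguar/db-bundles"/bds-*-db-*.zip 2>/dev/null | wc -l | tr -d ' ')
KEYS=$(ls "${INSTALL_DIR}/BDSJaguar/dbkeys"/*.reqkey 2>/dev/null | wc -l | tr -d ' ')
echo "database-side bundles: ${BUNDLES}"
echo "request keys: ${KEYS}"   # 0 is expected if --requestdb was not used
```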
This completes the Oracle Big Data SQL installation on the Hadoop cluster.
See Also:
- Working With Query Server in the Oracle Big Data SQL User's Guide. If you chose to install Query Server, you can connect and start working with it now. It is not dependent on completion of the Oracle Database side of the installation.
- Post-Installation Tasks in this guide. Most of the tasks described are performed on the Hadoop system. You may want to complete those tasks before proceeding to the second half of the installation on the Oracle Database system. All of them are optional.
What Next?
After Jaguar has successfully installed Oracle Big Data SQL on the Hadoop cluster, you are done with the first half of the installation. The next step is to install Oracle Big Data SQL on the Oracle Database system that will run queries against the data on the Hadoop cluster.
To do this, copy the database-side installation bundle to any location on the Oracle Database system. Unless you set database_auth_enabled to "false" in the configuration file, also copy over the .reqkey file generated by Jaguar.
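For example, the copy might look like the following sketch. The oracle user, the dbhost hostname, and the /tmp destination are illustrative assumptions, not values from this guide.

```shell
# Hedged sketch of copying the database-side bundle and request key over to
# the Oracle Database system. "oracle@dbhost" and /tmp are assumptions.
SRC_DIR="<Big Data SQL Install Directory>/BDSJaguar"
#   scp "${SRC_DIR}"/db-bundles/bds-4.1.2-db-*.zip oracle@dbhost:/tmp
#   scp "${SRC_DIR}"/dbkeys/*.reqkey oracle@dbhost:/tmp
echo "copy the bundle and .reqkey from: ${SRC_DIR}"
```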
Tip:
You only need to send a request key to a database once. A single request key is valid for all Hadoop cluster connections to the same database. If you have already completed the installation to connect one Hadoop cluster to a specific database, then the database has the key permanently and you do not need to generate it again or copy it over to the database again in subsequent cluster installations.

Go to Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL for instructions on unpacking the bundle and installing the database-side components of the software.
See Also:
An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Installation Example.