6.1 About Oracle Big Data SQL Query Server
Oracle Big Data SQL Query Server is an Oracle Database instance that you can optionally install as a component of Oracle Big Data SQL on an edge node in your Hadoop cluster. You use Query Server to primarily query data stored in the cluster (in HDFS and Hive formats) or object stores using Oracle external tables. This enables you to take advantage of the full SQL capabilities provided by the Oracle Database.
You can define external tables using the ORACLE_HDFS
and ORACLE_HIVE
or ORACLE_BIGDATA
access drivers or have the Query Server automatically define external tables based on the metadata in the Hive metastore. In the latter case, Hive databases map to Oracle Database schemas – and the corresponding Hive tables are defined as Oracle external tables in those schemas. All data authorization is based on authorization rules in Hadoop such as Apache Sentry or HDFS Access Controls Lists (ACLs).
Once installed, Query Server provides an Oracle Database deployment that is automatically configured to query data in your Hadoop cluster using SQL. Restarting the Query Server restores the database to a “clean” state, eliminating management overhead. A restart preserves external tables (ORACLE_HIVE, ORACLE_HDFS, and ORACLE_BIGDATA types), associated statistics, user defined views, and credentials. A restart deletes regular tables containing user data.
If your solution requires High Availability (HA), advanced Oracle security policies, or combining data in Oracle Database with data in Hadoop, then you should leverage a full-blown Oracle Database with Big Data SQL. Oracle supports using both Query Server and a Big Data SQL enabled Oracle Database for a single Hadoop deployment.
To install Query Server, you must specify an existing edge node in your Hadoop
cluster in the bds-config.json
configuration file. You use the same
configuration file to specify a list of Hive databases. Query Server automatically
creates Oracle external tables corresponding to the tables in the Hive metastore
database(s) so that they are ready for querying after a successful installation. The set
of external tables in the Query Server can be automatically kept up-to-date with the
corresponding Hive metastore tables by running either the Restart this Big Data SQL
Query Server or the Synchronize Hive Databases commands in Cloudera
Manager or Apache Ambari cluster management software. You can also use the
dbms_bdsqs.sync_hive_databases
PL/SQL API package procedure. See
SYNC_HIVE_DATABASES.
See Also:
See Introduction in the Oracle Big Data SQL Installation Guide, which describes how to install and configure the software on the two sides of an Oracle Big Data SQL configuration.
6.1.1 Query Server Features
Big Data SQL Query Server provides automatic installation and configuration, integration with Hadoop cluster managers, and automatic integration of cluster metadata:
- Automatic installation and configuration: Oracle Big Data SQL installer
automatically installs and configures Query Server, if you specify
an existing target edge node in the Hadoop cluster in the
bds-config.json
configuration file. To specify the edge node where to install Query Server, you add theedgedb
parameter and the node and enabled attributes to thebds-config.json
configuration file to as shown in the following example where<edgenode_host_name>
is the name of your edge node:"edgedb": { "node" : "dbnode.domain.com", "enabled" : "true", "sync_hive_db_list" : "my_hive_db_1,my_hive_db2" }
Note:
If thebds-config.json
configuration file does not include the edgedb subsection, then Query Server is not installed.See Also:
bds-config.json Configuration Example in Oracle Big Data SQL Installation Guide shows a fully-populatedbds-config.json
file. The example includes all available configuration parameters. - Integration with Hadoop cluster managers: You can monitor and manage Query Server as a service using Cloudera Manager or Apache Ambari Hadoop cluster management tools.
- Synchronization with Hive: When you start the Oracle Big Data service, Query Server automatically refreshes its metadata from the Hive metastore. After the initial refresh, users can synchronize the Query Server with the latest metadata in the Hive metastore.