6.4 Synchronize Query Server With Hive
You can synchronize the Query Server with the Hive databases that you specified by using Cloudera Manager, Apache Ambari, or the dbms_bdsqs.sync_hive_databases
PL/SQL API.
- Execute the Restart this Big Data SQL Query Server command in Cloudera Manager or Apache Ambari.
- Execute the Synchronize Hive Databases command in Cloudera Manager or Apache Ambari.
- Invoke the
dbms_bdsqs.sync_hive_databases
PL/SQL API locally on the edge node.
bds-config.json
configuration file or by using the sync_hive_db_list configuration parameter in Cloudera Manager.
Note that the dbms_bdsqs.sync_hive_databases
PL/SQL API will only refresh the Hive table definitions for the Hive databases that have already been synchronized through the other two methods.
6.4.1 Restart Query Server Manually Using Cloudera Manager
You can synchronize Query Server with the Hive databases that you specified by restarting Query Server in Cloudera Manager or Apache Ambari.
bds-config.json
configuration file or by using the sync_hive_db_list configuration parameter in Cloudera Manager.
6.4.2 Synchronize Query Server Manually Using Cloudera Manager
You can use Cloudera Manager or Apache Ambari to manually synchronize Query Server with the Hive databases that you specified.
bds-config.json
configuration file or by using the sync_hive_db_list configuration parameter in Cloudera Manager.
6.4.3 Synchronize Query Server Using the PL/SQL API
You can synchronize Query Server with the Hive databases that you specified by using the PL/SQL API.
To do so, invoke the dbms_bdsqs.sync_hive_databases
PL/SQL API locally on the edge node where the Query Server is installed.
The procedure contains no parameters. It synchronizes all of the Hive databases that are already in Query Server. The API will refresh the Query Server with only the Hive databases listed in the sync_hive_db_list
configuration parameter. Each successive synchronization (also known as a refresh) will process changes since the last Query Server metadata refresh.
A synchronization captures any tables that were added or dropped in the Hive metastore since the last refresh. This also includes any tables whose schemas might have changed.
6.4.4 Enable Query Server Full Synchronization
You can specify whether Query Server performs a delta (default) or a full synchronization.
During the Query Server installation process, the Oracle schemas and the appropriate external tables are created based on the Hive databases list that you can specify either in the bds-config.json
configuration file or the sync_hive_db_list
configuration parameter. In that case, Query Server performs a full synchronization. By default, Query Server performs a delta synchronization during subsequent restarts or synchronizations.
You can control whether Query Server performs a full or a delta synchronization by using the Enable full synchronization configuration parameter in Cloudera Manager or Apache Ambari. This configuration parameter is de-selected by default. To enable Query Server to perform a full synchronization, select this checkbox in Cloudera Manager or Apache Ambari. For example, you can use Cloudera Manager to enable Query Server to perform a full synchronization during a restart or a manual synchronization as follows:
- Login to Cloudera Manager by using your login credentials.
- In Cloudera Manager, use the Search field to search for the Enable full synchronization configuration parameter. Enter / Enable full synchronization (or enter part of the name until it is displayed in the list) in the Search field, and then press Enter.
- Click Big Data SQL: Enable full synchronization. The checkbox is de-selected by default. This indicates that Query Sever will perform a delta synchronization.
- To enable full synchronization, select the checkbox, and then click Save Changes.
A full synchronization drops all of the existing schemas and external tables from Query Server, and then re-creates new schemas and new external tables based on the Hive databases list that you specified in the sync_hive_db_list
configuration parameter.
By default, Query Server performs a delta synchronization between the Hive databases in the metastore that you specify and Query Server. Any changes in the Hive databases such as dropping or adding tables will be reflected in Query Server. However, When you start Query Server for the very first time, it will create Oracle Schemas based on the Hive databases that you specify either in the bds-config.json
configuration file or in sync_hive_db_list
configuration parameter in Cloudera Manager or Apache Ambari.
The first time the Query Server synchronizes with Hive the process will be slower than usual. This is because it is importing all of the tables for the specified databases (configured in Cloudera Manager or Apache Ambari) in the Hadoop cluster. Subsequent refreshes should be much faster as it would only refresh the changes that were made to the Hive Metadata such as additions of new tables. During a delta import, the Query Server will also gather new statistics for tables that have been added/modified.