1 What Is Oracle Big Data SQL?

Oracle Big Data SQL supports queries against non-relational data stored in multiple big data sources, including Apache Hive, HDFS, Object Stores, Oracle NoSQL Database, Apache Kafka, Apache HBase, and other NoSQL databases.

Unified queries for distributed data is supported by Oracle Big Data SQL. Therefore providing the ability for users to view and analyze data from disparate data stores seamlessly, as if it were all stored in an Oracle database.

Oracle Big Data SQL enables you to execute highly complex SQL SELECT statements against data in the Hadoop ecosystem, either manually or through your existing applications. For example, if you are a user of Oracle Advanced Analytics, Oracle Big Data SQL enables you to extend your Oracle Database data mining models to big data in Hadoop.

The following architectural diagram illustrates the data flow of SQL queries, including components that are involved in the queries such as drivers, external tables, and Hadoop clusters and nodes.

Description of bdsql_architecture.png follows
Description of the illustration bdsql_architecture.png

In the above diagram, the data flow for the following queries is shown:

  • Query Kafka using Oracle SQL Access to Kafka (OSaK) - Oracle SQL Access to Kafka enables Oracle SQL to access Kafka topics, without using Hive or Hadoop. The ORA_KAFKA PL/SQL package contains functions and procedures to enable this. Before you can query Kafka, register a Kafka cluster in your database schema using ORA_KAFKA.REGISTER_CLUSTER, and then create views using the ORA_KAFKA.CREATE_VIEWS procedure. This procedure creates external tables under the covers, you do not have to explicitly create external tables. See Oracle SQL Access to Kafka.
  • Query Kafka using the Hive storage handler - The ORACLE_HIVE access driver can access Kafka topics using the Hive storage handler that is part of the Oracle Big Data SQL installation. You first create a Hive external table that accesses the Kafka topics, and then create an Oracle Big Data SQL table over it. See Use Oracle's Hive Storage Handler for Kafka to Create a Hive External Table for Kafka Topics.
  • Query Apache HBase - You can use the ORACLE_HIVE access driver to access data stored in Apache HBase. You first create a Hive external table that accesses the HBase table, and then create an Oracle Big Data SQL table over it. See Create an Oracle External Table for Apache HBase.
  • Query NoSQL - The ORACLE_HIVE access driver is used to access data stored in Oracle NoSQL Database. You first create a Hive external table that accesses the KVStore, then you create an exernal table in Oracle Big Data SQL over it. See Create an External Table for Oracle NoSQL Database.
  • Query Object Stores - The ORACLE_BIGDATA access driver enables you to create an external table over data stored in object stores. Oracle Big Data SQL supports access to objects stores such as Oracle Object Store, Amazon S3, and Azure Blob Storage. See Create an Oracle External Table for Object Store Access.
  • Query HDFS - The ORACLE_HDFS access driver enables you to access many types of data that are stored in HDFS, but which do not have Hive metadata. You can define the record format of text data, or you can specify a SerDe for a particular data format. See Create an Oracle External Table for HDFS Files.