1.3 About Smart Scan for Big Data Sources

Oracle external tables do not have traditional indexes. Queries against these external tables typically require a full table scan. The Oracle Big Data SQL processing agent on the DataNodes of the Hadoop cluster extends Smart Scan capabilities (such as filter-predicate off-loads) to Oracle external tables. Smart Scan has been used for some time on the Oracle Exadata Database Machine to do column and predicate filtering in the Storage Layer before query results are sent back to the Database Layer. In Oracle Big Data SQL, Smart Scan is a final filtering pass done locally on the Hadoop server to ensure that only requested elements are sent to Oracle Database. Oracle storage servers running on the Hadoop DataNodes are capable of doing Smart Scans against various data formats in HDFS, such as CSV text, Avro, and Parquet.

This implementation of Smart Scan leverages the massively parallel processing power of the Hadoop cluster to filter data at its source. It can preemptively discard a huge portion of irrelevant data—up to 99 percent of the total. This has several benefits:

  • Greatly reduces data movement and network traffic between the cluster and the database.

  • Returns much smaller result sets to the Oracle Database server.

  • Aggregates data when possible by leveraging scalability and cluster processing.

Query results are returned significantly faster. This is the direct result reduced traffic on the network and reduced load on Oracle Database.

See Also:

See Oracle Database Concepts for a general introduction to external tables and pointers to more detailed information in the Oracle Database documentation library