1.5 About Predicate Push Down

Many Big Data systems support some level of predicate off-loading, either through the filetype itself (e.g. Apache Parquet), or through Hive’s partitioning and StorageHandler APIs. Oracle Big Data SQL takes advantage of these off-load capabilities by pushing predicates from the Oracle Database into supporting systems. For example, predicate push down enables the following automatic behaviors:

  • Queries against partitioned Hive tables are pruned, based on filter predicates on partition columns.

  • Queries against Apache Parquet and Apache ORC files reduce I/O by testing predicates against the internal index-like structures contained within these file formats.

    Note:

    Predicate pushdown in queries against Parquet files is inefficient unless the files are generated through Hive using the workaround described in the next section.
  • Queries against Oracle NoSQL Database or Apache HBase use predicates to drive subscans of data in the remote data store.

Required Datatypes to Enable Predicate Push Down

Predicate push down requires that certain mappings between Hive Datatypes and Oracle Datatypes be present. These mappings are described in the following table.

Hive Datatype Mapped To Oracle Datatype

CHAR(m)

CHAR(n), VARCHAR2(n) where n is >= m

VARCHAR(m)

CHAR(n), VARCHAR2(n) where n is >= m.

string

CHAR(n), VARCHAR2(n)

DATE

DATE

TIMESTAMP

TIMESTAMP(9) Hive TIMESTAMP has nanoseconds, 9 digit fractional seconds.

TINYINT

NUMBER(3) preferably, but NUMBER or NUMBER(n) for any value of n is valid.

SMALLINT 

NUMBER(5) preferably, but NUMBER or NUMBER(n) for any value of n is valid.

INT  

NUMBER(10) preferably, but NUMBER or NUMBER(n) for any value of n is valid.

BIGINT                    

NUMBER(19) preferably, but NUMBER or NUMBER(n) for any value of n is OK

DECIMAL(m)

NUMBER(n) where m = n preferably, but NUMBER or NUMBER(n) for any value of n is valid.

FLOAT                      

BINARY_FLOAT

DOUBLE                     

BINARY_DOUBLE

BINARY

RAW(n)

BOOLEAN

CHAR(n), VARCHAR2(n) where n is >= 5, values 'TRUE', 'FALSE'

BOOLEAN

NUMBER(1) preferably, but NUMBER or NUMBER(n) for any value of n is valid. Values 0 (false), 1 (true).