1.5 About Predicate Push Down
Many Big Data systems support some level of predicate off-loading, either through the filetype itself (e.g. Apache Parquet), or through Hive’s partitioning and StorageHandler APIs. Oracle Big Data SQL takes advantage of these off-load capabilities by pushing predicates from the Oracle Database into supporting systems. For example, predicate push down enables the following automatic behaviors:
-
Queries against partitioned Hive tables are pruned, based on filter predicates on partition columns.
-
Queries against Apache Parquet and Apache ORC files reduce I/O by testing predicates against the internal index-like structures contained within these file formats.
Note:
Predicate pushdown in queries against Parquet files is inefficient unless the files are generated through Hive using the workaround described in the next section. -
Queries against Oracle NoSQL Database or Apache HBase use predicates to drive subscans of data in the remote data store.
Required Datatypes to Enable Predicate Push Down
Predicate push down requires that certain mappings between Hive Datatypes and Oracle Datatypes be present. These mappings are described in the following table.
Hive Datatype | Mapped To Oracle Datatype |
---|---|
CHAR(m) |
CHAR(n), VARCHAR2(n) where n is >= m |
VARCHAR(m) |
CHAR(n), VARCHAR2(n) where n is >= m. |
string |
CHAR(n), VARCHAR2(n) |
DATE |
DATE |
TIMESTAMP |
TIMESTAMP(9) Hive TIMESTAMP has nanoseconds, 9 digit fractional seconds. |
TINYINT |
NUMBER(3) preferably, but NUMBER or NUMBER(n) for any value of n is valid. |
SMALLINT |
NUMBER(5) preferably, but NUMBER or NUMBER(n) for any value of n is valid. |
INT |
NUMBER(10) preferably, but NUMBER or NUMBER(n) for any value of n is valid. |
BIGINT |
NUMBER(19) preferably, but NUMBER or NUMBER(n) for any value of n is OK |
DECIMAL(m) |
NUMBER(n) where m = n preferably, but NUMBER or NUMBER(n) for any value of n is valid. |
FLOAT |
BINARY_FLOAT |
DOUBLE |
BINARY_DOUBLE |
BINARY |
RAW(n) |
BOOLEAN |
CHAR(n), VARCHAR2(n) where n is >= 5, values 'TRUE', 'FALSE' |
BOOLEAN |
NUMBER(1) preferably, but NUMBER or NUMBER(n) for any value of n is valid. Values 0 (false), 1 (true). |