1.8 About Oracle Big Data SQL Statistics

Oracle Big Data SQL provides a number of statistics that can contribute data for performance analyses.

Five Key Cell XT and Storage Index Statistics

If a query is off-loadable, the following XT-related statistics that can help you to determine what kind of I/O savings you can expect from the offload and from Smart Scan.

  • cell XT granules requested for predicate offload

    Note that number of granules requested depends on a number of a factors, including the HDFS block size, Hadoop data source splittability, and the effectiveness of Hive partition elimination.

  • cell XT granule bytes requested for predicate offload

    The number of bytes requested for the scan. This is the size of the data on Hadoop to be investigated after Hive partition elimination and before Storage Index evaluation.

  • cell interconnect bytes returned by XT smart scan

    The number of bytes of I/O returned by an XT smart scan to Oracle Database.

  • cell XT granule predicate offload retries

    The number of times that a Big Data SQL process running on a DataNode could not complete the requested action. Oracle Big Data SQL automatically retries failed requests on other DataNodes that have a replica of the data. The retries value should be zero.

  • cell XT granule IO bytes saved by storage index

    The number of bytes filtered out by storage indexes at the storage cell level. This is data that was not scanned, based information provided by the storage indexes.

You can check these statistics before and after running queries as follows. This example shows the values at null, before running any queries.

SQL> SELECT sn.name,ms.value 
FROM V$MYSTAT ms, V$STATNAME sn 
WHERE ms.STATISTIC#=sn.STATISTIC# AND sn.name LIKE '%XT%'; 

NAME                                                      VALUE
-----------------------------------------------------     -----
cell XT granules requested for predicate offload          0 
cell XT granule bytes requested for predicate offload     0
cell interconnect bytes returned by XT smart scan         0 
cell XT granule predicate offload retries                 0
cell XT granule IO bytes saved by storage index           0 

You can check some or all of these statistics after execution of a query to test the effectiveness of the query, as in:

SQL> SELECT n.name, round(s.value/1024/1024) 
FROM v$mystat s, v$statname n
WHERE s.statistic# IN (462,463)
AND s.statistic# = n.statistic#;

cell XT granule bytes requested for predicate offload  32768
cell interconnect bytes returned by XT smart scan   32

Five Aggregation Offload Statistics

The following statistics can help you analyze the performance of aggregation offload.

  • vector group by operations sent to cell

    The number of times aggregations can be offloaded to the cell.

  • vector group by operations not sent to cell due to cardinality

    The number of scans that were not offloaded because of large wireframe.

  • vector group by rows processed on cell

    The number of rows that were aggregated on the cell.

  • vector group by rows returned by cell

    The number of aggregated rows that were returned by the cell.

  • vector group by rowsets processed on cell

    The number of rowsets that were aggregated on the cell.

You can review these statistics by running the queries as follows:

SQL> SELECT count(*) FROM bdsql_parq.web_sales;

  COUNT(*)
----------
 287301291

SQL> SELECT substr(n.name, 0,60) name, u.value
FROM v$statname n, v$mystat u
WHERE ((n.name LIKE 'key vector%') OR
       (n.name LIKE 'vector group by%') OR
       (n.name LIKE 'vector encoded%') OR
       (n.name LIKE '%XT%') OR
       (n.name LIKE 'IM %' AND n.name NOT LIKE '%spare%'))
      AND u.sid=userenv('SID')
      AND n.STATISTIC# = u.STATISTIC#
      AND u.value > 0;


NAME                                                      VALUE
-----------------------------------------------------     -----
cell XT granules requested for predicate offload          808 
cell XT granule bytes requested for predicate offload     2.5833E+10
cell interconnect bytes returned by XT smart scan         6903552 
vector group by operations sent to cell                   1
vector group by rows processed on cell                    287301291
vector group by rows returned by cell                     808

Nine Key Vector Statistics

The following statistics can help you analyze the effectiveness of key vectors that were sent to the cell.

  • key vectors sent to cell

    The number of key vectors that were offloaded to the cell.

  • key vector filtered on cell

    The number of rows that were filtered out by a key vector on the cell.

  • key vector probed on cell

    The number of rows that were tested by a key vector on the cell.

  • key vector rows processed by value

    The number of join keys that were processed by using their value.

  • key vector rows processed by code

    The number of join keys that were processed by using the dictionary code.

  • key vector rows filtered

    The number of join keys that were skipped due to skip bits.

  • key vector serializations in lite mode for cell

    The number of times a key vector was not encoded due to format or size.

  • key vectors sent to cell in lite mode due to quota

    The number of key vectors that were offloaded to the cell for non-exact filtering due to the 1 MB metadata quota.

  • key vector efilters created

    A key vector was not sent to a cell, but an efilter (similar to a bloom filter) was sent.

You can review these statistics by running the queries as follows:

SELECT substr(n.name, 0,60) name, u.value
FROM v$statname n, v$mystat u
WHERE ((n.name LIKE 'key vector%') OR
       (n.name LIKE 'vector group by%') OR
       (n.name LIKE 'vector encoded%') OR
       (n.name LIKE '%XT%'))
      AND u.sid=userenv('SID')
      AND n.STATISTIC# = u.STATISTIC#


NAME                                                      VALUE
-----------------------------------------------------     -----
cell XT granules requested for predicate offload          250 
cell XT granule bytes requested for predicate offload     61,112,831,993
cell interconnect bytes returned by XT smart scan         193,282,128 
key vector rows processed by value                        14,156,958
key vector rows filtered                                  9,620,606
key vector filtered on cell                               273,144,333
key vector probed on cell                                 287,301,291
key vectors sent to cell                                  1
key vectors sent to cell in lite mode due to quota        1
key vector serializations in lite mode for cell           1
key vector efilters created                               1

Tip:

The Oracle Big Data SQL Quickstart blog, published in the Data Warehouse Insider, provides a series of code and functionality walkthroughs that show you how to use these statistics to analyze the performance of Oracle Big Data SQL. See Part 2, Part 7, and Part 10.