17.5 ORACLE_BIGDATA Accessing Apache Iceberg

See how to use the Apache Iceberg open table format with the Oracle Big Data Access Driver

17.5.1 Apache Iceberg Tables Overview

The ORACLE_BIGDATA Access Driver allows users to access data stored in object stores as if the data resided in Oracle Database tables.

The ORACLE_BIGDATA functionality now extends to include support for Apache Iceberg, a widely adopted open table format that introduces features such as schema evolution, time-travel queries, and fast query planning. Iceberg integration enables efficient data management for external data sets in data lakes.

Key features of Apache Iceberg:

  • Updates/Deletes: Serializable isolation of updates and deletes enhances consistency.
  • Time-Travel Queries: You can query historical snapshots.
  • Schema Evolution: You can manage changes in table schemas without data migration.
  • Partition Evolution: You can perform logical partitioning without having to perform physical data movement.
  • Extensive Metadata: With enhanced metadata, you can set up advanced optimizations, such as partition pruning and column statistics.

17.5.2 Supported Configurations for Apache Iceberg

Oracle supports catalog-based and non-catalog based data, and supports Parquet data formats with the Apache Iceberg table format

Catalog-Based

  • AWS Glue Catalog: Metadata and data are stored in Amazon S3, managed by AWS Glue.

Non-Catalog (File-Based)

  1. Manifest File: Directly specify the path to the metadata/manifest file for Iceberg tables.

File Formats

Oracle Database supports IParquet data format for Iceberg tables.

17.5.3 Iceberg-Specific Access Parameters

Oracle supports ORACLE_BIGDATA access parameters that defines Apache Iceberg table access.

Table 17-6 Iceberg-Specific Access Parameters

Parameter Description Mandatory

com.oracle.bigdata.access_protocol

Table protocol defnition. This must be set to iceberg.

Yes

com.oracle.bigdata.access_protocol.config

JSON configuration for catalog details (For example: AWS Glue or OCI Hadoop Catalog).

Optional

com.oracle.bigdata.fileformat

File format used in Iceberg tables (For example: Parquet

Yes

17.5.4 Examples of Table Creation and Inline External Table SQL for Iceberg Tables

The following examples demonstrate how to create tables and perform queries on Iceberg tables in Oracle Database, leveraging both manifest files and AWS Glue catalogs.

17.5.4.1 Creating a Table Pointing to the Manifest File

This example creates an external table that references a specific manifest file for Iceberg:

Example 17-14 Creating a Table Pointing to the Manifest File

CREATE TABLE CUSTOMERS_ICEBERG
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME VARCHAR2(64),
    LAST_NAME VARCHAR2(64),
    EMAIL VARCHAR2(64),
    SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    DEFAULT DIRECTORY DATA_PUMP_DIR
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=AWS_S3_CREDENTIAL
        com.oracle.bigdata.access_protocol=iceberg
    )
    LOCATION ('iceberg:https://sales-data.s3.us-west-2.amazonaws.com/customers/metadata/00001-27da..ef5.metadata.json')
)
PARALLEL;

17.5.4.2 Inline External Table Query (Manifest File Reference)

In this example, no database object is created. Instead, the query directly references the manifest file.

Example 17-15 Inline External Table Query (Manifest File Reference)

SELECT *
FROM EXTERNAL
(
    (
        CUSTOMER_ID NUMBER,
        FIRST_NAME VARCHAR2(64),
        LAST_NAME VARCHAR2(64),
        EMAIL VARCHAR2(64),
        SIGNUP_DATE DATE
    )
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=AWS_S3_CREDENTIAL
        com.oracle.bigdata.access_protocol=iceberg
    )
    LOCATION ('iceberg:https://sales-data.s3.us-west-2.amazonaws.com/customers/metadata/00001-27da..ef5.metadata.json')
) t;

17.5.4.3 Creating a Table Using DBMS_CLOUD

This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to define an external table.

Example 17-16 Creating a Table Using DBMS_CLOUD

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name => 'CUSTOMERS_ICEBERG',
        credential_name => 'AWS_S3_CREDENTIAL',
        file_uri_list => 'https://sales-data.s3.us-west-2.amazonaws.com/customers/metadata/00001-27da..ef5.metadata.json',
        format =>'{"access_protocol":{"protocol_type":"iceberg"}}'
    );
END;

17.5.4.4 Creating a Table Using AWS Glue as a Catalog

This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to define an external table.

Example 17-17 Creating a Table Using AWS Glue as a Catalog

This example defines a table that uses the Amazon Web Service data integration service AWS Glue as the Iceberg catalog:

CREATE TABLE CUSTOMERS_ICEBERG
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME VARCHAR2(64),
    LAST_NAME VARCHAR2(64),
    EMAIL VARCHAR2(64),
    SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=AWS_S3_CREDENTIAL
        com.oracle.bigdata.access_protocol=iceberg
        com.oracle.bigdata.access_protocol.config='{"iceberg_catalog_type": "aws_glue", "iceberg_glue_region": "us-west-2", "iceberg_table_path": "sales_db.customers"}'
    )
    LOCATION ('iceberg:')
)
PARALLEL;

17.5.4.5 Inline External Table Query Using AWS Glue Catalog

This query uses an inline external table referencing the Amazon Web Service data integration service AWS Glue as the catalog.

Example 17-18 Inline External Table Query Using AWS Glue Catalog

SELECT *
FROM EXTERNAL
(
    (
        CUSTOMER_ID NUMBER,
        FIRST_NAME VARCHAR2(64),
        LAST_NAME VARCHAR2(64),
        EMAIL VARCHAR2(64),
        SIGNUP_DATE DATE
    )
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=AWS_S3_CREDENTIAL
        com.oracle.bigdata.access_protocol=iceberg
        com.oracle.bigdata.access_protocol.config='{"iceberg_catalog_type": "aws_glue", "iceberg_glue_region": "us-west-2", "iceberg_table_path": "sales_db.customers"}'
    )
    LOCATION ('iceberg:')
) t;

17.5.4.6 Creating an External Table Using DBMS_CLOUD with AWS Glue Catalog

This example demonstrates using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure with the Amazon Web Service data integration service AWS Glue as the catalog.

Example 17-19 Creating an External Table Using DBMS_CLOUD with AWS Glue Catalog

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name => 'CUSTOMERS_ICEBERG',
        credential_name => 'AWS_S3_CREDENTIAL',
        file_uri_list => '',
        format =>'{"access_protocol":{"protocol_type":"iceberg","protocol_config":{"iceberg_catalog_type": "aws_glue", "iceberg_glue_region": "us-west-2", "iceberg_table_path": "sales_db.customers"}}}'
    );
END;