9.2.24 Iceberg Event Handler

Iceberg is a high-performance table format for extremely large analytic tables. Iceberg brings the reliability and simplicity of SQL tables to GG for DAA, while making it possible for engines such as Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables at the same time.

9.2.24.1 Detailed Functionality

The Oracle GoldenGate Iceberg Replicat can replicate GoldenGate trail records to Iceberg tables.

The Iceberg open-table-format files can be written to the local file system, AWS Simple Storage Service (S3), Google Cloud Storage (GCS), or Azure Data Lake Storage (ADLS).

9.2.24.1.1 Replication without a SQL Engine

The Oracle GoldenGate Iceberg Replicat process does not require a SQL engine to replicate data to Iceberg tables.

It uses the Iceberg Java SDK along with the object-storage-specific Java SDK to write data to Iceberg tables.

9.2.24.1.2 Iceberg File Format

The default file format for Iceberg data files and delete files is Parquet.

Oracle GoldenGate can be configured to write files in any of the following Iceberg supported file formats:
  • Parquet (default)
  • Avro
  • ORC

9.2.24.1.3 Iceberg Catalog

Oracle GoldenGate supports the following Iceberg catalogs:

  • Hadoop Catalog
  • Nessie Catalog
  • AWS Glue Catalog
  • Polaris Catalog
  • REST Catalog
  • JDBC Catalog

9.2.24.1.4 Iceberg Specification

Oracle GoldenGate generates data files and delete files as per the Iceberg specification version 2.

See https://iceberg.apache.org/spec/#version-2-row-level-deletes

9.2.24.1.5 Delete Files and Merge-On-Read (MoR)

Oracle GoldenGate generates Iceberg delete files for the UPDATE and DELETE operations.

Therefore, the Iceberg table property write.update.mode is always set to merge-on-read.

SQL engines should support merge-on-read to query tables replicated by Oracle GoldenGate.

Iceberg supports two types of delete files:

  • Equality Deletes: The deleted records are identified by the equality of the values in the columns specified in the delete file.
  • Position Deletes: The deleted records are identified by the position of the records in the Iceberg data file.

In the current release, Oracle GoldenGate uses Iceberg Equality Deletes to delete records from the Iceberg table.

This allows records to be deleted without looking up the position of the rows in the Iceberg data file.

Note:

Contact Oracle support for use cases that require Iceberg Position Deletes.
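
The difference between the two delete types can be illustrated with a small in-memory model. This is a minimal Python sketch, not the Iceberg SDK: the row layout and the function names apply_equality_delete and apply_position_delete are hypothetical, and the model ignores snapshot sequence numbers and file scoping that a real merge-on-read reader also applies.

```python
# Toy model of the rows stored in one Iceberg data file (illustrative only).
data_file = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]


def apply_equality_delete(rows, delete_row):
    """Drop every row whose values equal all columns listed in delete_row."""
    return [
        row for row in rows
        if any(row.get(col) != val for col, val in delete_row.items())
    ]


def apply_position_delete(rows, positions):
    """Drop rows by their ordinal position inside the data file."""
    doomed = set(positions)
    return [row for pos, row in enumerate(rows) if pos not in doomed]


# An equality delete only needs the key value of the deleted record.
after_equality = apply_equality_delete(data_file, {"id": 2})

# A position delete needs to know where the record sits in the file.
after_position = apply_position_delete(data_file, [1])
```

Both calls remove the same record, but only the position delete requires the writer to first locate the row, which is the lookup that equality deletes avoid.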

9.2.24.1.6 Operation Support

The Iceberg event handler supports the following operations:

  • INSERT: Generates Iceberg data files for the insert operations.
  • UPDATE: Generates Iceberg data files and delete files for update operations.
  • DELETE: Generates Iceberg delete files for delete operations.
  • TRUNCATE: Generates an Iceberg delete file whose condition is always true, which truncates the target table.

    This operation creates an empty Iceberg snapshot with no data files.

9.2.24.1.7 Compressed Update Handling

A compressed update record in the Oracle GoldenGate trail file contains values for the key columns and the modified columns.

An uncompressed update record contains values for all the columns.

Oracle GoldenGate trails may contain compressed or uncompressed update records. The default extract configuration writes compressed updates to the trail files.

If there are missing column values in the update operations, then Replicat will ABEND.

This behavior can be overridden by setting the parameter gg.eventhandler.iceberg.abendOnMissingColumns=false in the Replicat properties file.

When the parameter is set to false, Replicat will handle compressed updates by querying the previous values of the missing columns from the Iceberg table.
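
The two behaviors can be sketched as follows. This is a minimal Python illustration under stated assumptions, not GoldenGate code: resolve_update, MissingColumnError, and the lookup_previous callable are hypothetical names, and the exception merely stands in for a Replicat ABEND.

```python
class MissingColumnError(Exception):
    """Stands in for a Replicat ABEND in this sketch."""


def resolve_update(update, columns, lookup_previous, abend_on_missing=True):
    """Return a full row image for an update record.

    update          -- column values present in the trail record
    columns         -- every column of the target table
    lookup_previous -- callable returning the row currently in the target table
    """
    missing = [col for col in columns if col not in update]
    if not missing:
        return dict(update)  # uncompressed update: nothing to look up
    if abend_on_missing:
        raise MissingColumnError(f"missing column values: {missing}")
    previous = lookup_previous()  # the expensive step: read the target table
    return {col: update.get(col, previous[col]) for col in columns}


columns = ["id", "name", "city"]
compressed = {"id": 7, "city": "Oslo"}  # key column plus the modified column
full = resolve_update(
    compressed,
    columns,
    lambda: {"id": 7, "name": "Ann", "city": "Rome"},
    abend_on_missing=False,
)
```

Passing the lookup as a callable keeps the target-table read lazy, so it only happens for records that actually have missing values.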

9.2.24.1.7.1 Lookup Missing Values in Sparse Updates

The lookup of the missing values is an expensive operation and may impact the performance of the Replicat process.

By default, Oracle GoldenGate writes records to Iceberg in micro-batches every fifteen minutes.

Every micro-batch for a table can potentially contain millions of rows.

Micro-batches are processed for every target table in concurrent threads.

Therefore, it is critical that sufficient JVM heap memory is allocated to the Replicat process.

The lookup is performed only for rows that contain at least one missing value in the update operation.

Oracle GoldenGate will automatically create target tables. During auto-creation of tables, Oracle GoldenGate Replicat will enable creation of Iceberg metrics (min/max values) for all the identifier (key) columns.

The metrics are stored in the Iceberg metadata files.

Iceberg metrics help speed up the lookup of the missing values in the UPDATE operations.

9.2.24.1.8 INSERTALLRECORDS Support

The Iceberg event handler supports the INSERTALLRECORDS parameter. See: https://docs.oracle.com/en/middleware/goldengate/core/21.3/reference/insertallrecords.html#GUID-A1019C40-97BE-437B-9D80-7C99A9A6DB8E. Set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm).

Setting this parameter directs the Replicat process to generate Iceberg data files that append operation data to the Iceberg target table.

9.2.24.1.9 Operation Aggregation

Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.

Operation records are aggregated in-memory.

You can tune the apply interval using the gg.handler.iceberg.fileRollInterval property; the default value is 15m (fifteen minutes).

The Replicat process will generate Iceberg data files and delete files for the aggregated operations.
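
The idea of merging several operations on the same row into one output operation can be sketched in Python. The merge rules below are simplifying assumptions chosen for illustration (this section does not list the exact rules Replicat applies), and the aggregate function and the tuple layout are hypothetical.

```python
def aggregate(operations):
    """Collapse a stream of (op, key, values) tuples to one entry per key."""
    pending = {}
    for op, key, values in operations:
        prev = pending.get(key)
        if prev is None:
            pending[key] = (op, dict(values))
            continue
        prev_op, prev_values = prev
        if op == "DELETE":
            if prev_op == "INSERT":
                # Inserted and deleted inside the same batch: nothing to apply.
                del pending[key]
            else:
                pending[key] = ("DELETE", {})
        elif op == "UPDATE":
            merged = {**prev_values, **values}
            # An updated insert is still an insert from the target's view.
            pending[key] = ("INSERT" if prev_op == "INSERT" else "UPDATE", merged)
        else:
            # INSERT after an earlier DELETE replaces the existing row.
            pending[key] = ("UPDATE" if prev_op == "DELETE" else "INSERT", dict(values))
    return pending


batch = [
    ("INSERT", 1, {"name": "a", "city": "x"}),
    ("UPDATE", 1, {"city": "y"}),
    ("UPDATE", 2, {"name": "b"}),
    ("DELETE", 2, {}),
    ("INSERT", 3, {"name": "c"}),
    ("DELETE", 3, {}),
]
result = aggregate(batch)
```

In this sketch six input operations collapse to two output operations, which is why a longer apply interval reduces the number of files written but increases the amount of state held in memory.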

9.2.24.1.10 Automatic Table Creation

Oracle GoldenGate Replicat will automatically create target tables if the target table does not exist.

9.2.24.1.11 Iceberg Metadata Provider

A new metadata provider for Iceberg is implemented to retrieve the Iceberg target table metadata.

The Iceberg metadata provider is automatically configured and enabled by the Replicat process.

9.2.24.1.12 Iceberg Identifier Fields

The identifier fields in the Iceberg table are used to uniquely identify the rows in the Iceberg table.

During the automatic table creation, Oracle GoldenGate maps the key columns to Iceberg identifier fields.

Note:

Iceberg tables without identifier fields are not supported in the current release.

9.2.24.1.13 Primary Key Updates and Truncates

  • Primary key updates with missing column values will trigger files to be flushed to the Iceberg table before the flush interval.

    This can result in small data files and delete files for the primary key update operation.

    For workloads or tables with frequent primary key updates, Oracle recommends generating trail files with uncompressed update records.

    Oracle also recommends setting gg.validate.keyupdate=true for trails generated from an Oracle source.

    There is a known issue in which Oracle Extract generates primary key update operations even though the key columns are not modified.

  • A truncate operation will trigger files to be flushed to the Iceberg table before the flush interval.
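
The effect of validating key updates can be sketched as a comparison of the key values in the before and after images. This is a minimal Python illustration of the idea only; classify_key_update and the labels "PK_UPDATE" and "UPDATE" are hypothetical and do not represent GoldenGate internals.

```python
def classify_key_update(key_columns, before, after):
    """Treat a key update as a normal update when no key value changed."""
    key_changed = any(before.get(col) != after.get(col) for col in key_columns)
    return "PK_UPDATE" if key_changed else "UPDATE"


# Flagged as a key update in the trail, but the key value is unchanged.
spurious = classify_key_update(
    ["id"],
    {"id": 10, "city": "Rome"},
    {"id": 10, "city": "Oslo"},
)

# A genuine primary key change.
genuine = classify_key_update(
    ["id"],
    {"id": 10, "city": "Rome"},
    {"id": 11, "city": "Rome"},
)
```

Downgrading the spurious case to a normal update avoids the early flush, and the resulting small files, that a primary key update would otherwise trigger.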

9.2.24.2 Configuration

The configuration of the Iceberg replication properties is stored in the Replicat properties file.

9.2.24.2.1 Automatic Configuration

Iceberg replication involves configuring multiple components, such as the File Writer Handler and the target Iceberg Event Handler.

The Automatic Configuration functionality helps you to autoconfigure these components so that the manual configuration is minimal.

The properties modified by autoconfiguration are also logged in the handler log file.

To enable autoconfiguration to replicate to the Iceberg target, set the parameter gg.target=iceberg.

9.2.24.2.1.1 File Writer Configuration

The File Writer Handler name is pre-set to the value iceberg and its properties are automatically set to the required values for Iceberg.

9.2.24.2.1.2 Iceberg Event Handler Configuration

The Iceberg Event Handler name is pre-set to the value iceberg.

This topic details the configuration properties available for the Iceberg Event Handler. The required properties must be changed to match your Iceberg configuration.

9.2.24.2.1.2.1 Common Iceberg Properties

Iceberg can be configured to work with multiple catalogs and object stores.

The following are the common properties:

Properties | Required/Optional | Legal Values | Default | Explanation
gg.eventhandler.iceberg.warehouseLocation | Optional | String value. | None | Directory path to the Iceberg warehouse location, excluding the object storage scheme. Example: /path/to/warehouse. This property is required when using the hadoop catalog. For other Iceberg catalogs, the warehouse location has catalog-specific requirements.
gg.eventhandler.iceberg.fileRollInterval | Optional | The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. | 15m | Determines how often data is pushed into the Iceberg warehouse. Note: Use this parameter with caution. Increasing it above the default value (15m) increases the amount of data held in the internal memory of the Replicat process, which can cause out-of-memory errors and stop the Replicat.
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | Warehouse scheme that indicates the Iceberg object storage location. Valid values are: file://, gs://, s3://, s3a://, abfss://. For more information, see File System Scheme.
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Iceberg catalog type. Valid values are: hadoop, jdbc, nessie, rest, glue, polaris.
gg.eventhandler.iceberg.fileFormat | Optional | parquet, orc, or avro. | parquet | Iceberg table file format to be used in target tables. Supported file formats: Parquet, Avro, and ORC.
gg.eventhandler.iceberg.icebergTableProperties | Optional | String value. | None | Path to a table properties file that specifies additional Iceberg table properties to set on the target tables.
gg.eventhandler.iceberg.abendOnMissingColumns | Optional | true or false. | true | When set to true and an UPDATE operation contains a missing value, Replicat will ABEND. When set to false, Replicat will not ABEND if UPDATE operations have missing column values; the missing column values are read by querying the target tables. This lookup may impact the performance of the Replicat process.
gg.eventhandler.iceberg.abendOnSchemaChanges | Optional | true or false. | true | When set to true and schema changes are detected, the Replicat process will ABEND. You can then manually update the target schema and set the property to false to proceed. When set to false, a warning message is logged for schema changes.
gg.validate.keyupdate | Optional | true or false. | false | If set to true, Replicat validates key update operations (optype 115) and corrects them to normal updates if no key values have changed.
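
The legal value format described for the fileRollInterval property can be made concrete with a small parser. This is a minimal Python sketch written from the description in the table, not the parser GoldenGate uses; roll_interval_ms is a hypothetical name, and returning None for "rolling on time is turned off" is a choice made for this example.

```python
import re

# Multipliers to milliseconds for each supported unit suffix.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_PATTERN = re.compile(r"^(-?\d+(?:\.\d+)?)(ms|s|m|h)?$")


def roll_interval_ms(value):
    """Convert a roll interval such as '10s' or '1.5h' to milliseconds.

    Returns None when the value is 0 or less (time-based rolling disabled).
    """
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a legal roll interval: {value!r}")
    number, unit = match.groups()
    millis = float(number) * _UNIT_MS[unit or "ms"]  # bare numbers are ms
    return None if millis <= 0 else int(millis)


default_ms = roll_interval_ms("15m")
```

Here the default of 15m corresponds to 900000 milliseconds, and a value such as 1.5h corresponds to 5400000 milliseconds.
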
9.2.24.2.1.2.1.1 File System Scheme

The gg.eventhandler.iceberg.fileSystemScheme property is used to specify the object storage scheme.

The following are the supported object storage schemes:

  • file://: Local file system
  • gs://: Google Cloud Storage
  • s3://: AWS S3
  • s3a://: AWS S3
  • abfss://: Azure Data Lake Storage
9.2.24.2.1.2.2 Iceberg Common Dependencies

The following are the common Iceberg dependencies:

<dependencies>
        <!-- Common Iceberg dependencies START -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.4.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.4.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-arrow</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-core</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-data</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-parquet</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-gcp</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-aws</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-orc</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-nessie</artifactId>
            <version>1.6.1</version>
        </dependency>
        <!-- Common Iceberg dependencies END -->
</dependencies>

You can download the dependencies from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  1. Change directory to DependencyDownloader.
  2. Edit config_proxy.sh if proxy configuration is required.
  3. Run the script:
    ./download_dependencies.sh xmls/iceberg-common.xml

This script will download the dependencies and store them in the iceberg-common directory.

gg.classpath can be configured to include the dependencies from the iceberg-common directory as follows:

gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-common/*

9.2.24.2.1.2.3 AWS Java SDK dependencies for Writing to AWS S3 (s3:// Scheme)

The following are the Iceberg dependencies to write to AWS S3 using the s3:// scheme:

<dependencies>
    <!-- s3:// scheme dependencies START -->
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>s3</artifactId>
        <version>2.28.6</version>
    </dependency>
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>sts</artifactId>
        <version>2.28.6</version>
    </dependency>
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>glue</artifactId>
        <version>2.28.6</version>
    </dependency>
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>url-connection-client</artifactId>
        <version>2.28.6</version>
    </dependency>
    <!-- s3:// scheme dependencies END -->
</dependencies>

The dependencies can be downloaded from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  • Change directory to DependencyDownloader.
  • Edit config_proxy.sh if proxy configuration is required.
  • Run the script: ./download_dependencies.sh xmls/iceberg-aws-java-sdk.xml

This script will download the dependencies and store them in the iceberg-aws-java-sdk directory.

gg.classpath can be configured to include the dependencies as follows:

gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*

9.2.24.2.1.2.4 Hadoop AWS SDK Dependencies for Writing to AWS S3 (s3a:// Scheme)

The following are the Iceberg dependencies to write to AWS S3 using the s3a:// scheme:

<dependencies>
    <!-- s3a:// scheme dependencies START -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>3.4.0</version>
    </dependency>
    <!-- s3a:// scheme dependencies END -->
</dependencies>

You can download the dependencies from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  • Change directory to DependencyDownloader.
  • Edit config_proxy.sh if proxy configuration is required.
  • Run the script:
    ./download_dependencies.sh xmls/iceberg-hadoop-aws.xml

This script will download the dependencies and store them in the iceberg-hadoop-aws directory.

gg.classpath can be configured to include the dependencies as follows:

gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-aws/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*

9.2.24.2.1.2.5 Hadoop Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)

The following are the Iceberg dependencies to write to GCS using the Hadoop GCS SDK:

<dependencies>
    <!-- gs:// scheme dependencies START -->
    <dependency>
        <groupId>com.google.cloud.bigdataoss</groupId>
        <artifactId>gcs-connector</artifactId>
        <version>hadoop3-2.2.22</version>
    </dependency>
    <!-- gs:// scheme dependencies END -->
</dependencies>

The dependencies can be downloaded from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  • Change directory to DependencyDownloader.
  • Edit config_proxy.sh if proxy configuration is required.
  • Run the script: ./download_dependencies.sh xmls/iceberg-hadoop-gcs.xml

This script will download the dependencies and store them in the iceberg-hadoop-gcs directory.

gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*

9.2.24.2.1.2.6 Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)

The following are the Iceberg dependencies to write to GCS using the Google Cloud Storage Java SDK:

<dependencies>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>2.37.0</version>
    </dependency>
</dependencies>

The dependencies can be downloaded from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  • Change directory to DependencyDownloader.
  • Edit config_proxy.sh if proxy configuration is required.
  • Run the script:
    ./download_dependencies.sh xmls/iceberg-gcs-java-sdk.xml

This script will download the dependencies and store them in the iceberg-gcs-java-sdk directory.

gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*

9.2.24.2.1.2.7 Hadoop Azure SDK Dependencies for Writing to Azure Data Lake (ADLS)

The following are the Iceberg dependencies to write to ADLS using the Hadoop Azure Java SDK:

<dependencies>
    <!-- abfss:// scheme dependencies START -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-azure</artifactId>
        <version>3.4.0</version>
    </dependency>
    <!-- abfss:// scheme dependencies END -->
</dependencies>

The dependencies can be downloaded from maven central using the script download_dependencies.sh in the DependencyDownloader directory.

Follow these steps:

  • Change directory to DependencyDownloader.
  • Edit config_proxy.sh if proxy configuration is required.
  • Run the script:
    ./download_dependencies.sh xmls/iceberg-hadoop-azure.xml

This script will download the dependencies and store them in the iceberg-hadoop-azure directory.

gg.classpath can be configured to include the dependencies as follows:

gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-azure/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
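
The classpath values in the preceding sections all follow the same pattern: a scheme-specific dependency directory followed by iceberg-common. The following minimal Python sketch composes that string. The helper name build_classpath and the mapping table are hypothetical; the directory names are the ones used above. Two assumptions are made: file:// needs only the common dependencies, and gs:// uses the Hadoop GCS connector directory rather than the GCS Java SDK directory.

```python
# Scheme-specific dependency directories created by download_dependencies.sh.
SCHEME_DEPENDENCY_DIRS = {
    "file://": [],                      # assumption: common dependencies only
    "s3://": ["iceberg-aws-java-sdk"],
    "s3a://": ["iceberg-hadoop-aws"],
    "gs://": ["iceberg-hadoop-gcs"],    # assumption: Hadoop GCS connector
    "abfss://": ["iceberg-hadoop-azure"],
}


def build_classpath(downloader_root, scheme):
    """Return a gg.classpath value for the given file system scheme."""
    dirs = SCHEME_DEPENDENCY_DIRS[scheme] + ["iceberg-common"]
    return ":".join(f"{downloader_root}/dependencies/{name}/*" for name in dirs)


classpath = build_classpath("/path/to/DependencyDownloader", "s3a://")
print(f"gg.classpath={classpath}")
```

An unsupported scheme raises a KeyError in this sketch, which surfaces a mistyped scheme before Replicat is started.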

9.2.24.2.2 Configuration for Iceberg Nessie Catalog

9.2.24.2.2.1 Configuration for Nessie Catalog and AWS S3 s3:// Scheme

The following are the configuration properties for the Nessie catalog and AWS S3 object store using s3:// scheme:

Properties | Required/Optional | Legal Values | Default | Explanation
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie.
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists.
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2.
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3://.
gg.eventhandler.iceberg.awsS3Region | Required | String value. | None | AWS S3 bucket region. Example: us-east-2.
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.awsAccessKeyId | Optional | String value. | None | AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey | Optional | String value. | None | AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication.
gg.eventhandler.iceberg.awsRoleArn | Optional | String value. | None | AWS role ARN for authentication.
gg.eventhandler.iceberg.awsS3Endpoint | Optional | String value. | None | AWS S3 endpoint.
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage.
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage.

9.2.24.2.2.1.1 Classpath And Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • AWS SDK dependencies for writing to AWS S3 (s3:// scheme)
9.2.24.2.2.1.2 Sample Configuration for Nessie Catalog and AWS S3 s3:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
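
Before starting Replicat, it can be useful to confirm that the properties marked Required for the Nessie catalog with the s3:// scheme are present. The following is a minimal Python sketch of such a check, not a GoldenGate utility: parse_properties and missing_required are hypothetical helpers, and the parser only understands simple key=value lines and # comments rather than the full Java properties syntax.

```python
# Properties marked Required for the Nessie catalog with the s3:// scheme.
REQUIRED = (
    "gg.eventhandler.iceberg.nessieUri",
    "gg.eventhandler.iceberg.awsS3Region",
    "gg.eventhandler.iceberg.awsS3Bucket",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED if not props.get(name)]


sample = """
gg.target=iceberg
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
"""
missing = missing_required(parse_properties(sample))
print(missing)  # ['gg.eventhandler.iceberg.awsS3Region']
```

The same approach applies to the other catalog and scheme combinations by swapping in the Required properties from the corresponding table.
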
9.2.24.2.2.2 Configuration for Nessie Catalog and AWS S3 s3a:// Scheme

The following are the configuration properties for the Nessie catalog and AWS S3 object store using s3a:// scheme:

Properties | Required/Optional | Legal Values | Default | Explanation
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie.
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists.
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2.
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3a://.
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.awsAccessKeyId | Required | String value. | None | AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey | Required | String value. | None | AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication.
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage.
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage.

9.2.24.2.2.2.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
9.2.24.2.2.2.2 Sample Configuration for Nessie Catalog and AWS S3 s3a:// scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.3 Configuration for Nessie Catalog and GCS gs:// Scheme

The following are the configuration properties for the Nessie catalog and GCS object store using gs:// scheme:

Properties | Required/Optional | Legal Values | Default | Explanation
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie.
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists.
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2.
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate GCS object storage location: gs://.
gg.eventhandler.iceberg.gcpStorageBucket | Required | String value. | None | Google Cloud Storage bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.gcpProjectId | Required | String value. | None | Sets the project-id of the Google Cloud project that houses the GCS bucket.
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile | Required | String value. | None | Sets the path to the Google Service account key file.
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the GCS object storage.
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the GCS object storage.

9.2.24.2.2.3.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.2.3.2 Sample Configuration for Nessie Catalog and GCS gs:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.4 Configuration for Nessie Catalog and Azure Data Lake Storage abfss:// Scheme

The following are the configuration properties for the Nessie catalog and Azure Data Lake Storage using abfss:// scheme:

Properties | Required/Optional | Legal Values | Default | Explanation
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie.
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists.
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2.
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate Azure Data Lake Storage location: abfss://.
gg.eventhandler.iceberg.azureAccountName | Required | String value. | None | Azure storage account name that contains the container for the Iceberg Warehouse.
gg.eventhandler.iceberg.azureContainer | Required | String value. | None | Azure storage account container name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.azureAccountKey | Required | String value. | None | Azure storage account key.
gg.eventhandler.iceberg.azureBlobEndpoint | Optional | String value. | <azureContainer>@<azureAccountName>.dfs.core.windows.net | Azure Storage service endpoint.
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the Azure object storage.
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the Azure object storage.

9.2.24.2.2.4.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
9.2.24.2.2.4.2 Sample Configuration for Nessie Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.4.3 Nessie Namespace

A Nessie namespace is the top-level container for all the tables in the Nessie catalog.

The required namespaces must exist before you start the Replicat process, because tables can only be created in or written to an existing namespace.

A Nessie namespace can be created using the Nessie command-line program (nessie-cli-<version>.jar) as follows: create namespace QASOURCE;

The Nessie namespace is mapped to the GoldenGate schema in the MAP statement.

For example: MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMER;
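
Because every target schema in the MAP statements needs a matching namespace, the list of create namespace commands can be derived from the parameter file. The following minimal Python sketch shows one way to do that. It is an illustration only: namespace_commands is a hypothetical helper, the second MAP statement is made up for the example, and the regular expression handles only simple two-part TARGET names without wildcards.

```python
import re

# Captures the schema part of a two-part TARGET name, quoted or unquoted.
_TARGET = re.compile(r'TARGET\s+"?([A-Za-z0-9_$#]+)"?\s*\.', re.IGNORECASE)


def namespace_commands(map_statements):
    """Return one 'create namespace' command per distinct target schema."""
    namespaces = []
    for statement in map_statements:
        match = _TARGET.search(statement)
        if match and match.group(1) not in namespaces:
            namespaces.append(match.group(1))
    return [f"create namespace {name};" for name in namespaces]


commands = namespace_commands([
    "MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMER;",
    "MAP QASOURCE.TCUSTORD, TARGET QASOURCE.TCUSTORD;",  # illustrative table
])
```

The resulting commands can then be run in the Nessie command-line program before Replicat is started.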

9.2.24.2.3 Configuration for Iceberg AWS Glue Catalog

9.2.24.2.3.1 Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// OR s3a:// Scheme

The following are the configuration properties for the AWS Glue catalog and AWS S3 object store using s3:// or s3a:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop glue.
gg.eventhandler.iceberg.awsGlueId Required String value. None The Glue catalog ID is your numeric AWS account ID.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate AWS S3 object storage location: s3:// or s3a://.
gg.eventhandler.iceberg.awsS3Region Required String value. None AWS S3 bucket region. Example: us-east-2.
gg.eventhandler.iceberg.awsS3Bucket Required String value. None AWS S3 bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.awsAccessKeyId Optional String value. None AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey Optional String value. None AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken Optional String value. None AWS session token for authentication.
gg.eventhandler.iceberg.awsRoleArn Optional String value. None AWS role ARN for authentication.
gg.eventhandler.iceberg.awsS3Endpoint Optional String value. None AWS S3 endpoint.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the AWS S3 object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the AWS S3 object storage.
9.2.24.2.3.2 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • AWS SDK dependencies for writing to AWS S3 (s3://)
9.2.24.2.3.3 Sample Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// or s3a:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=glue
gg.eventhandler.iceberg.awsGlueId=<aws-account-id>
gg.eventhandler.iceberg.fileSystemScheme=s3://
#gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.3.4 Table Names and Case Sensitivity

The AWS Glue catalog supports only lowercase names.

The AWS Glue catalog supports only two-part table names.

The target table in the GGDAA Replicat MAP statement should be mapped to the Glue database and table names.

Example: MAP QASOURCE.TCUSTMER, TARGET "glue_database"."tcustmer";

In this example, glue_database is the Glue database name and tcustmer is the Glue table name.
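
The naming rules above can be applied mechanically when MAP statements are generated by a script. The following Python sketch is illustrative only: the helper name glue_map_statement is hypothetical and is not part of Oracle GoldenGate. It lowercases the Glue database and table names, rejects parts that would produce more than a two-part name, and formats a MAP statement in the same form as the example.

```python
def glue_map_statement(source_table: str, glue_database: str, glue_table: str) -> str:
    """Return a Replicat MAP statement that targets a lowercase, two-part Glue name.

    Hypothetical helper for illustration; it only formats text.
    """
    for part in (glue_database, glue_table):
        # A dot inside either part would turn the target into a three-part name.
        if not part or "." in part:
            raise ValueError(f"invalid Glue name part: {part!r}")
    # Glue supports only lowercase names, so normalize both parts before quoting.
    target = f'"{glue_database.lower()}"."{glue_table.lower()}"'
    return f"MAP {source_table}, TARGET {target};"


if __name__ == "__main__":
    print(glue_map_statement("QASOURCE.TCUSTMER", "glue_database", "TCUSTMER"))
```

The generated line can be pasted into the Replicat parameter file; the source table name is passed through unchanged.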

9.2.24.2.4 Configuration for Iceberg Polaris Catalog

Apache Polaris is an open-source, fully-featured catalog for Apache Iceberg.

There are a few options to set up Polaris:

9.2.24.2.4.1 Polaris Common Configuration

The following are the configuration properties for the Polaris catalog:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Required String value. hadoop Iceberg catalog type. Set to polaris for the Polaris catalog.
gg.eventhandler.iceberg.polarisCatalogUri Required String value. None Polaris Catalog endpoint URI. Example: https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog.
gg.eventhandler.iceberg.polarisCatalogName Required String value. None Polaris Catalog name. Catalog name is the entry point to the Polaris catalog namespace and tables.
gg.eventhandler.iceberg.polarisClientId Required String value. None Polaris principal’s client ID used for authentication and authorization to the respective Polaris catalog.
gg.eventhandler.iceberg.polarisClientSecret Required String value. None Polaris principal’s client secret used for authentication and authorization to the respective Polaris catalog.
gg.eventhandler.iceberg.polarisPrincipalRole Optional String value. ALL The role to be assumed by the Polaris principal.
9.2.24.2.4.2 Polaris Catalog with Google Cloud Storage (GCS)
The environment variable GOOGLE_APPLICATION_CREDENTIALS must be set to the path to the Google Service account key file. Add the following to the Replicat parameter file (.prm):
SETENV (GOOGLE_APPLICATION_CREDENTIALS = "/path/to/the/gcp-service-account-json-key.json")
9.2.24.2.4.3 Polaris Catalog with AWS S3 Storage
Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.awsS3Region Required String value. None Required only if the Polaris catalog points to AWS S3 Storage. AWS S3 bucket region. Example: us-east-2.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// Required only if the Polaris catalog points to AWS S3 Storage. File system scheme to indicate AWS S3 object storage location: s3://.
gg.eventhandler.iceberg.awsAccessKeyId Optional String value. None Required only if the Polaris catalog points to AWS S3 Storage. AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey Optional String value. None Required only if the Polaris catalog points to AWS S3 Storage. AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken Optional String value. None Required only if the Polaris catalog points to AWS S3 Storage. AWS session token for authentication.
gg.eventhandler.iceberg.awsS3Endpoint Optional String value. None Required only if the Polaris catalog points to AWS S3 Storage. AWS S3 endpoint.
9.2.24.2.4.4 Polaris Catalog with Azure Data Lake Storage (ADLS)
Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// Required only if the Polaris catalog points to Azure Data Lake Storage. Warehouse scheme to indicate Azure Data Lake Storage location: abfss://.
gg.eventhandler.iceberg.azureAccountName Required String value. None Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account name that contains the container for the Iceberg Warehouse.
gg.eventhandler.iceberg.azureContainer Required String value. None Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account container name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.azureAccountKey Required String value. None Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account key.
gg.eventhandler.iceberg.azureBlobEndpoint Optional String value. <azureContainer>@<azureAccountName>.dfs.core.windows.net Required only if the Polaris catalog points to Azure Data Lake Storage. Azure Storage service endpoint.
9.2.24.2.4.5 Polaris Catalog and GCS Storage Classpath and Dependencies

If the Polaris catalog is set up to write to GCS, the Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.4.6 Polaris Catalog and AWS S3 Storage Classpath and Dependencies

If the Polaris catalog is set up to write to AWS S3, the Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • AWS SDK dependencies for writing to AWS S3 (s3://)
9.2.24.2.4.7 Polaris Catalog and ADLS Storage Classpath and Dependencies

If the Polaris catalog is set up to write to ADLS, the Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Azure SDK dependencies for writing to Azure Data Lake Storage (abfss://).
9.2.24.2.4.8 Sample Configuration for Polaris Catalog
gg.target=iceberg
#For catalog using GCS
gg.classpath=DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using S3
#gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using ADLS
#gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=polaris
gg.eventhandler.iceberg.polarisCatalogUri=https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog
gg.eventhandler.iceberg.polarisCatalogName=<polaris_gcs_catalog>
gg.eventhandler.iceberg.polarisClientId=<clientId>
gg.eventhandler.iceberg.polarisClientSecret=<clientSecret>
gg.eventhandler.iceberg.polarisPrincipalRole=ALL
9.2.24.2.4.9 Polaris Namespace

Polaris namespace is the top-level container for all the tables in the Polaris catalog.

Before starting the Replicat process, the Polaris namespace should be created in the respective Polaris catalog.

The Polaris namespace is mapped to the GoldenGate schema in the MAP statement.

Example: MAP QASOURCE.TCUSTMER, TARGET "polaris_namespace"."tcustmer";

9.2.24.2.5 Configuration for Iceberg REST Catalog

Iceberg defines a REST specification (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) for catalog implementations.

Any REST server that implements the Iceberg REST API can be used as the Iceberg catalog.

For example, Polaris is an implementation of the Iceberg REST API.

9.2.24.2.5.1 Configuration for Iceberg REST Catalog

The following are the configuration properties for the REST catalog:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Required String value. hadoop Iceberg catalog type. Set to rest for a REST catalog.
gg.eventhandler.iceberg.restCatalogUri Required String value. None REST Catalog endpoint URI. Example: https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog.
gg.eventhandler.iceberg.restCatalogProperties Optional String value. None Properties file with additional configuration for the REST catalog.
9.2.24.2.5.2 Sample Configuration for REST Catalog based on Polaris
gg.target=iceberg
#For catalog using GCS
gg.classpath=DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using S3
#gg.classpath=DependencyDownloader/dependencies/iceberg-s3/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using ADLS
#gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=rest
gg.eventhandler.iceberg.restCatalogUri=https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog
gg.eventhandler.iceberg.restCatalogProperties=/path/to/rest/catalog.properties
# Optional configuration for authentication to the object storage. 
# Some REST implementations do not require a separate authentication to the storage layer. 
#gg.eventhandler.iceberg.fileSystemScheme=s3://
#gg.eventhandler.iceberg.awsS3Region=<s3-region>
#gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
#gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
#gg.eventhandler.iceberg.awsSecretKey=<secret-key>
#gg.eventhandler.iceberg.fileSystemScheme=abfss://
#gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
#gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
#gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
#gg.eventhandler.iceberg.fileSystemScheme=gs://
#gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
#gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
#gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
9.2.24.2.5.3 Sample REST Catalog Properties File (for Polaris)
warehouse=polaris_s3_catalog
credential=<ClientId>:<ClientSecret>
scope=PRINCIPAL_ROLE:ALL
token-refresh-enabled=true

9.2.24.2.6 Configuration for Iceberg JDBC Catalog

Some JDBC-compatible databases can be used to store the Iceberg catalog information.

Not all JDBC-compatible databases are supported by the Iceberg JDBC Catalog API.

Note:

The Databricks target using the Databricks JDBC driver has been tested internally.
9.2.24.2.6.1 Configuration for Iceberg JDBC Catalog and file:// Scheme

The following are the configuration properties for the JDBC catalog and the local file system as the Iceberg storage using file:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to jdbc for the JDBC catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate local file system as the storage: file://.
gg.eventhandler.iceberg.warehouseLocation Required String value. None Local directory path to the Iceberg warehouse.
gg.eventhandler.iceberg.jdbcUrl Required String value. None JDBC URL to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcUser Optional String value. None JDBC user to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcPassword Optional String value. None JDBC password to connect to the database used as Iceberg catalog.
9.2.24.2.6.1.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.1.2 Sample Configuration for Iceberg JDBC Catalog and Local File Storage file:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=/path/to/the/jdbc/driver/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
9.2.24.2.6.2 Configuration for Iceberg JDBC Catalog and s3a:// Scheme

The following are the configuration properties for the JDBC catalog and AWS S3 object store using s3a:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to jdbc for the JDBC catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate AWS S3 object storage location: s3a://.
gg.eventhandler.iceberg.warehouseLocation Required String value. None Local directory path to the Iceberg warehouse.
gg.eventhandler.iceberg.jdbcUrl Required String value. None JDBC URL to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcUser Optional String value. None JDBC user to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcPassword Optional String value. None JDBC password to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.awsS3Bucket Required String value. None AWS S3 bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.awsAccessKeyId Required String value. None AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey Required String value. None AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken Optional String value. None AWS session token for authentication.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the AWS S3 object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the AWS S3 object storage.
9.2.24.2.6.2.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
  • Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.2.2 Sample Configuration for JDBC Catalog and AWS S3 s3a:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.6.3 Configuration for Iceberg JDBC Catalog and gs:// Scheme

The following are the configuration properties for the JDBC catalog and GCS object store using gs:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to jdbc for the JDBC catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate GCS object storage location: gs://.
gg.eventhandler.iceberg.warehouseLocation Required String value. None Local directory path to the Iceberg warehouse.
gg.eventhandler.iceberg.jdbcUrl Required String value. None JDBC URL to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcUser Optional String value. None JDBC user to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcPassword Optional String value. None JDBC password to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.gcpStorageBucket Required String value. None Google Cloud Storage bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.gcpProjectId Required String value. None Sets the project-id of the Google Cloud project that houses the GCS bucket.
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile Required String value. None Sets the path to the Google Service account key file.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the GCS object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the GCS object storage.
9.2.24.2.6.3.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
  • Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.3.2 Sample Configuration for JDBC Catalog and GCS gs:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.6.4 Configuration for Iceberg JDBC Catalog and abfss:// Scheme

The following are the configuration properties for the JDBC catalog and Azure Data Lake Storage using the abfss:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to jdbc for the JDBC catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate Azure Data Lake Storage location: abfss://.
gg.eventhandler.iceberg.warehouseLocation Required String value. None Local directory path to the Iceberg warehouse.
gg.eventhandler.iceberg.jdbcUrl Required String value. None JDBC URL to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcUser Optional String value. None JDBC user to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.jdbcPassword Optional String value. None JDBC password to connect to the database used as Iceberg catalog.
gg.eventhandler.iceberg.azureAccountName Required String value. None Azure storage account name that contains the container for the Iceberg Warehouse.
gg.eventhandler.iceberg.azureContainer Required String value. None Azure storage account container name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.azureAccountKey Required String value. None Azure storage account key.
gg.eventhandler.iceberg.azureBlobEndpoint Optional String value. <azureContainer>@<azureAccountName>.dfs.core.windows.net Azure Storage service endpoint.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the Azure object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the Azure object storage.
9.2.24.2.6.4.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
  • Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.4.2 Sample Configuration for JDBC Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>

9.2.24.2.7 Configuration for Iceberg Hadoop Catalog

The Hadoop catalog is not recommended for production use because it has no reliable locking mechanism, which affects concurrent reads and writes.

Use the Hadoop catalog for testing purposes only.

9.2.24.2.7.1 Configuration for Iceberg Hadoop Catalog and file:// Scheme

The following are the configuration properties for the Hadoop catalog and the local file system as the Iceberg storage using file:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to hadoop for the Hadoop catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate local file system as the storage: file://.
gg.eventhandler.iceberg.warehouseLocation Required String value. None Local directory path to the Iceberg warehouse.

Note:

This configuration is typically used for testing purposes for storing the Iceberg tables on the local file system.
9.2.24.2.7.1.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
9.2.24.2.7.1.2 Sample Configuration for Iceberg Hadoop Catalog and Local File Storage file:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.7.2 Configuration for Iceberg Hadoop Catalog and s3a:// Scheme

The following are the configuration properties for the Hadoop catalog and AWS S3 object store using s3a:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to hadoop for the Hadoop catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate AWS S3 object storage location: s3a://.
gg.eventhandler.iceberg.awsS3Bucket Required String value. None AWS S3 bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.awsAccessKeyId Required String value. None AWS access key id for authentication.
gg.eventhandler.iceberg.awsSecretKey Required String value. None AWS secret access key for authentication.
gg.eventhandler.iceberg.awsSessionToken Optional String value. None AWS session token for authentication.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the AWS S3 object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the AWS S3 object storage.
9.2.24.2.7.2.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
9.2.24.2.7.2.2 Sample Configuration for Hadoop Catalog and AWS S3 s3a:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.7.3 Configuration for Iceberg Hadoop Catalog and gs:// Scheme

The following are the configuration properties for the Hadoop catalog and GCS object store using gs:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to hadoop for the Hadoop catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate GCS object storage location: gs://.
gg.eventhandler.iceberg.gcpStorageBucket Required String value. None Google Cloud Storage bucket name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.gcpProjectId Required String value. None Sets the project-id of the Google Cloud project that houses the GCS bucket.
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile Required String value. None Sets the path to the Google Service account key file.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the GCS object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the GCS object storage.
9.2.24.2.7.3.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.7.3.2 Sample Configuration for Hadoop Catalog and GCS gs:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.7.4 Configuration for Iceberg Hadoop Catalog and abfss:// Scheme

The following are the configuration properties for the Hadoop catalog and Azure Data Lake Storage using abfss:// scheme:

Properties Required/Optional Legal Values Default Explanation
gg.eventhandler.iceberg.catalogType Optional String value. hadoop Iceberg catalog type. Set to hadoop for the Hadoop catalog.
gg.eventhandler.iceberg.fileSystemScheme Optional String value. file:// File system scheme to indicate Azure Data Lake Storage location: abfss://.
gg.eventhandler.iceberg.azureAccountName Required String value. None Azure storage account name that contains the container for the Iceberg Warehouse.
gg.eventhandler.iceberg.azureContainer Required String value. None Azure storage account container name that houses the Iceberg Warehouse.
gg.eventhandler.iceberg.azureAccountKey Required String value. None Azure storage account key.
gg.eventhandler.iceberg.azureBlobEndpoint Optional String value. <azureContainer>@<azureAccountName>.dfs.core.windows.net Azure Storage service endpoint.
gg.eventhandler.iceberg.proxyServer Optional String value. None Proxy server to connect to the Azure object storage.
gg.eventhandler.iceberg.proxyPort Optional String value. 80 Proxy server port to connect to the Azure object storage.
9.2.24.2.7.4.1 Classpath and Dependencies

The Java classpath (gg.classpath) should include the following dependencies:

  • Iceberg common dependencies
  • Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
9.2.24.2.7.4.2 Sample Configuration for Hadoop Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>

9.2.24.3 Configuration Templates

Iceberg configuration templates are available in the directory /path/to/AdapterExamples/bigdata/iceberg.

The following template properties files are packaged with Oracle GoldenGate:

  • iceberg-glue-s3.properties
  • iceberg-hadoop-adls.properties
  • iceberg-hadoop-gcs.properties
  • iceberg-hadoop-localfile.properties
  • iceberg-hadoop-s3.properties
  • iceberg-jdbc-localfile.properties
  • iceberg-jdbc-s3.properties
  • iceberg-jdbc-adls.properties
  • iceberg-jdbc-gcs.properties
  • iceberg-nessie-adls.properties
  • iceberg-nessie-gcs.properties
  • iceberg-nessie-s3.properties
  • iceberg-nessie-s3a.properties
  • iceberg-polaris-adls.properties
  • iceberg-polaris-gcs.properties
  • iceberg-polaris-s3.properties
  • iceberg-rest.properties

9.2.24.4 Limitations

  • Oracle GoldenGate does not support configuration of partition columns during automatic table creation.

    If partitioned tables are required, the Iceberg table should be created manually with the required partition columns.

  • Altering the partitioning schema of a table is not supported after starting the Replicat process.

    If the partitioning schema of a table needs to be changed, the table should be dropped and recreated manually in the target database.

    The data in the table will need to be reloaded.

    Note:

    Contact Oracle Support for assistance with this process.
  • Pre-existing Iceberg target tables must have identifier columns (key columns) in the schema.

    The Replicat process will ABEND if the target table does not have identifier columns.

  • The following Iceberg data types cannot be used as a key column (Iceberg identifier field):
    • binary
    • fixed
    • uuid
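
The identifier-column limitations above can be checked before Replicat is started against a pre-existing table. The following Python sketch is a minimal illustration under stated assumptions: the schema is supplied by the caller as a plain mapping of column name to Iceberg type string, and the function name identifier_field_problems is hypothetical. It does not read table metadata from a catalog.

```python
# Iceberg types that cannot back an identifier (key) field, per the list above.
UNSUPPORTED_KEY_TYPES = {"binary", "fixed", "uuid"}


def identifier_field_problems(schema, identifier_fields):
    """Return a list of problems; an empty list means the key columns look usable.

    schema maps column name -> Iceberg type string, for example "fixed[16]".
    identifier_fields is the list of key column names declared on the table.
    """
    if not identifier_fields:
        return ["table has no identifier (key) columns"]
    problems = []
    for name in identifier_fields:
        col_type = schema.get(name)
        if col_type is None:
            problems.append(f"identifier column {name!r} is not in the schema")
            continue
        # Strip a length suffix such as "[16]" so "fixed[16]" is matched as "fixed".
        base_type = col_type.split("[", 1)[0].strip().lower()
        if base_type in UNSUPPORTED_KEY_TYPES:
            problems.append(
                f"identifier column {name!r} has unsupported type {col_type!r}"
            )
    return problems
```

An empty result for every target table indicates that none of the limitations listed here apply to its key columns.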

9.2.24.5 Instantiating Oracle GoldenGate with an Initial Load

For more information about the standard steps for instantiation, see: https://docs.oracle.com/en/middleware/goldengate/core/21.3/admin/instantiating-oracle-goldengate-initial-load.html#GUID-7D3BD34D-490B-4E76-A48B-63572D93881A

9.2.24.5.1 Instantiation Steps Specific to Iceberg

  1. Start initial load groups for Extract and Replicat.
  2. Start change synchronization group for Extract and write operations to a trail file.

    Note:

    Do not start change synchronization group for Replicat yet.
  3. Wait until the initial load Replicat group has finished applying the initial load trail files.
  4. Stop the change synchronization group for Extract.
  5. Configure a change synchronization Replicat group.
  6. Add the parameter UPDATEINSERTS to the change synchronization Replicat group.
  7. Start the change synchronization Replicat group.
  8. Wait until the change synchronization Replicat group has processed all the trails generated by change synchronization Extract group.

    The last record’s end offset in the last trail file must match the targetCheckpoint value in the JSON checkpoint file of the change synchronization Replicat group.

    Example:
    • Run ls -l on the last trail file.
      -rw-r--r-- 1 username dba 5660 Feb 22 2024 /path/to/trail/tr000000003
    • Here the last record’s end offset is 5660, and the trail sequence is 3.
    • Open the JSON checkpoint file for the change synchronization Replicat group.
      It should contain the following attribute:
       "targetCheckpoint" : {
           "trailSequence" : 3,
           "trailOffset" : 5660
        }
      This targetCheckpoint must match the last record’s end offset.
  9. Shut down the change synchronization Replicat group and remove the parameter UPDATEINSERTS.
  10. The initial load is now complete. Start the change synchronization Extract and Replicat groups.
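The comparison in step 8 can be scripted. The following is a minimal sketch under stated assumptions: the trail sequence is taken from the trailing digits of the trail file name, the last record’s end offset is taken to be the file size (as in the ls -l example), and the targetCheckpoint attribute is located by searching the whole JSON document because its exact position in the checkpoint file is not described here. The function names are hypothetical.

```python
import json
import os
import re


def find_target_checkpoint(node):
    """Recursively search parsed JSON for a 'targetCheckpoint' object."""
    if isinstance(node, dict):
        if isinstance(node.get("targetCheckpoint"), dict):
            return node["targetCheckpoint"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_target_checkpoint(child)
        if found is not None:
            return found
    return None


def trail_fully_applied(trail_path, checkpoint_path):
    """Return True if the checkpoint matches the end of the last trail file."""
    # Assumption: names such as tr000000003 end with the trail sequence.
    match = re.search(r"(\d+)$", os.path.basename(trail_path))
    if match is None:
        raise ValueError(f"no sequence number in trail name: {trail_path}")
    sequence = int(match.group(1))
    # Assumption: the file size is the end offset of the last record.
    end_offset = os.path.getsize(trail_path)

    with open(checkpoint_path, encoding="utf-8") as handle:
        checkpoint = find_target_checkpoint(json.load(handle))
    if checkpoint is None:
        return False
    return (checkpoint.get("trailSequence") == sequence
            and checkpoint.get("trailOffset") == end_offset)
```

Call trail_fully_applied with the path of the last trail file and the path of the Replicat JSON checkpoint file. Proceed to step 9 only when it returns True.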

9.2.24.5.2 Iceberg Change Synchronization Replicat Behavior During Instantiation

  • Runs [DELETE+INSERT] for all the INSERT operations, irrespective of whether the base row exists on the target or not.
  • Runs [DELETE+INSERT] for all the UPDATE operations, irrespective of whether the base row exists on the target or not.
  • Runs DELETE for all the DELETE operations, irrespective of whether the base row exists on the target or not.

    Note:

    No collisions will be logged in the Iceberg Replicat report file.
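The effect of this behavior is that operations overlapping with the initial load can be applied again without conflicts. The following is a conceptual sketch only: it models the target table as a Python dictionary keyed by the identifier column and does not represent how Replicat writes Iceberg data files or delete files. The function name is hypothetical.

```python
def apply_during_instantiation(table, operation, key, row=None):
    """Apply one operation to a dict-based stand-in for the target table."""
    if operation in ("INSERT", "UPDATE"):
        # DELETE+INSERT: remove the base row if present, then write the new row.
        table.pop(key, None)
        table[key] = row
    elif operation == "DELETE":
        # DELETE succeeds whether or not the base row exists.
        table.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    # Rows already written by the initial load.
    target = {1: {"name": "a"}, 2: {"name": "b"}}
    operations = [
        ("INSERT", 1, {"name": "a"}),    # row already loaded
        ("UPDATE", 3, {"name": "c2"}),   # base row not loaded
        ("DELETE", 4, None),             # base row not loaded
    ]
    for operation, key, row in operations:
        apply_during_instantiation(target, operation, key, row)
    print(target)
```

Because every operation first removes any existing row with the same key, applying the same operations a second time leaves the table unchanged. This matches the absence of reported collisions noted above.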

9.2.24.6 Troubleshooting and Diagnostics

  • Oracle GoldenGate Replicat supports the Iceberg data types defined in version 2 of the Iceberg specification.
  • Iceberg identifier (key) fields cannot be null. Therefore, the Replicat process will ABEND if the key column value is null.
  • Schema changes to the table, such as ADD/ALTER/DROP of columns, are not supported while the Replicat process is running.

    To change the schema, quiesce the replication process, apply the schema changes, and then resume the replication process.

    Note:

    Contact Oracle Support for assistance with this process.

    The Replicat process will ABEND if there are unmapped columns in the target table.

  • Replicat ABEND with the following message:
    ICEBERGEH-00060 Operation record at position '00000000030000003318' for the table 'hadoop.oggdb1.types_tab' has missing column values in an UPDATE. Replicat will ABEND.
    To override this behavior set 'gg.eventhandler.iceberg.abendOnMissingColumns=false' and restart the Replicat process. Setting this property to false will instruct Replicat to lookup missing columns from the target table and therefore may impact performance.
    By default, the Iceberg Replicat process expects trail files in which the UPDATE operations have no missing column values. To process compressed trail files with missing column values in the UPDATE operations, set the property gg.eventhandler.iceberg.abendOnMissingColumns=false.
  • Replicat ABEND with the following message:
    ICEBERGEH-00057 Detected changes in the partition columns for the table 'hadoop.oggdb1.types_tab'. 
    Partition columns in the previous run: '<column list>', partition columns in this run: '<column list>'. 
    GoldenGate does not support changing partition columns. 
    Alter the table manually to match the partition columns in the previous run and restart the replicat process.
    The Iceberg Replicat process does not support changing partition columns.
  • Replicat ABEND with the following message:
    ICEBERGEH-00067 Invalid state. The column '<column_name>' in the target table '<table_name>' is not mapped. 
    The following are the mapped columns: '<column list>'. Iceberg Replicat requires all the columns in the target table to be mapped. 
    Please map the column '<unmapped column>' and restart the Replicat process.
    The Iceberg Replicat process requires all the columns in the target table to be mapped.
  • Replicat ABEND with the following message:
    ICEBERGEH-00068 Key column '<column name>' in the table '<table name>' is of type float or double. 
    Iceberg does not support float or double type as identifier (key) fields. Initiating Replicat process shutdown. 
    Please modify the table schema to exclude double/float types as key columns and restart the Replicat process.
    As per the current Iceberg specification (version 2), the column types double and float cannot be used as identifier (key) columns.
  • Replicat ABEND with the following message:
    ICEBERGEH-00070 Table '<table_name>' contains a key column '<column_name>' of '<binary/fixed/uuid>' type that is not supported by GoldenGate. 
    The following column types are not supported as key: 'binary, fixed, uuid'. To proceed, either use a supported Iceberg key column type by altering the 'KEYCOLS' clause in the Replicat 'MAP' statement as per the following example: 'MAP <sourceSchema>.<sourceTable>, TARGET <targetSchema>.<targetTable>, KEYCOLS("key1", "key2");' or alter the Iceberg target tables's identifier fields to exclude the key column types that are not supported by GoldenGate. 
    You can use the following Iceberg SQL statement to alter the table schema: 'ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS key1, key2'.
    The Iceberg types binary, fixed and uuid cannot be used as identifier (key) columns.
  • Replicat ABEND with the following message:
    ICEBERGEH-00071 Table '<table_name>' does not define an Iceberg identifier column.
    Identifier columns are used as key columns by GoldenGate. Initiating Replicat process shutdown.
    Please alter the Iceberg target tables's schema to add identifier columns.
    You can use the following Iceberg SQL statement to alter the table schema: 'ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS key1, key2'.
    The Iceberg target table must have identifier columns (key columns) in the schema.
  • Exceptions in the Replicat handler log file:
    • com.google.cloud.storage.StorageException: 401 Unauthorized
    • org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path
    • org.apache.iceberg.exceptions.RuntimeIOException: Failed to create file
    • org.apache.iceberg.exceptions.ForbiddenException: Forbidden

      These exceptions are commonly caused by incorrect configuration of the object storage authentication properties.

      Ensure that the following properties are set correctly:

      • gg.eventhandler.iceberg.fileSystemScheme, gg.eventhandler.iceberg.proxyServer, gg.eventhandler.iceberg.proxyPort
      • gg.eventhandler.iceberg.awsAccessKeyId, gg.eventhandler.iceberg.awsSecretKey, gg.eventhandler.iceberg.awsS3Region
      • gg.eventhandler.iceberg.azureAccountKey
      • gg.eventhandler.iceberg.gcpProjectId, gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile
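A quick way to rule out a missing authentication property is to scan the Replicat properties file before restarting the process. The following is a minimal sketch, not an Oracle utility. The property names are the ones listed above; the provider labels "aws", "azure", and "gcp", the grouping of properties by provider, and the treatment of the proxy properties as needed only behind a proxy are assumptions made for illustration. The parser handles only simple key=value lines.

```python
PREFIX = "gg.eventhandler.iceberg."

# Assumption: grouping by provider, inferred from the property names.
AUTH_PROPERTIES = {
    "aws": [PREFIX + "awsAccessKeyId", PREFIX + "awsSecretKey", PREFIX + "awsS3Region"],
    "azure": [PREFIX + "azureAccountKey"],
    "gcp": [PREFIX + "gcpProjectId", PREFIX + "gcpServiceAccountJsonKeyFile"],
}
PROXY_PROPERTIES = [PREFIX + "proxyServer", PREFIX + "proxyPort"]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            properties[key.strip()] = value.strip()
    return properties


def missing_auth_properties(text, provider, behind_proxy=False):
    """Return the expected property names that are absent or empty."""
    properties = parse_properties(text)
    expected = [PREFIX + "fileSystemScheme"] + AUTH_PROPERTIES[provider]
    if behind_proxy:
        # Assumption: proxy settings matter only when a proxy is in use.
        expected += PROXY_PROPERTIES
    return [name for name in expected if not properties.get(name)]


if __name__ == "__main__":
    sample = "\n".join([
        "# hypothetical Replicat properties",
        PREFIX + "awsAccessKeyId=<access key id>",
        PREFIX + "awsSecretKey=",
    ])
    for name in missing_auth_properties(sample, "aws"):
        print("missing or empty:", name)
```

A property that is present can still hold a wrong value, so an empty result narrows the search but does not confirm that authentication will succeed.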