9.2.24 Iceberg Event Handler
Iceberg is a high-performance table format for extremely large analytic tables. Iceberg brings the reliability and simplicity of SQL tables to GG for DAA, while making it possible for engines such as Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables at the same time.
- Detailed Functionality
- Configuration
- Configuration Templates
- Limitations
- Instantiating Oracle GoldenGate with an Initial Load
- Troubleshooting and Diagnostics
9.2.24.1 Detailed Functionality
The Oracle GoldenGate Iceberg Replicat can replicate GoldenGate trail records to Iceberg tables.
The Iceberg open-table-format files can be written to local files, AWS Simple Storage Service (S3), Google Cloud Storage (GCS), or Azure Data Lake Storage (ADLS).
- Replication without a SQL Engine
- Iceberg File Format
- Iceberg Catalog
- Iceberg Specification
- Delete Files and Merge-On-Read (MoR)
- Operation Support
- Compressed Update Handling
- INSERTALLRECORDS Support
- Operation Aggregation
- Automatic Table Creation
- Iceberg Metadata Provider
- Iceberg Identifier Fields
- Primary Key Updates and Truncates
9.2.24.1.1 Replication without a SQL Engine
The Oracle GoldenGate Iceberg Replicat process does not require a SQL engine to replicate data to Iceberg tables.
It uses the Iceberg Java SDK along with the Java SDK of the target object storage to write data to Iceberg tables.
9.2.24.1.2 Iceberg File Format
Iceberg data files and delete files can be written in the following file formats. The default is Parquet.
- Parquet (default)
- Avro
- ORC
9.2.24.1.3 Iceberg Catalog
Oracle GoldenGate supports the following Iceberg catalogs:
- Hadoop Catalog
- Nessie Catalog
- AWS Glue Catalog
- Polaris Catalog
- REST Catalog
- JDBC Catalog
9.2.24.1.4 Iceberg Specification
Oracle GoldenGate generates data files and delete files as per the Iceberg specification version 2.
See https://iceberg.apache.org/spec/#version-2-row-level-deletes
9.2.24.1.5 Delete Files and Merge-On-Read (MoR)
Oracle GoldenGate generates Iceberg delete files for the UPDATE and DELETE operations.
Therefore, the Iceberg table property write.update.mode is always set to merge-on-read.
SQL engines should support merge-on-read to query tables replicated by Oracle GoldenGate.
Iceberg supports two types of delete files:
- Equality Deletes: The deleted records are identified by the equality of the values in the columns specified in the delete file.
- Position Deletes: The deleted records are identified by the position of the records in the Iceberg data file.
In the current release, Oracle GoldenGate uses Iceberg Equality Deletes to delete records from the Iceberg table. This allows records to be deleted without looking up the position of the rows in the Iceberg data file.
Note:
Contact Oracle support for use cases that require Iceberg Position Deletes.
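The difference between the two delete styles can be illustrated with a toy in-memory model. The sketch below is only an illustration in Python, not the Iceberg API: the function names and the sample rows are hypothetical, and it ignores details such as sequence numbers and file layout.

```python
# Toy model of the two Iceberg delete-file styles; not the Iceberg API.
data_file = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]


def apply_equality_deletes(rows, delete_rows, key_columns):
    """Drop every row whose key columns match a row in the delete file."""
    deleted = {tuple(d[c] for c in key_columns) for d in delete_rows}
    return [r for r in rows if tuple(r[c] for c in key_columns) not in deleted]


def apply_position_deletes(rows, positions):
    """Drop rows by their ordinal position in the data file."""
    skip = set(positions)
    return [r for i, r in enumerate(rows) if i not in skip]


# Equality delete: only the key value of the deleted row is needed.
after_equality = apply_equality_deletes(data_file, [{"id": 2}], ["id"])
# Position delete: the writer must first know where the row is stored.
after_position = apply_position_deletes(data_file, [1])
```

Both calls remove the same row, but the equality delete needs only the key value, whereas the position delete has to know that the row sits at index 1 of the data file.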
9.2.24.1.6 Operation Support
The Iceberg event handler supports the following operations:
- INSERT: Generates Iceberg data files for the insert operations.
- UPDATE: Generates Iceberg data files and delete files for update operations.
- DELETE: Generates Iceberg delete files for delete operations.
- TRUNCATE: Generates an Iceberg delete file with a condition as always true to truncate the target table. This operation creates an empty Iceberg snapshot with no data files.
9.2.24.1.7 Compressed Update Handling
A compressed update record in the Oracle GoldenGate trail file contains values for the key columns and the modified columns.
An uncompressed update record contains values for all the columns.
Oracle GoldenGate trails may contain compressed or uncompressed update records. The default Extract configuration writes compressed updates to the trail files.
If there are missing column values in the update operations, then Replicat will ABEND.
This behavior can be overridden by setting the parameter gg.eventhandler.iceberg.abendOnMissingColumns=false in the Replicat properties file.
When the parameter is set to false, Replicat will handle compressed updates by querying the previous values of the missing columns from the Iceberg table.
9.2.24.1.7.1 Lookup Missing values in Sparse Updates
The lookup of the missing values is an expensive operation and may impact the performance of the Replicat process.
By default, Oracle GoldenGate writes records to Iceberg in micro-batches every ten minutes.
Each micro-batch for a table can potentially contain millions of rows.
Micro-batches are processed for every target table in concurrent threads.
Therefore, it is critical that sufficient JVM heap memory is allocated to the Replicat process.
The lookup is performed only for rows that contain at least one missing value in the update operation.
Oracle GoldenGate will automatically create target tables. During auto-creation of tables, Oracle GoldenGate Replicat will enable creation of Iceberg metrics (min/max values) for all the identifier (key) columns.
The metrics are stored in the Iceberg metadata files.
Iceberg metrics help speed up the lookup of the missing values in the UPDATE operations.
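The idea of completing a compressed update from the previous row image can be sketched as follows. This is a simplified Python illustration under stated assumptions: the target table is modeled as a dictionary keyed by the identifier columns, and complete_update is a hypothetical helper, not a GoldenGate or Iceberg function.

```python
def complete_update(update, table, key_columns):
    """Return a full row image for an update that may omit unchanged columns."""
    key = tuple(update[c] for c in key_columns)
    previous = table[key]        # stands in for the expensive lookup on the target
    full_row = dict(previous)    # start from the previous values
    full_row.update(update)      # changed columns win over looked-up values
    return full_row


# Hypothetical target table content, keyed by the identifier column "id".
target = {(7,): {"id": 7, "city": "Oslo", "balance": 10}}

# A compressed update carries the key column plus the modified column only.
compressed = {"id": 7, "balance": 25}
full = complete_update(compressed, target, ["id"])
```

In this model the lookup cost is a dictionary access. In a real table the previous row has to be located in the data files, which is why column metrics on the identifier columns matter.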
9.2.24.1.8 INSERTALLRECORDS Support
The Iceberg event handler supports the INSERTALLRECORDS parameter. See:
https://docs.oracle.com/en/middleware/goldengate/core/21.3/reference/insertallrecords.html#GUID-A1019C40-97BE-437B-9D80-7C99A9A6DB8E
Set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm).
Setting this parameter directs the Replicat process to generate Iceberg data files to append operation data into the Iceberg target table.
9.2.24.1.9 Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
Operation records are aggregated in-memory.
You can tune the apply interval using the gg.handler.iceberg.fileRollInterval property. The default value is 15m (fifteen minutes).
The Replicat process will generate Iceberg data files and delete files for the aggregated operations.
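A minimal sketch of what aggregating operations on the same row can look like is shown below. The merge rules are assumptions chosen for illustration (for example, an insert followed by a delete cancels out); the actual rules applied by Replicat are not described in this section, and the aggregate function is hypothetical.

```python
def aggregate(operations):
    """Collapse a batch of (op, key, row) tuples to one net operation per key."""
    net = {}
    for op, key, row in operations:
        prev = net.get(key)
        if op == "INSERT":
            # A delete followed by an insert nets out to an update of the row.
            net[key] = ("UPDATE", row) if prev and prev[0] == "DELETE" else ("INSERT", row)
        elif op == "UPDATE":
            merged = {**prev[1], **row} if prev and prev[1] else row
            # Updating a row inserted in the same batch is still an insert.
            net[key] = ("INSERT", merged) if prev and prev[0] == "INSERT" else ("UPDATE", merged)
        elif op == "DELETE":
            if prev and prev[0] == "INSERT":
                del net[key]  # inserted and deleted within the same batch
            else:
                net[key] = ("DELETE", None)
    return net


batch = [
    ("INSERT", 1, {"id": 1, "v": "a"}),
    ("UPDATE", 1, {"v": "b"}),
    ("INSERT", 2, {"id": 2, "v": "x"}),
    ("DELETE", 2, None),
    ("UPDATE", 3, {"id": 3, "v": "z"}),
]
net = aggregate(batch)
```

Five input operations collapse to two output operations here, which is the effect that reduces the number of data files and delete files written per interval.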
9.2.24.1.10 Automatic Table Creation
Oracle GoldenGate Replicat will automatically create target tables if the target table does not exist.
9.2.24.1.11 Iceberg Metadata Provider
A metadata provider for Iceberg retrieves the metadata of the Iceberg target tables.
The Iceberg metadata provider is automatically configured and enabled by the Replicat process.
9.2.24.1.12 Iceberg Identifier Fields
The identifier fields in the Iceberg table are used to uniquely identify the rows in the Iceberg table.
During the automatic table creation, Oracle GoldenGate maps the key columns to Iceberg identifier fields.
Note:
Iceberg tables without identifier fields are not supported in the current release.
9.2.24.1.13 Primary Key Updates and Truncates
- Primary key updates with missing column values will trigger files to be flushed to the Iceberg table before the flush interval. This can result in small data files and delete files for the primary key update operation. For workloads or tables with frequent primary key updates, Oracle recommends generating trail files with uncompressed update records. Oracle also recommends setting gg.validate.keyupdate=true for trails generated from an Oracle source. There is a known issue where Oracle Extract generates primary key update operations even though the key columns are not modified.
- A truncate operation will trigger files to be flushed to the Iceberg table before the flush interval.
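The effect of validating key updates can be pictured as a comparison of the key values before and after the change. The following Python sketch is an illustration only: the operation labels and the effective_operation helper are hypothetical and do not reflect GoldenGate internals.

```python
def effective_operation(before, after, key_columns):
    """Return 'PK_UPDATE' only when a key column value really changed."""
    key_changed = any(before[col] != after[col] for col in key_columns)
    return "PK_UPDATE" if key_changed else "UPDATE"


before = {"id": 42, "status": "NEW"}
same_key = {"id": 42, "status": "SHIPPED"}
new_key = {"id": 43, "status": "SHIPPED"}

# Reported as a key update, but the key is unchanged: treat as a normal update.
first = effective_operation(before, same_key, ["id"])
# The key value really changed: keep it as a primary key update.
second = effective_operation(before, new_key, ["id"])
```

Downgrading spurious key updates in this way avoids the early flush, and the resulting small files, that a genuine primary key update would trigger.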
9.2.24.2 Configuration
The configuration of the Iceberg replication properties is stored in the Replicat properties file.
- Automatic Configuration
- Configuration for Iceberg Nessie Catalog
- Configuration for Iceberg AWS Glue Catalog
- Configuration for Iceberg Polaris Catalog
- Configuration for Iceberg REST Catalog
- Configuration for Iceberg JDBC Catalog
- Configuration for Iceberg Hadoop Catalog
9.2.24.2.1 Automatic Configuration
Iceberg replication involves configuring multiple components, such as the File Writer Handler and the target Iceberg Event Handler.
The Automatic Configuration functionality autoconfigures these components so that the manual configuration is minimal.
The properties modified by autoconfiguration are also logged in the handler log file.
To enable autoconfiguration to replicate to the Iceberg target, set the parameter gg.target=iceberg.
9.2.24.2.1.1 File Writer Configuration
The File Writer Handler name is pre-set to the value iceberg and its properties are automatically set to the required values for Iceberg.
9.2.24.2.1.2 Iceberg Event Handler Configuration
The Iceberg Event Handler name is pre-set to the value iceberg.
This topic details the configuration properties available for the Iceberg Event Handler. The required properties must be changed to match your Iceberg configuration.
- Common Iceberg Properties
- Iceberg Common Dependencies
- AWS Java SDK dependencies for Writing to AWS S3 (s3:// Scheme)
- Hadoop AWS SDK Dependencies for Writing to AWS S3 (s3a:// Scheme)
- Hadoop Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)
- Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)
- Hadoop Azure SDK Dependencies for Writing to Azure Data Lake (ADLS)
9.2.24.2.1.2.1 Common Iceberg Properties
Iceberg can be configured to work with multiple catalogs and object stores.
The following are the common properties:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.warehouseLocation | Optional | String value. | None | Directory path to the Iceberg warehouse location, excluding the object storage scheme. Example: /path/to/warehouse. This property is required when using the hadoop catalog. For other Iceberg catalogs, the warehouse location has a catalog-specific requirement. |
gg.eventhandler.iceberg.fileRollInterval | Optional | The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. | 15m | Determines how often data is pushed into the Iceberg warehouse. Note: Use this parameter with caution. Increasing it above the default value (15m) increases the amount of data held in the memory of the Replicat process, which can cause out-of-memory errors and stop the Replicat. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | Warehouse scheme to indicate the Iceberg object storage location. Valid values are: file://, gs://, s3://, s3a://, abfss://. For more information, see File System Scheme. |
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Iceberg catalog type. Valid values are: hadoop, jdbc, nessie, rest, glue, polaris. |
gg.eventhandler.iceberg.fileFormat | Optional | parquet, orc, or avro. | parquet | Iceberg table file format to be used in target tables. Supported file formats: Parquet, Avro, and ORC. |
gg.eventhandler.iceberg.icebergTableProperties | Optional | String value. | None | Path to a table properties file that specifies additional Iceberg table properties to set on the target tables. |
gg.eventhandler.iceberg.abendOnMissingColumns | Optional | true or false | true | When set to true and the UPDATE operation contains a missing value, Replicat will ABEND. When set to false, Replicat will not ABEND if UPDATE operations have missing column values. The missing column values will be read by querying the target tables. This lookup may impact the performance of the Replicat process. |
gg.eventhandler.iceberg.abendOnSchemaChanges | Optional | true or false | true | When set to true and schema changes are detected, the Replicat process will ABEND. You can manually update the target schema and set the configuration to false to proceed. When set to false, a warning message is logged for schema changes. |
gg.validate.keyupdate | Optional | true or false | false | If set to true, Replicat will validate key update operations (optype 115) and correct them to normal updates if no key values have changed. |
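The legal values described for the roll interval can be interpreted as in the following Python sketch. It is an illustration of the documented value syntax only: roll_interval_ms is a hypothetical helper, and the way Replicat itself parses the property may differ in edge cases.

```python
import re

# Multipliers to milliseconds for the documented unit suffixes.
_UNITS_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def roll_interval_ms(value):
    """Convert a value such as '10s' or '1.5h' to milliseconds.

    Returns None when the value is 0 or less, meaning that file rolling
    on time is turned off.
    """
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(ms|s|m|h)?\s*", value)
    if match is None:
        raise ValueError(f"not a legal interval: {value!r}")
    number, unit = match.groups()
    millis = float(number) * _UNITS_MS[unit or "ms"]  # no suffix means ms
    return None if millis <= 0 else int(millis)


default_interval = roll_interval_ms("15m")
```

A bare number is treated as milliseconds, so 10000, 10000ms, and 10s all denote the same interval.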
9.2.24.2.1.2.1.1 File System Scheme
The gg.eventhandler.iceberg.fileSystemScheme property is used to specify the object storage scheme.
The following are the supported object storage schemes:
- file://: Local file system
- gs://: Google Cloud Storage
- s3://: AWS S3
- s3a://: AWS S3
- abfss://: Azure Data Lake Storage
9.2.24.2.1.2.2 Iceberg Common Dependencies
The following are the common Iceberg dependencies:
<dependencies>
  <!-- Common Iceberg dependencies START -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-arrow</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-core</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-data</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-parquet</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-gcp</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-aws</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-orc</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-nessie</artifactId>
    <version>1.6.1</version>
  </dependency>
  <!-- Common Iceberg dependencies END -->
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-common.xml
This script will download the dependencies and store them in the iceberg-common directory.
gg.classpath can be configured to include the dependencies from the iceberg-common directory as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.1.2.3 AWS Java SDK dependencies for Writing to AWS S3 (s3:// Scheme)
The following are the Iceberg dependencies to write to AWS S3 using the s3:// scheme:
<dependencies>
  <!-- s3:// scheme dependencies START -->
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.28.6</version>
  </dependency>
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sts</artifactId>
    <version>2.28.6</version>
  </dependency>
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>glue</artifactId>
    <version>2.28.6</version>
  </dependency>
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>url-connection-client</artifactId>
    <version>2.28.6</version>
  </dependency>
  <!-- s3:// scheme dependencies END -->
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-aws-java-sdk.xml
This script will download the dependencies and store them in the iceberg-aws-java-sdk directory.
gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.1.2.4 Hadoop AWS SDK Dependencies for Writing to AWS S3 (s3a:// Scheme)
The following are the Iceberg dependencies to write to AWS S3 using the s3a:// scheme:
<dependencies>
  <!-- s3a:// scheme dependencies START -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.4.0</version>
  </dependency>
  <!-- s3a:// scheme dependencies END -->
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-hadoop-aws.xml
This script will download the dependencies and store them in the iceberg-hadoop-aws directory.
gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-aws/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.1.2.5 Hadoop Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)
<dependencies>
  <!-- gs:// scheme dependencies START -->
  <dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <version>hadoop3-2.2.22</version>
  </dependency>
  <!-- gs:// scheme dependencies END -->
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-hadoop-gcs.xml
This script will download the dependencies and store them in the iceberg-hadoop-gcs directory.
gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.1.2.6 Google Cloud Storage SDK Dependencies for Writing to Google Cloud Storage (GCS)
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>2.37.0</version>
  </dependency>
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-gcs-java-sdk.xml
This script will download the dependencies and store them in the iceberg-gcs-java-sdk directory.
gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
9.2.24.2.1.2.7 Hadoop Azure SDK Dependencies for Writing to Azure Data Lake (ADLS)
<dependencies>
  <!-- abfss:// scheme dependencies START -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-azure</artifactId>
    <version>3.4.0</version>
  </dependency>
  <!-- abfss:// scheme dependencies END -->
</dependencies>
You can download the dependencies from Maven Central using the script download_dependencies.sh in the DependencyDownloader directory.
Follow these steps:
- Change directory to DependencyDownloader.
- Edit config_proxy.sh if proxy configuration is required.
- Run the script:
./download_dependencies.sh xmls/iceberg-hadoop-azure.xml
This script will download the dependencies and store them in the iceberg-hadoop-azure directory.
gg.classpath can be configured to include the dependencies as follows:
gg.classpath=/path/to/DependencyDownloader/dependencies/iceberg-hadoop-azure/*:/path/to/DependencyDownloader/dependencies/iceberg-common/*
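The classpath settings in the preceding sections follow one pattern: the scheme-specific dependency directory first, then iceberg-common. The Python sketch below assembles that line for a given scheme. The gg_classpath helper is hypothetical, the gs:// entry picks the Hadoop connector directory rather than the Google Cloud Storage SDK one, and the empty entry for file:// is an assumption, since no extra directory is listed for the local file system.

```python
# Scheme-specific dependency directories named in the sections above.
SCHEME_DIRS = {
    "file://": [],                      # assumption: common dependencies only
    "s3://": ["iceberg-aws-java-sdk"],
    "s3a://": ["iceberg-hadoop-aws"],
    "gs://": ["iceberg-hadoop-gcs"],
    "abfss://": ["iceberg-hadoop-azure"],
}


def gg_classpath(scheme, base="/path/to/DependencyDownloader/dependencies"):
    """Build the gg.classpath line for one file system scheme."""
    dirs = SCHEME_DIRS[scheme] + ["iceberg-common"]
    return "gg.classpath=" + ":".join(f"{base}/{name}/*" for name in dirs)


line = gg_classpath("abfss://")
```

Replace the base argument with the actual location of the DependencyDownloader output on the Replicat host.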
9.2.24.2.2 Configuration for Iceberg Nessie Catalog
- Configuration for Nessie Catalog and AWS S3 s3:// Scheme
- Configuration for Nessie Catalog and AWS S3 s3a:// Scheme
- Configuration for Nessie Catalog and GCS gs:// Scheme
- Configuration for Nessie Catalog and Azure Data Lake Storage abfss:// Scheme
9.2.24.2.2.1 Configuration for Nessie Catalog and AWS S3 s3:// Scheme
The following are the configuration properties for the Nessie catalog and AWS S3 object store using the s3:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie. |
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists. |
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3://. |
gg.eventhandler.iceberg.awsS3Region | Required | String value. | None | AWS S3 bucket region. Example: us-east-2. |
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.awsAccessKeyId | Optional | String value. | None | AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Optional | String value. | None | AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication. |
gg.eventhandler.iceberg.awsRoleArn | Optional | String value. | None | AWS role ARN for authentication. |
gg.eventhandler.iceberg.awsS3Endpoint | Optional | String value. | None | AWS S3 endpoint. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage. |
9.2.24.2.2.1.1 Classpath And Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- AWS SDK dependencies for writing to AWS S3 (s3:// scheme)
9.2.24.2.2.1.2 Sample Configuration for Nessie Catalog and AWS S3 s3:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
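Before starting Replicat, it can be useful to check that a properties file contains the properties marked Required for this catalog and scheme combination. The Python sketch below is a hypothetical pre-flight check, not part of GoldenGate; the host name and bucket name in the sample text are placeholders.

```python
# Properties marked Required for the Nessie catalog with the s3:// scheme.
REQUIRED_NESSIE_S3 = (
    "gg.eventhandler.iceberg.nessieUri",
    "gg.eventhandler.iceberg.awsS3Region",
    "gg.eventhandler.iceberg.awsS3Bucket",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_required(props, required=REQUIRED_NESSIE_S3):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


sample = """
gg.target=iceberg
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieUri=http://nessie.example.com:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Bucket=my-bucket
"""
missing = missing_required(parse_properties(sample))
```

In this sample the region property is left out on purpose, so it is the only key reported as missing.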
9.2.24.2.2.2 Configuration for Nessie Catalog and AWS S3 s3a:// Scheme
The following are the configuration properties for the Nessie catalog and AWS S3 object store using the s3a:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie. |
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists. |
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3a://. |
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.awsAccessKeyId | Required | String value. | None | AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Required | String value. | None | AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage. |
9.2.24.2.2.2.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
9.2.24.2.2.2.2 Sample Configuration for Nessie Catalog and AWS S3 s3a:// scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.3 Configuration for Nessie Catalog and GCS gs:// Scheme
The following are the configuration properties for the Nessie catalog and GCS object store using the gs:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie. |
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists. |
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate GCS object storage location: gs://. |
gg.eventhandler.iceberg.gcpStorageBucket | Required | String value. | None | Google Cloud Storage bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.gcpProjectId | Required | String value. | None | Sets the project-id of the Google Cloud project that houses the GCS bucket. |
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile | Required | String value. | None | Sets the path to the Google Service account key file. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the GCS object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the GCS object storage. |
9.2.24.2.2.3.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.2.3.2 Sample Configuration for Nessie Catalog and GCS gs:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.4 Configuration for Nessie Catalog and Azure Data Lake Storage abfss:// Scheme
The following are the configuration properties for the Nessie catalog and Azure Data Lake Storage using the abfss:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | Set to nessie. |
gg.eventhandler.iceberg.nessieBranch | Optional | String value. | main | Nessie Catalog branch name where the Iceberg table metadata exists. |
gg.eventhandler.iceberg.nessieUri | Required | String value. | None | Nessie Catalog endpoint URI. Example: http://<nessie-server>.com:10001/api/v2. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate Azure Data Lake Storage location: abfss://. |
gg.eventhandler.iceberg.azureAccountName | Required | String value. | None | Azure storage account name that contains the container for the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureContainer | Required | String value. | None | Azure storage account container name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureAccountKey | Required | String value. | None | Azure storage account key. |
gg.eventhandler.iceberg.azureBlobEndpoint | Optional | String value. | <azureContainer>@<azureAccountName>.dfs.core.windows.net | Azure Storage service endpoint. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the Azure object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the Azure object storage. |
- Classpath and Dependencies
- Sample Configuration for Nessie Catalog and ADLS abfss:// Scheme
- Nessie Namespace
9.2.24.2.2.4.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
9.2.24.2.2.4.2 Sample Configuration for Nessie Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=nessie
gg.eventhandler.iceberg.nessieBranch=main
gg.eventhandler.iceberg.nessieUri=http://<nessie-server>:10001/api/v2
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.2.4.3 Nessie Namespace
A Nessie namespace is the top-level container for all the tables in the Nessie catalog.
The namespaces must exist before the Replicat process starts, because tables can be created or written only in an existing namespace.
A Nessie namespace can be created using the Nessie command-line program (nessie-cli-<version>.jar) as follows:
create namespace QASOURCE;
The Nessie namespace is mapped to the GoldenGate schema in the MAP statement.
For example: MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMER;
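Because every target schema named in a MAP statement has to exist as a namespace beforehand, it can help to list those schemas before starting Replicat. The following Python sketch extracts them from parameter-file text. The `target_namespaces` helper and its regular expression are illustrative assumptions, not part of GoldenGate, and they cover only simple two-part `MAP ..., TARGET ...;` statements rather than the full Replicat parameter syntax.

```python
import re

# Hypothetical helper: matches "MAP <source>, TARGET <schema>.<table>;"
# where the target parts may optionally be wrapped in double quotes.
MAP_RE = re.compile(
    r'MAP\s+\S+\s*,\s*TARGET\s+"?([^".\s]+)"?\."?([^".;\s]+)"?\s*;',
    re.IGNORECASE,
)


def target_namespaces(prm_text):
    """Return the sorted, de-duplicated target schemas found in MAP statements."""
    return sorted({match.group(1) for match in MAP_RE.finditer(prm_text)})


prm = """
MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMER;
MAP QASOURCE.TCUSTORD, TARGET QASOURCE.TCUSTORD;
"""
print(target_namespaces(prm))  # each name listed here needs an existing namespace
```

Each name the helper returns would then be created with a `create namespace` command like the one shown earlier.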
9.2.24.2.3 Configuration for Iceberg AWS Glue Catalog
- Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// OR s3a:// Scheme
- Classpath and Dependencies
- Sample Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// or s3a:// Scheme
- Table Names and Case Sensitivity
9.2.24.2.3.1 Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// OR s3a:// Scheme
The following are the configuration properties for the AWS Glue catalog and AWS S3 object store using s3:// or s3a:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | glue. |
gg.eventhandler.iceberg.awsGlueId | Required | String value. | None | The Glue catalog ID is your numeric AWS account ID. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3:// or s3a://. |
gg.eventhandler.iceberg.awsS3Region | Required | String value. | None | AWS S3 bucket region. Example: us-east-2. |
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.awsAccessKeyId | Optional | String value. | None | AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Optional | String value. | None | AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication. |
gg.eventhandler.iceberg.awsRoleArn | Optional | String value. | None | AWS role ARN for authentication. |
gg.eventhandler.iceberg.awsS3Endpoint | Optional | String value. | None | AWS S3 endpoint. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage. |
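A properties file for the Glue catalog can be checked against the table above before Replicat is started. The Python sketch below is a hypothetical pre-flight check, not a GoldenGate utility: `parse_properties` and `glue_config_problems` are names made up for illustration, and the parser handles only plain `key=value` lines and `#` comments.

```python
PREFIX = "gg.eventhandler.iceberg."
# Properties marked Required in the Glue catalog table.
REQUIRED_GLUE = ("awsGlueId", "awsS3Region", "awsS3Bucket")
ALLOWED_SCHEMES = ("s3://", "s3a://")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def glue_config_problems(props):
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing {name}" for name in REQUIRED_GLUE if not props.get(PREFIX + name)]
    if props.get(PREFIX + "catalogType") != "glue":
        problems.append("catalogType is not glue")
    if props.get(PREFIX + "fileSystemScheme") not in ALLOWED_SCHEMES:
        problems.append("fileSystemScheme is not s3:// or s3a://")
    return problems


sample = """
gg.eventhandler.iceberg.catalogType=glue
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=my-bucket
"""
print(glue_config_problems(parse_properties(sample)))
```

The sample deliberately omits `awsGlueId`, so the check reports it as missing.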
9.2.24.2.3.2 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- AWS SDK dependencies for writing to AWS S3 (s3://)
9.2.24.2.3.3 Sample Configuration for Iceberg AWS Glue Catalog and AWS S3 s3:// or s3a:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=glue
gg.eventhandler.iceberg.awsGlueId=<aws-account-id>
gg.eventhandler.iceberg.fileSystemScheme=s3://
#gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.3.4 Table Names and Case Sensitivity
AWS Glue catalog supports only lowercase names.
AWS Glue catalog supports only two-part table names.
The target table in the GGDAA Replicat MAP statement should be mapped to the Glue database and table names.
Example: MAP QASOURCE.TCUSTMER, TARGET "glue_database"."tcustmer";
In this example, glue_database is the Glue database name and tcustmer is the Glue table name.
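The two naming rules can be applied mechanically when MAP statements are generated. The Python sketch below builds a MAP statement whose target is a quoted, lowercase, two-part Glue name. `glue_map_statement` is a hypothetical helper written for illustration and is not part of GoldenGate.

```python
def glue_map_statement(source, glue_database, glue_table):
    """Build a MAP statement with a quoted, lowercase, two-part Glue target."""
    if source.count(".") != 1:
        raise ValueError("source must be a two-part name such as SCHEMA.TABLE")
    for part in (glue_database, glue_table):
        if not part or "." in part:
            raise ValueError("Glue database and table names must be single-part")
    # Glue accepts only lowercase names, so both target parts are lowercased.
    return f'MAP {source}, TARGET "{glue_database.lower()}"."{glue_table.lower()}";'


print(glue_map_statement("QASOURCE.TCUSTMER", "glue_database", "TCUSTMER"))
```

With these inputs the helper produces the same statement as the example above.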
9.2.24.2.4 Configuration for Iceberg Polaris Catalog
Apache Polaris is an open-source, fully-featured catalog for Apache Iceberg.
There are a few options to set up Polaris:
- Snowflake hosted Polaris (https://other-docs.snowflake.com/en/opencatalog/overview).
- Polaris on your own infrastructure (https://polaris.apache.org/in-dev/unreleased/quickstart/).
Polaris catalog setup includes configuration and authentication to the object stores (S3/GCS/ADLS).
The Iceberg warehouse location and authentication to object stores are not set up by GoldenGate when using Polaris.
This topic contains the following:
- Polaris Common Configuration
- Polaris Catalog with Google Cloud Storage (GCS)
- Polaris Catalog with AWS S3 Storage
- Polaris Catalog with Azure Data Lake Storage (ADLS)
- Polaris Catalog and GCS Storage Classpath And Dependencies
- Polaris Catalog and AWS S3 storage Classpath and Dependencies
- Polaris Catalog and ADLS storage Classpath And Dependencies
- Sample Configuration for Polaris Catalog
- Polaris Namespace
9.2.24.2.4.1 Polaris Common Configuration
The following are the configuration properties for the Polaris catalog:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Required | String value. | hadoop | polaris. |
gg.eventhandler.iceberg.polarisCatalogUri | Required | String value. | None | Polaris Catalog endpoint URI. Example: https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog. |
gg.eventhandler.iceberg.polarisCatalogName | Required | String value. | None | Polaris Catalog name. Catalog name is the entry point to the Polaris catalog namespace and tables. |
gg.eventhandler.iceberg.polarisClientId | Required | String value. | None | Polaris principal’s client ID used for authentication and authorization to the respective Polaris catalog. |
gg.eventhandler.iceberg.polarisClientSecret | Required | String value. | None | Polaris principal’s client secret used for authentication and authorization to the respective Polaris catalog. |
gg.eventhandler.iceberg.polarisPrincipalRole | Optional | String value. | ALL | The role to be assumed by the Polaris principal. |
9.2.24.2.4.2 Polaris Catalog with Google Cloud Storage (GCS)
GOOGLE_APPLICATION_CREDENTIALS must be set to the path to the Google Service account key file. Add the following to the Replicat parameter file (.prm):
SETENV (GOOGLE_APPLICATION_CREDENTIALS = "/path/to/the/gcp-service-account-json-key.json")
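A wrong path in GOOGLE_APPLICATION_CREDENTIALS tends to surface only once Replicat tries to write, so a quick check of the key file beforehand may save a restart. The Python sketch below is a hypothetical helper, not part of GoldenGate. It assumes the key is the standard JSON key file downloaded for a Google service account, which carries a `"type": "service_account"` entry.

```python
import json
import os


def check_service_account_key(path):
    """Return a list of problems with the key file at `path` (empty if none found)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a file"]
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return [f"{path} is not readable JSON: {exc}"]
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return ['key file does not declare "type": "service_account"']
    return []


# Example: check whatever the current environment points at, if anything.
configured = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if configured:
    print(check_service_account_key(configured))
```

The check inspects only the file's presence and shape. It does not confirm that the service account has access to the bucket.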
9.2.24.2.4.3 Polaris Catalog with AWS S3 Storage
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.awsS3Region | Required | String value. | None | Required only if the Polaris catalog points to AWS S3 Storage. AWS S3 bucket region. Example: us-east-2. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | Required only if the Polaris catalog points to AWS S3 Storage. File system scheme to indicate AWS S3 object storage location: s3://. |
gg.eventhandler.iceberg.awsAccessKeyId | Optional | String value. | None | Required only if the Polaris catalog points to AWS S3 Storage. AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Optional | String value. | None | Required only if the Polaris catalog points to AWS S3 Storage. AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | Required only if the Polaris catalog points to AWS S3 Storage. AWS session token for authentication. |
gg.eventhandler.iceberg.awsS3Endpoint | Optional | String value. | None | Required only if the Polaris catalog points to AWS S3 Storage. AWS S3 endpoint. |
9.2.24.2.4.4 Polaris Catalog with Azure Data Lake Storage (ADLS)
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | Required only if the Polaris catalog points to Azure Data Lake Storage. Warehouse scheme to indicate Azure Data Lake Storage location: abfss://. |
gg.eventhandler.iceberg.azureAccountName | Required | String value. | None | Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account name that contains the container for the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureContainer | Required | String value. | None | Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account container name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureAccountKey | Required | String value. | None | Required only if the Polaris catalog points to Azure Data Lake Storage. Azure storage account key. |
gg.eventhandler.iceberg.azureBlobEndpoint | Optional | String value. | <azureContainer>@<azureAccountName>.dfs.core.windows.net | Required only if the Polaris catalog points to Azure Data Lake Storage. Azure Storage service endpoint. |
9.2.24.2.4.5 Polaris Catalog and GCS Storage Classpath And Dependencies
If the Polaris catalog is set up to write to GCS, then the Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.4.6 Polaris Catalog and AWS S3 storage Classpath and Dependencies
If the Polaris catalog is set up to write to AWS S3, then the Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- AWS SDK dependencies for writing to AWS S3 (s3://)
9.2.24.2.4.7 Polaris Catalog and ADLS storage Classpath And Dependencies
If the Polaris catalog is set up to write to ADLS, then the Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Azure SDK dependencies for writing to Azure Data Lake Storage (abfss://)
9.2.24.2.4.8 Sample Configuration for Polaris Catalog
gg.target=iceberg
#For catalog using GCS
gg.classpath=DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using S3
#gg.classpath=DependencyDownloader/dependencies/iceberg-aws-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using ADLS
#gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=polaris
gg.eventhandler.iceberg.polarisCatalogUri=https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog
gg.eventhandler.iceberg.polarisCatalogName=<polaris_gcs_catalog>
gg.eventhandler.iceberg.polarisClientId=<clientId>
gg.eventhandler.iceberg.polarisClientSecret=<clientSecret>
gg.eventhandler.iceberg.polarisPrincipalRole=ALL
9.2.24.2.4.9 Polaris Namespace
Polaris namespace is the top-level container for all the tables in the Polaris catalog.
Before starting the Replicat process, the Polaris namespace should be created in the respective Polaris catalog.
The Polaris namespace is mapped to the GoldenGate schema in the MAP statement.
Example: MAP QASOURCE.TCUSTMER, TARGET "polaris_namespace"."tcustmer";
9.2.24.2.5 Configuration for Iceberg REST Catalog
Iceberg defines a REST specification (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) for catalog implementations.
Any REST server that implements the Iceberg REST API can be used as the Iceberg catalog.
For example, Polaris is an implementation of the Iceberg REST API.
- Configuration for Iceberg REST Catalog
- Sample Configuration for REST Catalog based on Polaris
- Sample Rest Catalog Properties file (For Polaris)
9.2.24.2.5.1 Configuration for Iceberg REST Catalog
The following are the configuration properties for the REST catalog:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Required | String value. | hadoop | rest. |
gg.eventhandler.iceberg.restCatalogUri | Required | String value. | None | REST Catalog endpoint URI. Example: https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog. |
gg.eventhandler.iceberg.restCatalogProperties | Optional | String value. | None | Properties file with additional configuration for the REST catalog. |
9.2.24.2.5.2 Sample Configuration for REST Catalog based on Polaris
gg.target=iceberg
#For catalog using GCS
gg.classpath=DependencyDownloader/dependencies/iceberg-gcs-java-sdk/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using S3
#gg.classpath=DependencyDownloader/dependencies/iceberg-s3/*:DependencyDownloader/dependencies/iceberg-common/*
#For catalog using ADLS
#gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=rest
gg.eventhandler.iceberg.restCatalogUri=https://<polaris-account>.snowflakecomputing.com/polaris/api/catalog
gg.eventhandler.iceberg.restCatalogProperties=/path/to/rest/catalog.properties
# Optional configuration for authentication to the object storage.
# Some REST implementations do not require a separate authentication to the storage layer.
#gg.eventhandler.iceberg.fileSystemScheme=s3://
#gg.eventhandler.iceberg.awsS3Region=<s3-region>
#gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
#gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
#gg.eventhandler.iceberg.awsSecretKey=<secret-key>
#gg.eventhandler.iceberg.fileSystemScheme=abfss://
#gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
#gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
#gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
#gg.eventhandler.iceberg.fileSystemScheme=gs://
#gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
#gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
#gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
9.2.24.2.5.3 Sample Rest Catalog Properties file (For Polaris)
warehouse=polaris_s3_catalog
credential=<ClientId>:<ClientSecret>
scope=PRINCIPAL_ROLE:ALL
token-refresh-enabled=true
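The `credential` entry packs the client ID and client secret into one colon-separated value, which is easy to get wrong by hand. The Python sketch below reads such a file and splits the credential. `parse_properties` and `split_credential` are hypothetical helpers written for illustration; the property names are the ones in the sample above, and the parser handles only plain `key=value` lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def split_credential(props):
    """Split credential=<ClientId>:<ClientSecret> at the first colon."""
    client_id, sep, client_secret = props.get("credential", "").partition(":")
    if not sep or not client_id or not client_secret:
        raise ValueError("credential must have the form <ClientId>:<ClientSecret>")
    return client_id, client_secret


catalog_properties = """
warehouse=polaris_s3_catalog
credential=my-client-id:my-client-secret
scope=PRINCIPAL_ROLE:ALL
token-refresh-enabled=true
"""
props = parse_properties(catalog_properties)
client_id, client_secret = split_credential(props)
print(props["warehouse"], client_id)
```

Splitting at the first colon keeps any further colons inside the secret.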
9.2.24.2.6 Configuration for Iceberg JDBC Catalog
Some JDBC compatible databases can be used to store the Iceberg catalog information.
Not all JDBC compatible databases are supported with the Iceberg JDBC Catalog API.
Note:
The Databricks target using the Databricks JDBC driver has been tested internally.
- Configuration for Iceberg JDBC Catalog and file:// Scheme
- Configuration for Iceberg JDBC Catalog and s3a:// Scheme
- Configuration for Iceberg JDBC Catalog and gs:// Scheme
- Configuration for Iceberg JDBC Catalog and abfss:// Scheme
9.2.24.2.6.1 Configuration for Iceberg JDBC Catalog and file:// Scheme
The following are the configuration properties for the JDBC catalog and the local file system as the Iceberg storage using file:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | jdbc. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate local file system as the storage: file://. |
gg.eventhandler.iceberg.warehouseLocation | Required | String value. | None | Local directory path to the Iceberg warehouse. |
gg.eventhandler.iceberg.jdbcUrl | Required | String value. | None | JDBC URL to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcUser | Optional | String value. | None | JDBC user to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcPassword | Optional | String value. | None | JDBC password to connect to the database used as Iceberg catalog. |
- Classpath and Dependencies
- Sample Configuration for Iceberg JDBC Catalog and Local File Storage file:// Scheme
9.2.24.2.6.1.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.1.2 Sample Configuration for Iceberg JDBC Catalog and Local File Storage file:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=/path/to/the/jdbc/driver/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
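A gg.classpath entry that points at an empty or mistyped directory, such as a wrong JDBC driver path, is a common cause of class-loading failures. The Python sketch below lists classpath entries that match no files. `empty_classpath_entries` is a hypothetical helper, not part of GoldenGate. It assumes the colon separator used in the samples here and treats a trailing `/*` as "all JAR files in this directory".

```python
import glob
import os


def empty_classpath_entries(classpath, base_dir="."):
    """Return the classpath entries that resolve to no files under base_dir."""
    empty = []
    for entry in classpath.split(":"):
        # os.path.join leaves absolute entries unchanged.
        pattern = os.path.join(base_dir, entry)
        if entry.endswith("/*"):
            # Replace the trailing * with *.jar to look for JAR files only.
            pattern = pattern[:-1] + "*.jar"
        if not glob.glob(pattern):
            empty.append(entry)
    return empty


classpath = "/path/to/the/jdbc/driver/*:DependencyDownloader/dependencies/iceberg-common/*"
print(empty_classpath_entries(classpath))
```

Run from the GoldenGate installation directory, an empty list suggests that every entry contains at least one JAR. It does not verify that the right versions are present.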
9.2.24.2.6.2 Configuration for Iceberg JDBC Catalog and s3a:// Scheme
The following are the configuration properties for the JDBC catalog and AWS S3 object store using s3a:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | jdbc. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3a://. |
gg.eventhandler.iceberg.warehouseLocation | Required | String value. | None | Local directory path to the Iceberg warehouse. |
gg.eventhandler.iceberg.jdbcUrl | Required | String value. | None | JDBC URL to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcUser | Optional | String value. | None | JDBC user to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcPassword | Optional | String value. | None | JDBC password to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.awsAccessKeyId | Required | String value. | None | AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Required | String value. | None | AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage. |
9.2.24.2.6.2.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
- Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.2.2 Sample Configuration for JDBC Catalog and AWS S3 s3a:// scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.6.3 Configuration for Iceberg JDBC Catalog and gs:// Scheme
The following are the configuration properties for the JDBC catalog and GCS object store using gs:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | jdbc. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate GCS object storage location: gs://. |
gg.eventhandler.iceberg.warehouseLocation | Required | String value. | None | Local directory path to the Iceberg warehouse. |
gg.eventhandler.iceberg.jdbcUrl | Required | String value. | None | JDBC URL to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcUser | Optional | String value. | None | JDBC user to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcPassword | Optional | String value. | None | JDBC password to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.gcpStorageBucket | Required | String value. | None | Google Cloud Storage bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.gcpProjectId | Required | String value. | None | Sets the project-id of the Google Cloud project that houses the GCS bucket. |
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile | Required | String value. | None | Sets the path to the Google Service account key file. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the GCS object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the GCS object storage. |
9.2.24.2.6.3.1 Classpath And Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
- Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.3.2 Sample Configuration for JDBC Catalog and GCS gs:// scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.6.4 Configuration for Iceberg JDBC Catalog and abfss:// Scheme
The following are the configuration properties for the JDBC catalog and Azure Data Lake Storage using the abfss:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | jdbc. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate Azure Data Lake Storage location: abfss://. |
gg.eventhandler.iceberg.warehouseLocation | Required | String value. | None | Local directory path to the Iceberg warehouse. |
gg.eventhandler.iceberg.jdbcUrl | Required | String value. | None | JDBC URL to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcUser | Optional | String value. | None | JDBC user to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.jdbcPassword | Optional | String value. | None | JDBC password to connect to the database used as Iceberg catalog. |
gg.eventhandler.iceberg.azureAccountName | Required | String value. | None | Azure storage account name that contains the container for the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureContainer | Required | String value. | None | Azure storage account container name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureAccountKey | Required | String value. | None | Azure storage account key. |
gg.eventhandler.iceberg.azureBlobEndpoint | Optional | String value. | <azureContainer>@<azureAccountName>.dfs.core.windows.net | Azure Storage service endpoint. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the Azure object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the Azure object storage. |
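The default for azureBlobEndpoint is derived from the container and account names. The Python sketch below shows that composition and how a full `abfss://` location would look under the standard ABFS URI layout. Both functions are hypothetical helpers. How GoldenGate itself combines warehouseLocation with the endpoint is not described here, so `abfss_location` is an assumption for illustration only.

```python
def default_azure_endpoint(container, account):
    """Compose the documented default: <azureContainer>@<azureAccountName>.dfs.core.windows.net."""
    return f"{container}@{account}.dfs.core.windows.net"


def abfss_location(container, account, path):
    """Assumed layout: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{default_azure_endpoint(container, account)}/{path.lstrip('/')}"


print(default_azure_endpoint("lake", "acct"))
print(abfss_location("lake", "acct", "/path/to/iceberg/tables"))
```

Setting azureBlobEndpoint explicitly would only be needed when the storage endpoint differs from this default.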
9.2.24.2.6.4.1 Classpath And Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
- Path to the JDBC driver to access the database used to store the Iceberg catalog.
9.2.24.2.6.4.2 Sample Configuration for JDBC Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*:/path/to/the/jdbc/driver/*
gg.eventhandler.iceberg.catalogType=jdbc
gg.eventhandler.iceberg.jdbcUrl=<jdbc-url>
gg.eventhandler.iceberg.jdbcUser=<jdbc-user>
gg.eventhandler.iceberg.jdbcPassword=<jdbc-password>
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.7 Configuration for Iceberg Hadoop Catalog
The Hadoop catalog is not recommended for production use because it has no reliable locking mechanism, which can affect concurrent reads and writes.
Use the Hadoop catalog for testing purposes only.
- Configuration for Iceberg Hadoop Catalog and file:// Scheme
- Configuration for Iceberg Hadoop Catalog and s3a:// Scheme
- Configuration for Iceberg Hadoop Catalog and gs:// Scheme
- Configuration for Iceberg Hadoop Catalog and abfss:// Scheme
9.2.24.2.7.1 Configuration for Iceberg Hadoop Catalog and file:// Scheme
The following are the configuration properties for the Hadoop catalog and the local file system as the Iceberg storage using file:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | hadoop. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate local file system as the storage: file://. |
gg.eventhandler.iceberg.warehouseLocation | Required | String value. | None | Local directory path to the Iceberg warehouse. |
Note:
This configuration is typically used for testing purposes for storing the Iceberg tables on the local file system.
- Classpath and Dependencies
- Sample Configuration for Iceberg Hadoop Catalog and Local File Storage file:// Scheme
9.2.24.2.7.1.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
9.2.24.2.7.2 Configuration for Iceberg Hadoop Catalog and s3a:// Scheme
The following are the configuration properties for the Hadoop catalog and AWS S3 object store using s3a:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | hadoop. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate AWS S3 object storage location: s3a://. |
gg.eventhandler.iceberg.awsS3Bucket | Required | String value. | None | AWS S3 bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.awsAccessKeyId | Required | String value. | None | AWS access key id for authentication. |
gg.eventhandler.iceberg.awsSecretKey | Required | String value. | None | AWS secret access key for authentication. |
gg.eventhandler.iceberg.awsSessionToken | Optional | String value. | None | AWS session token for authentication. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the AWS S3 object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the AWS S3 object storage. |
9.2.24.2.7.2.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop AWS SDK dependencies for writing to AWS S3 (s3a:// scheme)
9.2.24.2.7.2.2 Sample Configuration for Hadoop Catalog and AWS S3 s3a:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-aws/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=s3a://
gg.eventhandler.iceberg.awsS3Region=us-east-2
gg.eventhandler.iceberg.awsS3Bucket=<s3-bucket>
gg.eventhandler.iceberg.awsAccessKeyId=<access-key-id>
gg.eventhandler.iceberg.awsSecretKey=<secret-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.7.3 Configuration for Iceberg Hadoop Catalog and gs:// Scheme
The following are the configuration properties for the Hadoop catalog and GCS object store using gs:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | hadoop. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate GCS object storage location: gs://. |
gg.eventhandler.iceberg.gcpStorageBucket | Required | String value. | None | Google Cloud Storage bucket name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.gcpProjectId | Required | String value. | None | Sets the project-id of the Google Cloud project that houses the GCS bucket. |
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile | Required | String value. | None | Sets the path to the Google Service account key file. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the GCS object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the GCS object storage. |
9.2.24.2.7.3.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Google Cloud Storage SDK dependencies for writing to Google Cloud Storage (GCS)
9.2.24.2.7.3.2 Sample Configuration for Hadoop Catalog and GCS gs:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-gcs/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=gs://
gg.eventhandler.iceberg.gcpStorageBucket=<gcs-bucket>
gg.eventhandler.iceberg.gcpProjectId=<gcp-project-id>
gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile=<gcp-service-account-key-file>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
9.2.24.2.7.4 Configuration for Iceberg Hadoop Catalog and abfss:// Scheme
The following are the configuration properties for the Hadoop catalog and Azure Data Lake Storage using the abfss:// scheme:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.iceberg.catalogType | Optional | String value. | hadoop | hadoop. |
gg.eventhandler.iceberg.fileSystemScheme | Optional | String value. | file:// | File system scheme to indicate Azure Data Lake Storage location: abfss://. |
gg.eventhandler.iceberg.azureAccountName | Required | String value. | None | Azure storage account name that contains the container for the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureContainer | Required | String value. | None | Azure storage account container name that houses the Iceberg Warehouse. |
gg.eventhandler.iceberg.azureAccountKey | Required | String value. | None | Azure storage account key. |
gg.eventhandler.iceberg.azureBlobEndpoint | Optional | String value. | \ | Azure Storage service endpoint. |
gg.eventhandler.iceberg.proxyServer | Optional | String value. | None | Proxy server to connect to the Azure object storage. |
gg.eventhandler.iceberg.proxyPort | Optional | String value. | 80 | Proxy server port to connect to the Azure object storage. |
Parent topic: Configuration for Iceberg Hadoop Catalog
9.2.24.2.7.4.1 Classpath and Dependencies
The Java classpath (gg.classpath) should include the following dependencies:
- Iceberg common dependencies
- Hadoop Azure SDK dependencies for writing to Azure Data Lake (ADLS)
9.2.24.2.7.4.2 Sample Configuration for Hadoop Catalog and ADLS abfss:// Scheme
gg.target=iceberg
gg.eventhandler.iceberg.warehouseLocation=/path/to/iceberg/tables
gg.classpath=DependencyDownloader/dependencies/iceberg-hadoop-azure/*:DependencyDownloader/dependencies/iceberg-common/*
gg.eventhandler.iceberg.catalogType=hadoop
gg.eventhandler.iceberg.fileSystemScheme=abfss://
gg.eventhandler.iceberg.azureAccountName=<azure-storage-account-name>
gg.eventhandler.iceberg.azureContainer=<azure-storage-container>
gg.eventhandler.iceberg.azureAccountKey=<azure-storage-account-key>
gg.eventhandler.iceberg.proxyServer=<proxy-server>
gg.eventhandler.iceberg.proxyPort=<proxy-port>
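As a rough illustration of the Required/Optional columns in the gs:// and abfss:// tables above, the following Python sketch reports which required properties are missing from a properties file. It is not part of Oracle GoldenGate: the helper names `parse_properties` and `missing_required` are hypothetical, and the parser assumes plain `key=value` lines with `#` comments.

```python
# Required properties per file system scheme, copied from the gs:// and
# abfss:// tables in this section. Other schemes are not modelled here.
REQUIRED_BY_SCHEME = {
    "gs://": [
        "gg.eventhandler.iceberg.gcpStorageBucket",
        "gg.eventhandler.iceberg.gcpProjectId",
        "gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile",
    ],
    "abfss://": [
        "gg.eventhandler.iceberg.azureAccountName",
        "gg.eventhandler.iceberg.azureContainer",
        "gg.eventhandler.iceberg.azureAccountKey",
    ],
}


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """Return required properties that are absent or empty for the configured scheme."""
    # file:// is the documented default for fileSystemScheme.
    scheme = props.get("gg.eventhandler.iceberg.fileSystemScheme", "file://")
    return [key for key in REQUIRED_BY_SCHEME.get(scheme, []) if not props.get(key)]
```

Calling `missing_required(parse_properties(text))` on the contents of a Replicat properties file returns an empty list when every required property for the selected scheme has a value.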
9.2.24.3 Configuration Templates
Iceberg configuration templates are available in the directory /path/to/AdapterExamples/bigdata/iceberg.
The following template properties files are packaged with Oracle GoldenGate:
iceberg-glue-s3.properties
iceberg-hadoop-adls.properties
iceberg-hadoop-gcs.properties
iceberg-hadoop-localfile.properties
iceberg-hadoop-s3.properties
iceberg-jdbc-localfile.properties
iceberg-jdbc-s3.properties
iceberg-jdbc-adls.properties
iceberg-jdbc-gcs.properties
iceberg-nessie-adls.properties
iceberg-nessie-gcs.properties
iceberg-nessie-s3.properties
iceberg-nessie-s3a.properties
iceberg-polaris-adls.properties
iceberg-polaris-gcs.properties
iceberg-polaris-s3.properties
iceberg-rest.properties
Parent topic: Iceberg Event Handler
9.2.24.4 Limitations
- Oracle GoldenGate does not support configuration of partition columns during automatic table creation.
If partitioned tables are required, the Iceberg table should be created manually with the required partition columns.
- Altering the partitioning schema of a table is not supported after starting the Replication process.
If the partitioning schema of a table needs to be changed, the table should be dropped and recreated manually in the target database.
The data in the table will need to be reloaded.
Note:
Contact Oracle Support for assistance with this process.
- Pre-existing Iceberg target tables must have identifier columns (key columns) in the schema.
The Replicat process will ABEND if the target table does not have identifier columns.
- The following Iceberg data types cannot be used as a key column (Iceberg identifier field):
  - binary
  - fixed
  - uuid
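The key-column restriction above can be checked before starting Replicat. The following Python sketch is only an illustration: it assumes the table schema is available as a plain mapping of column name to Iceberg type name (a hypothetical input shape, not a GoldenGate or Iceberg API), and it also includes float and double, which the troubleshooting message ICEBERGEH-00068 reports as unsupported identifier types.

```python
# Types that this document lists as unusable for identifier (key) fields:
# binary, fixed, uuid (Limitations) and float, double (ICEBERGEH-00068).
UNSUPPORTED_KEY_TYPES = {"binary", "fixed", "uuid", "float", "double"}


def base_type(type_name):
    """Normalise a type string such as 'fixed[16]' or 'FIXED(16)' to 'fixed'."""
    name = type_name.strip().lower()
    for separator in ("[", "("):
        name = name.split(separator, 1)[0]
    return name.strip()


def unsupported_key_columns(schema, key_columns):
    """Return the key columns whose type cannot be used as an identifier field.

    schema: dict of column name -> Iceberg type name (hypothetical input shape).
    key_columns: names of the intended identifier (key) columns.
    """
    return [col for col in key_columns if base_type(schema[col]) in UNSUPPORTED_KEY_TYPES]
```

An empty result means none of the chosen key columns uses a type that the document rules out.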
Parent topic: Iceberg Event Handler
9.2.24.5 Instantiating Oracle GoldenGate with an Initial Load
For more information about the standard steps for instantiation, see: https://docs.oracle.com/en/middleware/goldengate/core/21.3/admin/instantiating-oracle-goldengate-initial-load.html#GUID-7D3BD34D-490B-4E76-A48B-63572D93881A
- Instantiation Steps Specific to Iceberg
- Iceberg Change Synchronization Replicat Behavior During Instantiation
Parent topic: Iceberg Event Handler
9.2.24.5.1 Instantiation Steps Specific to Iceberg
- Start initial load groups for Extract and Replicat.
- Start change synchronization group for Extract and write operations to a trail file.
Note:
Do not start change synchronization group for Replicat yet.
- Wait until the initial load Replicat group has completed apply of the initial load trail files.
- Stop the change synchronization group for Extract.
- Configure a change synchronization Replicat group.
- Add the parameter UPDATEINSERTS to the change synchronization Replicat group.
- Start the change synchronization Replicat group.
- Wait until the change synchronization Replicat group has processed all the trails generated by change synchronization Extract group.
The last record’s end offset in the last trail file must match the targetCheckpoint value in the JSON checkpoint file of the change synchronization Replicat group.
Example:
  - Run ls -l on the last trail file.
    -rw-r--r-- 1 username dba 5660 Feb 22 2024 /path/to/trail/tr000000003
  - Here the last record’s end offset is 5660, and the trail sequence is 3.
  - Open JSON checkpoint file for the change synchronization Replicat group.
    This should have the following attribute:
    "targetCheckpoint" : { "trailSequence" : 3, "trailOffset" : 5660 }
    This targetCheckpoint must match the last record’s end offset.
- Shutdown change synchronization Replicat group and remove the parameter UPDATEINSERTS.
- Initial load is complete now. Start change synchronization Extract and Replicat groups.
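The manual comparison in the example above (file size from ls -l against targetCheckpoint) could be scripted. The following Python sketch is one possible way to do it, under stated assumptions: the size of the last trail file is taken as the last record's end offset, as in the ls -l example; the trail sequence is read from the trailing digits of the file name; and because the location of targetCheckpoint inside the checkpoint file is not described here, the code searches the whole JSON document for it. All function names are hypothetical.

```python
import json
import os
import re


def find_target_checkpoint(node):
    """Depth-first search for a 'targetCheckpoint' object anywhere in parsed JSON."""
    if isinstance(node, dict):
        if isinstance(node.get("targetCheckpoint"), dict):
            return node["targetCheckpoint"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_target_checkpoint(child)
        if found is not None:
            return found
    return None


def trail_sequence(trail_path):
    """Trailing digits of the trail file name, for example 'tr000000003' -> 3."""
    match = re.search(r"(\d+)$", os.path.basename(trail_path))
    if match is None:
        raise ValueError(f"no sequence number in {trail_path!r}")
    return int(match.group(1))


def replicat_caught_up(trail_path, checkpoint_path):
    """True when targetCheckpoint matches the last trail file's sequence and size."""
    with open(checkpoint_path, encoding="utf-8") as handle:
        checkpoint = find_target_checkpoint(json.load(handle))
    if checkpoint is None:
        return False
    return (
        checkpoint.get("trailSequence") == trail_sequence(trail_path)
        and checkpoint.get("trailOffset") == os.path.getsize(trail_path)
    )
```

With the values from the example, `replicat_caught_up("/path/to/trail/tr000000003", <checkpoint file>)` would return True only when the checkpoint reports sequence 3 and offset 5660.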
Parent topic: Instantiating Oracle GoldenGate with an Initial Load
9.2.24.5.2 Iceberg Change Synchronization Replicat Behavior During Instantiation
- Execute [DELETE+INSERT] for all the INSERT operations, irrespective of whether the base row exists on the target or not.
- Run [DELETE+INSERT] for all the UPDATE operations, irrespective of whether the base row exists on the target or not.
- Run DELETE for all the DELETE operations, irrespective of whether the base row exists on the target or not.
Note:
No collisions will be logged in the Iceberg Replicat report file.
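The behavior listed above can be modelled with a small Python sketch. This is a toy model of the documented mapping, not GoldenGate's implementation: the target table is represented as a dictionary keyed by primary key, and the names `target_actions` and `apply_operation` are hypothetical. It shows why replaying an operation whose row already exists (or does not exist) on the target does not fail.

```python
def target_actions(source_op):
    """Map a source operation type to the actions applied on the target."""
    actions = {
        "INSERT": ["DELETE", "INSERT"],
        "UPDATE": ["DELETE", "INSERT"],
        "DELETE": ["DELETE"],
    }
    op = source_op.upper()
    if op not in actions:
        raise ValueError(f"unsupported operation type: {source_op!r}")
    return actions[op]


def apply_operation(table, source_op, key, row=None):
    """Apply one source operation to a dict-based stand-in for the target table."""
    for action in target_actions(source_op):
        if action == "DELETE":
            # Deleting a missing row is not an error in this model.
            table.pop(key, None)
        else:
            table[key] = row
    return table
```

In this model, an INSERT for a key that the initial load already wrote simply replaces the row, and a DELETE for a key that is absent leaves the table unchanged.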
Parent topic: Instantiating Oracle GoldenGate with an Initial Load
9.2.24.6 Troubleshooting and Diagnostics
- Oracle GoldenGate Replicat supports the Iceberg data types as per the version 2 specification.
- Iceberg identifier (key) fields cannot be null. Therefore, the Replicat process will ABEND if the key column value is null.
- Schema changes to the table, such as ADD/ALTER/DROP columns, are not supported while the Replicat process is running. There are steps to quiesce the replication process, apply the schema changes, and resume the replication process.
Note:
Contact Oracle Support for assistance with this process.
The Replicat process will ABEND if there are unmapped columns in the target table.
- Replicat ABENDs with the following message:
ICEBERGEH-00060 Operation record at position '00000000030000003318' for the table 'hadoop.oggdb1.types_tab' has missing column values in an UPDATE. Replicat will ABEND. To override this behavior set 'gg.eventhandler.iceberg.abendOnMissingColumns=false'and restart the Replicat process. Setting this property to false will instruct Replicat to lookup missing columns from the target table and therefore may impact performance.
By default, the Iceberg Replicat process expects trail files without missing column values in the UPDATE operations. Replicat can be configured to process compressed trail files with missing column values in the UPDATE operations by setting the property gg.eventhandler.iceberg.abendOnMissingColumns=false.
- Replicat ABENDs with the following message:
ICEBERGEH-00057 Detected changes in the partition columns for the table 'hadoop.oggdb1.types_tab'. Partition columns in the previous run: '<column list>', partition columns in this run: '<column list>'. GoldenGate does not support changing partition columns. Alter the table manually to match the partition columns in the previous run and restart the replicat process.
The Iceberg Replicat process does not support changing partition columns.
- Replicat ABENDs with the following message:
ICEBERGEH-00067 Invalid state. The column '<column_name>' in the target table '<table_name>' is not mapped. The following are the mapped columns: '<column list>'. Iceberg Replicat requires all the columns in the target table to be mapped. Please map the column ''<unmapped column>' and restart the Replicat process.
The Iceberg Replicat process requires all the columns in the target table to be mapped.
- Replicat ABENDs with the following message:
ICEBERGEH-00068 Key column '<column name>' in the table '<table name>' is of type float or double. Iceberg does not support float or double type as identifier (key) fields. Initiating Replicat process shutdown. Please modify the table schema to exclude double/float types as key columns and restart the Replicat process.
As per the current Iceberg specification (version 2), the column types double and float cannot be used as identifier (key) columns.
- Replicat ABENDs with the following message:
ICEBERGEH-00070 Table '<table_name>' contains a key column '<column_name>' of '<binary/fixed/uuid>' type that is not supported by GoldenGate. The following column types are not supported as key: 'binary, fixed, uuid'. To proceed, either use a supported Iceberg key column type by altering the 'KEYCOLS' clause in the Replicat 'MAP' statement as per the following example: 'MAP <sourceSchema>.<sourceTable>, TARGET <targetSchema>.<targetTable>, KEYCOLS("key1", "key2");' or alter the Iceberg target tables's identifier fields to exclude the key column types that are not supported by GoldenGate. You can use the following Iceberg SQL statement to alter the table schema: 'ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS key1, key2'.
The Iceberg types binary, fixed, and uuid cannot be used as identifier (key) columns.
- Replicat ABENDs with the following message:
ICEBERGEH-00071=Table '<table_name>' does not define an Iceberg identifier column. Identifier columns are used as key columns by GoldenGate. Initiating Replicat process shutdown. Please alter the Iceberg target tables's schema to add identifier columns. You can use the following Iceberg SQL statement to alter the table schema: 'ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS key1, key2'.
The Iceberg target table should have identifier columns (key columns) in the schema.
- Exceptions in the Replicat handler log file:
com.google.cloud.storage.StorageException: 401 Unauthorized
org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path
org.apache.iceberg.exceptions.RuntimeIOException: Failed to create file
org.apache.iceberg.exceptions.ForbiddenException: Forbidden
These are common exceptions due to the incorrect configuration of the object storage authentication properties.
Ensure that the following properties are set correctly for the object store in use:
  - All object stores: gg.eventhandler.iceberg.fileSystemScheme, gg.eventhandler.iceberg.proxyServer, gg.eventhandler.iceberg.proxyPort
  - AWS S3: gg.eventhandler.iceberg.awsAccessKeyId, gg.eventhandler.iceberg.awsSecretKey, gg.eventhandler.iceberg.awsS3Region
  - Azure Data Lake Storage: gg.eventhandler.iceberg.azureAccountKey
  - Google Cloud Storage: gg.eventhandler.iceberg.gcpProjectId, gg.eventhandler.iceberg.gcpServiceAccountJsonKeyFile
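When a report file is long, it can help to pull out the ICEBERGEH codes and pair each with the remedy described in this section. The following Python sketch does that for text that has already been read into a string. The hint wording is a summary of this section, the function name `diagnose` is hypothetical, and the pattern assumes that codes always take the form ICEBERGEH- followed by five digits, as in the messages shown above.

```python
import re

# Short remedies summarised from the troubleshooting entries in this section.
HINTS = {
    "ICEBERGEH-00057": "Partition columns changed; alter the table to match the previous run.",
    "ICEBERGEH-00060": "UPDATE has missing column values; consider abendOnMissingColumns=false.",
    "ICEBERGEH-00067": "A target column is unmapped; map every column in the target table.",
    "ICEBERGEH-00068": "Key column is float or double; choose a different key column.",
    "ICEBERGEH-00070": "Key column is binary, fixed, or uuid; change KEYCOLS or identifier fields.",
    "ICEBERGEH-00071": "Table has no identifier columns; add identifier fields to the schema.",
}

CODE_PATTERN = re.compile(r"ICEBERGEH-\d{5}")


def diagnose(report_text):
    """Return (code, hint) pairs for each distinct code, in order of first appearance."""
    seen = []
    for code in CODE_PATTERN.findall(report_text):
        if code not in seen:
            seen.append(code)
    return [(code, HINTS.get(code, "No hint recorded in this sketch.")) for code in seen]
```

Codes that are not covered by this section are still reported, with a placeholder hint, so that nothing in the report is silently dropped.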
Parent topic: Iceberg Event Handler