1 Overview

1.1 Understanding Oracle GoldenGate for Distributed Applications and Analytics

This section describes the concepts and basic structure of Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).

Watch this video for an introduction to Oracle GoldenGate Microservices: Introduction to GoldenGate Microservices

1.1.1 Understanding Oracle GoldenGate for Distributed Applications and Analytics

Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) integrates with Oracle GoldenGate instances.

The Oracle GoldenGate product enables you to:

  • Capture transactional changes from a source database.
  • Send and queue these changes as a set of database-independent files called the Oracle GoldenGate trail.
  • Optionally alter the source data using mapping parameters and functions.
  • Apply the transactions in the trail to a target database.

Oracle GoldenGate performs this capture and apply in near real-time across heterogeneous databases, platforms, and operating systems.

1.1.1.1 Delivery Configuration Options

The Java delivery module is loaded by the GoldenGate Replicat process, which is configured using the Replicat parameter file. Upon loading, the Java Delivery module is configured using the settings in the Adapter Properties file. Application behavior can be customized by:

  • Editing the property files; for example, to:

    • Set target types, host names, port numbers, output file names, and JMS connection settings.

    • Turn debug-level logging on or off.

    • Identify which message format should be used.

  • Custom formatting records by:

    • Setting properties for the pre-existing formatters (fixed-length or field-delimited message formats, or XML, JSON, or Avro formats);

    • Customizing message templates, using the Velocity template macro language;

    • (Optional) Writing custom Java code.

  • (Optional) Writing custom Java code to provide custom handling of transactions and operations, perform filtering, or implement custom message formats.

There are existing implementations (handlers) for sending messages using JMS and for writing files to disk. For Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) targets, there are built-in integration handlers that write to the supported databases.

There are several predefined message formats for sending the messages (for example, XML or field-delimited); or custom formats can be implemented using templates. Each handler has documentation that describes its configuration properties; for example, a file name can be specified for a file writer, and a JMS queue name can be specified for the JMS handler. Some properties apply to more than one handler; for example, the same message format can be used for JMS and files.
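As an illustrative sketch only (the handler name sample and all values shown are hypothetical; consult each handler's documentation for the properties it actually supports), an Adapter Properties fragment combining these settings might look like:

```
# Hypothetical Adapter Properties fragment; the handler name "sample"
# and all values are illustrative.
gg.handlerlist=sample
gg.handler.sample.type=filewriter
gg.handler.sample.format=json

# Debug-level logging can be toggled here.
gg.log.level=DEBUG
```

The same format property can be reused across handlers, which is how a single message format serves both JMS and file output.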

1.1.1.2 Adapter Integration Options

There are two major products which are based on the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) architecture:

  • The Oracle GoldenGate Java Adapter is the overall framework. This product allows you to implement custom code to handle Oracle GoldenGate trail records according to your specific requirements. It comes with a built-in Oracle GoldenGate File Writer module that can be used for flat-file integration.

  • GG for DAA. The GG for DAA product contains built-in support to write operation data from Oracle GoldenGate trail records into various GG for DAA targets (such as HDFS, HBase, Kafka, Flume, JDBC, Cassandra, and MongoDB). You do not need to write custom code to integrate with GG for DAA applications. The functionality is separated into handlers, which integrate with third-party applications, and formatters, which transform the data into various formats, such as Avro, JSON, delimited text, and XML. In certain instances, the integration to a third-party tool is proprietary, like the HBase API. In these instances, the handler exists without an associated pluggable formatter.

The Oracle GoldenGate Java Adapter and the GG for DAA products have some crossover in functionality. The following list details the major areas of functionality and the product or products in which each is included:

  • Read JMS messages and deliver them as an Oracle GoldenGate trail. This feature is included in GG for DAA.

  • Read an Oracle GoldenGate trail and deliver transactions to a JMS provider or other messaging system or custom application. This feature is included in the GG for DAA product.

  • Read an Oracle GoldenGate trail and write transactions to a file that can be used by other applications. This feature is only included in GG for DAA.

  • Read an Oracle GoldenGate trail and write transactions to GG for DAA targets. The GG for DAA integration features are only included in the GG for DAA product.

1.1.1.2.1 Capturing Transactions to a Trail

Oracle GoldenGate message capture can be used to read messages from a queue and communicate with an Oracle GoldenGate Extract process to generate a trail containing the processed data.

The message capture processing is implemented as a Vendor Access Module (VAM) plug-in to a generic Extract process. A set of properties, rules and external files provide messaging connectivity information and define how messages are parsed and mapped to records in the target Oracle GoldenGate trail.

Currently, this adapter supports capturing JMS text messages.

1.1.1.2.2 Applying Transactions from a Trail

Oracle GoldenGate Java Adapter delivery can be used to apply transactional changes to targets other than a relational database: for example, ETL tools (DataStage, Ab Initio, Informatica), JMS messaging, Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Applications, or custom APIs. There are a variety of options for integration with Oracle GoldenGate:

  • Flat-file integration: predominantly for ETL, proprietary, or legacy applications. Oracle GoldenGate File Writer can write micro batches to disk to be consumed by tools that expect batch file input. The data is formatted to the specifications of the target application, such as delimiter-separated values, length-delimited values, or binary. Near real-time feeds to these systems are accomplished by decreasing the time window for batch file rollover to minutes or even seconds.

  • Messaging: transactions or operations can be published as messages (for example, in XML) to JMS. The JMS provider is configurable to work with multiple JMS implementations; examples include ActiveMQ, JBoss Messaging, TIBCO, Oracle WebLogic JMS, WebSphere MQ, and others.

  • Java API: custom handlers can be written in Java to process the transaction, operation and metadata changes captured by Oracle GoldenGate on the source system. These custom Java handlers can apply these changes to a third-party Java API exposed by the target system.

  • GG for DAA integration: writing transaction data from the source trail files into various GG for DAA targets can be achieved by setting configuration properties. The GG for DAA product contains built-in handlers to write to HDFS, HBase, Kafka, and Flume targets.

All four options have been implemented as extensions to the core Oracle GoldenGate product.

  • For Java integration using either JMS or the Java API, use Oracle GoldenGate for Java.

  • For GG for DAA integration, you can configure Oracle GoldenGate Replicat to integrate with the GG for DAA module. Writing to GG for DAA targets in various formats can be configured using a set of properties with no programming required.
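As a sketch of that configuration, a Replicat parameter file that loads the Java delivery module typically points it at an Adapter Properties file; the group name, paths, and mappings below are illustrative assumptions:

```
REPLICAT rdaa
-- Load the Java delivery module and point it at the adapter properties
-- file; the library name varies by platform (libggjava.so on Linux).
TARGETDB LIBFILE libggjava.so SET property=dirprm/rdaa.props
MAP src.*, TARGET src.*;
```

The target-specific behavior (which handler, which format) then lives entirely in the properties file, with no programming required.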

1.1.1.3 Monitoring Performance

For more information about monitoring the performance, see Monitor Performance from the Performance Metrics Service in Using Oracle GoldenGate Microservices Architecture.

1.2 What’s Supported in Oracle GoldenGate for Distributed Applications and Analytics

1.2.1 Verifying Certification and System Requirements

Oracle recommends that you use the certification matrix and system requirements documents with each other to verify that your environment meets the requirements for installation.

  1. Verifying that your environment meets certification requirements:

    Make sure that you install your product on a supported hardware and software configuration. See the certification matrix for more details: GoldenGate Certifications.

    Oracle has tested and verified the performance of your product on all certified systems and environments. Whenever new certifications are released, they are added to the certification document right away. New certifications can be released at any time. Therefore, the certification documents are kept outside the documentation libraries and are available on Oracle Technology Network.

  2. Using the system requirements document to verify certification:

    Oracle recommends that you use the Oracle Fusion Middleware Supported System Configuration document to verify that the certification requirements are met. For example, if the certification document indicates that your product is certified for installation on 64-Bit Oracle Linux 6.5, use this document to verify that your system meets the required minimum specifications. These include disk space, available memory, specific platform packages and patches, and other operating system-specific requirements. System requirements can change in the future. Therefore, the system requirement documents are kept outside of the documentation libraries and are available on Oracle Technology Network.

  3. Verifying interoperability among multiple products:

    To learn how to install and run multiple Fusion Middleware products from the same release or mixed releases with each other, see Oracle Fusion Middleware Supported System Configuration in Oracle Fusion Middleware Understanding Interoperability and Compatibility.

The compatibility of the GG for DAA handlers with the various data collections, including distributions, database releases, and drivers is included in the certification document.

1.2.2 Understanding Handler Compatibility

For more information, see the Certification Matrix.

1.2.3 What are the Additional Support Considerations?

This section describes additional support considerations for Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).

Pluggable Formatters—Support

The handlers support the Pluggable Formatters as follows:

  • The File Writer Handler supports all of the pluggable formatters.
  • The HDFS Handler supports all of the pluggable formatters.
  • Pluggable formatters are not applicable to the HBase Handler. Data is streamed to HBase using the proprietary HBase client interface.

  • The Kafka Handler supports all of the pluggable formatters.

  • The Kafka Connect Handler does not support pluggable formatters. You can convert data to JSON or Avro using Kafka Connect data converters.

  • The Kinesis Streams Handler supports all of the pluggable formatters described in Using the Pluggable Formatters.

  • The Cassandra, MongoDB, and JDBC Handlers do not use a pluggable formatter.
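For the Kafka Connect case, data conversion is configured on the Kafka Connect side rather than through a pluggable formatter; a typical converter configuration (a sketch using the JSON converter that ships with Kafka Connect, with illustrative values) looks like:

```
# Kafka Connect converter settings (illustrative); these are Kafka
# Connect properties, not GG for DAA formatter properties.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```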

Java Delivery Using Extract

Java Delivery using Extract is not supported. Java Delivery is supported only using the Replicat process. Replicat provides better performance, better support for checkpointing, and better control of transaction grouping.

MongoDB Handler—Support
  • The handler can only replicate unique rows from the source table. If a source table has no primary key defined and contains duplicate rows, replicating the duplicate rows to the MongoDB target results in a duplicate-key error and the Replicat process abends.

  • Missed updates and deletes are not detected and are therefore ignored.

  • Untested with sharded collections.

  • Only date and time data types with millisecond precision are supported. Values from a trail with microsecond or nanosecond precision are truncated to millisecond precision.

  • The datetime data type with timezone in the trail is not supported.

  • MongoDB enforces a maximum BSON document size of 16 MB. If the trail record size exceeds this limit, the handler cannot replicate the record.

  • No DDL propagation.

  • No truncate operation.

JDBC Handler—Support
  • The JDBC handler uses the generic JDBC API, so any target database with a JDBC driver implementation should be able to use this handler. A myriad of databases support the JDBC API, and Oracle cannot certify the JDBC Handler for all targets.

  • The handler supports Replicat using the REPERROR and HANDLECOLLISIONS parameters.

  • DDL operations are ignored by default and are logged with a WARN level.

  • Coordinated Replicat is a multithreaded process that applies transactions in parallel instead of serially. Each thread handles all of the filtering, mapping, conversion, SQL construction, and error handling for its assigned workload. A coordinator thread coordinates transactions across threads to account for dependencies. It ensures that DML is applied in a synchronized manner, preventing certain DMLs from occurring on the same object at the same time due to row-locking, block-locking, or table-locking issues based on database-specific rules. If there are database locking issues, then Coordinated Replicat performance can be extremely slow or can pause.

DDL Event Handling

Only the TRUNCATE TABLE DDL statement is supported. All other DDL statements, such as CREATE TABLE, CREATE INDEX, and DROP TABLE, are ignored.

You can use the TRUNCATE statements in one of these ways:

  • In a DDL statement, TRUNCATE TABLE, ALTER TABLE TRUNCATE PARTITION, and other DDL TRUNCATE statements. This uses the DDL parameter.

  • Standalone TRUNCATE support, which just has TRUNCATE TABLE. This uses the GETTRUNCATES parameter.
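For example, standalone TRUNCATE support is enabled by adding the GETTRUNCATES parameter to the Replicat parameter file; the group name, paths, and mappings below are hypothetical:

```
REPLICAT rdaa
TARGETDB LIBFILE libggjava.so SET property=dirprm/rdaa.props
-- Replicate standalone TRUNCATE TABLE operations.
GETTRUNCATES
MAP src.*, TARGET src.*;
```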

1.3 Dependency Downloader

Utility scripts are located in the {GGforDAA install}/DependencyDownloader directory to download client dependency jars for the various supported Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) integrations.

Topics:

1.3.1 Dependency Downloader Setup

To complete the Dependency Downloader setup:
  1. To verify that Java is installed, execute the following from the command line: java -version.

    Note:

    The Dependency Downloader utility scripts require Java to run. Ensure that Oracle Java is downloaded and is available in the PATH on the machine where the scripts are installed.
  2. Configure the proxy settings in the following script: {GGforDAA install}/DependencyDownloader/config_proxy.sh. This file contains the following two entries:
    • #export PROXY_SERVER_HOST=www-proxy-hqdc.us.oracle.com
    • #export PROXY_SERVER_PORT=80
    To configure the proxy settings:
    1. Uncomment the configuration settings (remove the # at the beginning of the lines).
    2. Change the host name and port number to your correct proxy server settings.

    Note:

    Most companies maintain a private network that is shielded from the public Internet by a firewall. Most companies also maintain a forwarding proxy server that serves as a gateway between the private network and the public Internet. The Dependency Downloader utilities must access Maven repositories, which are available on the Internet, so you must configure HTTP proxy settings in order to download dependency libraries. Proxy servers are identified by host name and port. If you do not know whether your company employs a proxy server, or what its settings are, contact your IT or network administrators.
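After those two steps, the relevant lines of config_proxy.sh would look something like the following; the host name and port are placeholders for your own proxy settings:

```shell
# Excerpt from config_proxy.sh after uncommenting; the values below are
# placeholders and must be replaced with your proxy server settings.
export PROXY_SERVER_HOST=proxy.example.com
export PROXY_SERVER_PORT=3128
```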

The Dependency Downloader uses Bash scripts to invoke Maven and download dependencies. The Bash shell is not supported natively by the Windows Command Prompt. You can run the Dependency Downloader scripts on Windows, but doing so requires a Unix emulator, which provides a Unix-style command line on Windows and supports various Unix shells, including Bash. One option is Cygwin, which is available free of charge. After Cygwin is installed, the setup process is the same; set up and run the scripts through the Cygwin64 Terminal. See https://www.cygwin.com/.

1.3.2 Running the Dependency Downloader Scripts

To run the dependency downloader scripts:
  1. Use a Unix terminal interface to navigate to the following directory: {GGforDAA install}/DependencyDownloader.
  2. Execute the following to run the scripts: ./{the dependency script} {version of the dependencies to download}

    For example: ./aws.sh 1.11.893

    Dependency libraries get downloaded to the following directory:

    {GGforDAA install}/DependencyDownloader/dependencies/{the dependency name}_{the_dependency_version}.

    For example: {GGforDAA install}/DependencyDownloader/dependencies/aws_sdk_1.11.893.

Ensure that the version string exactly matches the version string of the dependency being downloaded. If a dependency version doesn't exist in the public Maven repository, then it is not possible to download the dependency, and running the script results in an error. Most public Maven repositories provide a web-based GUI in which you can browse the supported versions of various dependencies. The exception is the Confluent Maven repository, which does not provide a web-based GUI. This makes downloading dependencies challenging, because the version string is not independently verifiable through a web interface.

After the dependencies are successfully downloaded, you must configure the gg.classpath variable in the Java Adapter properties file to include the dependencies for the corresponding replicat process.
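For instance, after downloading the AWS SDK dependencies, the download directory can be referenced with a trailing wildcard so that every jar is included; the installation path below is a hypothetical example:

```shell
# Hypothetical install location; substitute your own {GGforDAA install} path.
DEP_DIR=/u01/ogg/DependencyDownloader/dependencies/aws_sdk_1.11.893
# A trailing /* wildcard picks up every downloaded jar.
GG_CLASSPATH="${DEP_DIR}/*"
# This value goes on the gg.classpath line of the Java Adapter
# properties file for the corresponding replicat process.
echo "gg.classpath=${GG_CLASSPATH}"
```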

Note:

Best Practices
  1. Whenever possible, use the exact version of the client libraries that matches the server or application to which you are connecting.
  2. Prior to running the Dependency Downloader scripts, independently verify that the version string exists in the repository through the web GUI.
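One way to perform that verification from the command line is to probe the repository's directory layout (Maven repositories arrange artifacts as groupId path, then artifactId, then version); the coordinates below are illustrative:

```shell
# Check whether a dependency version exists in Maven Central before
# running a downloader script; the coordinates below are illustrative.
GROUP_PATH=com/amazonaws
ARTIFACT=aws-java-sdk
VERSION=1.11.893
URL="https://repo1.maven.org/maven2/${GROUP_PATH}/${ARTIFACT}/${VERSION}/"
# curl -sfI exits non-zero on an HTTP error such as 404, which
# indicates the version string does not exist in the repository.
if curl -sfI "$URL" >/dev/null 2>&1; then
  echo "version ${VERSION} found"
else
  echo "version ${VERSION} not found"
fi
```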

1.3.3 Dependency Downloader Scripts

Table 1-1 Dependency Downloader Scripts

  • Amazon Web Services SDK (aws.sh): Downloads the Amazon Web Services (AWS) SDK, which provides client libraries for connectivity to the AWS cloud. Relevant Handlers/Capture: Kinesis Handler, S3 Event Handler. Versions Supported: 1.12.x. Dependency Link: https://search.maven.org/artifact/com.amazonaws/aws-java-sdk

  • Google BigQuery (bigquery.sh): Downloads the required client libraries for Google BigQuery. Relevant Handlers/Capture: BigQuery Handler. Versions Supported: 2.x. Dependency Link: https://search.maven.org/artifact/com.google.cloud/google-cloud-bigquery

  • Cassandra DSE (Datastax Enterprise) Client (cassandra_dse.sh): Downloads the Cassandra DSE client. Cassandra DSE is the for-purchase version of Cassandra available from Datastax. Relevant Handlers/Capture: Cassandra Handler. Versions Supported: 2.0.0 and higher. Dependency Link: https://search.maven.org/artifact/com.datastax.dse/dse-java-driver-core

  • Apache Cassandra Client (cassandra.sh): Downloads the Apache Cassandra client. Relevant Handlers/Capture: Cassandra Handler. Versions Supported: 4.0.0 and higher. Dependency Link: https://search.maven.org/artifact/com.datastax.oss/java-driver-core

  • Cassandra Capture 3x Client (cassandra_capture_3x.sh): Downloads all the client libraries needed for capture from Cassandra 3.x versions. Relevant Handlers/Capture: Cassandra Capture 3x. Versions Supported: 3.3.1 (used by default). Dependency Link: https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core/3.3.1

  • Cassandra Capture 4x Client (cassandra_capture_4x.sh): Downloads all the client libraries needed for capture from Cassandra 4.x versions. Relevant Handlers/Capture: Cassandra Capture 4x. Versions Supported: 4.14.1 (used by default). Dependency Link: https://mvnrepository.com/artifact/com.datastax.oss/java-driver-core/4.14.1

  • Cassandra Capture DSE Client (cassandra_capture_dse.sh): Downloads all the client libraries needed for capture from DSE Cassandra 6.x versions. Relevant Handlers/Capture: Cassandra Capture DSE. Versions Supported: 4.14.1 (used by default). Dependency Link: https://mvnrepository.com/artifact/com.datastax.oss/java-driver-core/4.14.1

  • Elasticsearch Java Client (elasticsearch_java.sh): Downloads the Elasticsearch Java Client. Relevant Handlers/Capture: Elasticsearch Handler. Versions Supported: 7.x and 8.x. Dependency Link: https://search.maven.org/artifact/co.elastic.clients/elasticsearch-java

  • Hadoop Azure Client from Cloudera (hadoop_azure_cloudera.sh): Downloads the Hadoop Azure client libraries provided by Cloudera. The Hadoop Azure client libraries cannot be loaded along with the Hadoop client because, in Cloudera, the version numbers of the two components do not line up perfectly. Relevant Handlers/Capture: HDFS Handler, HDFS Event Handler, ORC Event Handler, Parquet Event Handler. Versions Supported: 3.x. Dependency Link: https://repository.cloudera.com/service/rest/repository/browse/cloudera-repos/org/apache/hadoop/hadoop-azure/

  • Hadoop Client from Cloudera (hadoop_cloudera.sh): Downloads the Hadoop client libraries provided by Cloudera. Relevant Handlers/Capture: HDFS Handler, HDFS Event Handler, ORC Event Handler, Parquet Event Handler. Versions Supported: 3.x. Dependency Link: https://repository.cloudera.com/service/rest/repository/browse/cloudera-repos/org/apache/hadoop/hadoop-client/

  • Hadoop Client from Hortonworks (hadoop_hortonworks.sh): Downloads the Hadoop client, including the libraries for connectivity to Azure Data Lake, available from Hortonworks. Relevant Handlers/Capture: HDFS Handler, HDFS Event Handler, ORC Event Handler, Parquet Event Handler. Versions Supported: 3.x. Dependency Link: https://repo.hortonworks.com/service/rest/repository/browse/public/org/apache/hadoop/hadoop-client/

  • Apache Hadoop Client Plus Required Libraries for Azure Connectivity (hadoop.sh): Downloads the Hadoop client, including the libraries for connectivity to Azure Data Lake. Relevant Handlers/Capture: HDFS Handler, HDFS Event Handler, ORC Event Handler, Parquet Event Handler. Versions Supported: 3.x. Dependency Link: https://search.maven.org/artifact/org.apache.hadoop/hadoop-azure

  • HBase Client Provided by Cloudera (hbase_cloudera.sh): Downloads the HBase client libraries provided by Cloudera. Relevant Handlers/Capture: HBase Handler. Versions Supported: 2.x. Dependency Link: https://repository.cloudera.com/service/rest/repository/browse/cloudera-repos/org/apache/hbase/hbase-client/

  • HBase Client Provided by Hortonworks (hbase_hortonworks.sh): Downloads the HBase client libraries provided by Hortonworks. Relevant Handlers/Capture: HBase Handler. Versions Supported: 2.x. Dependency Link: https://repo.hortonworks.com/service/rest/repository/browse/public/org/apache/hbase/hbase-client/

  • Apache HBase Client (hbase.sh): Downloads the HBase client. Relevant Handlers/Capture: HBase Handler. Versions Supported: 2.x. Dependency Link: https://search.maven.org/artifact/org.apache.hbase/hbase-client

  • Apache Kafka Client plus Kafka Connect Framework and JSON Converter from Cloudera (kafka_cloudera.sh): Downloads the Kafka client plus libraries for the Kafka Connect framework and the Kafka Connect JSON Converter, provided by Cloudera. Relevant Handlers/Capture: Kafka Handler, Kafka Connect Handler, Kafka Capture. Versions Supported: 0.9.x to current. Dependency Link: https://repository.cloudera.com/service/rest/repository/browse/cloudera-repos/org/apache/kafka/kafka-clients/

  • Apache Kafka Client plus Kafka Connect Framework and JSON Converter from Hortonworks (kafka_hortonworks.sh): Downloads the Kafka client plus libraries for the Kafka Connect framework and the Kafka Connect JSON Converter, provided by Hortonworks. Relevant Handlers/Capture: Kafka Handler, Kafka Connect Handler, Kafka Capture. Versions Supported: 0.9.x to current. Dependency Link: https://repo.hortonworks.com/service/rest/repository/browse/public/org/apache/kafka/kafka-clients/

  • Apache Kafka Client plus Kafka Connect Framework and JSON Converter (kafka.sh): Downloads the Kafka client plus libraries for the Kafka Connect framework and the Kafka Connect JSON Converter. Relevant Handlers/Capture: Kafka Handler, Kafka Connect Handler, Kafka Capture. Versions Supported: 0.9.x to current. Dependency Link: https://search.maven.org/artifact/org.apache.kafka/kafka-clients

  • Confluent Kafka Client plus Kafka Connect Framework and JSON and Avro Converters (kafka_confluent.sh): Downloads the Kafka client plus libraries for the Kafka Connect framework, the Kafka Connect JSON Converter, and the Kafka Connect Avro Converter, available from Confluent. Relevant Handlers/Capture: Kafka Handler, Kafka Connect Handler, Kafka Capture. Versions Supported: Confluent platform 4.1.0 and higher. Dependency Link: https://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/

  • MapR Kafka Client (kafka_mapr.sh): Downloads the MapR Kafka client libraries. Relevant Handlers/Capture: Kafka Handler. Versions Supported: 0.x, 1.x, and 2.x. Dependency Link: https://repository.mapr.com/nexus/content/groups/mapr-public/org/apache/kafka/kafka-clients/

  • Confluent Kafka Client plus Kafka Connect Framework and Protobuf Converter (kafka_confluent_protobuf.sh): Downloads the Kafka client plus libraries for the Kafka Connect framework and the Kafka Connect Protobuf Converter, available from Confluent. Relevant Handlers/Capture: Kafka Handler, Kafka Connect Handler. Versions Supported: Confluent 5.x and higher. Dependency Link: https://packages.confluent.io/maven/io/confluent/kafka-connect-protobuf-converter/

  • MongoDB Client (mongodb.sh): Downloads the MongoDB client libraries. Relevant Handlers/Capture: MongoDB Handler. Versions Supported: 5.x. Dependency Link: https://mvnrepository.com/artifact/org.mongodb/mongodb-driver-legacy

  • Oracle NoSQL SDK Client (oracle_nosql_sdk.sh): Downloads the Oracle NoSQL client libraries. Relevant Handlers/Capture: Oracle NoSQL Handler. Versions Supported: 5.x. Dependency Link: https://search.maven.org/artifact/com.oracle.nosql.sdk/nosqldriver

  • Oracle OCI Client (oracle_oci.sh): Downloads the Oracle OCI client libraries. Relevant Handlers/Capture: Oracle OCI Event Handler. Versions Supported: 3.x. Dependency Link: https://search.maven.org/artifact/com.oracle.oci.sdk/oci-java-sdk-objectstorage

  • Apache ORC (Optimized Row Columnar) Client (orc.sh): Downloads the Apache ORC client libraries. ORC is built on top of the Hadoop client, so the ORC Event Handler needs the Hadoop client in order to run; the Hadoop client must be downloaded separately. Relevant Handlers/Capture: ORC Event Handler. Versions Supported: 1.x. Dependency Link: https://search.maven.org/artifact/org.apache.orc/orc-core

  • Apache Parquet Client (parquet.sh): Downloads the Apache Parquet client libraries. Parquet is built on top of the Hadoop client, so the Parquet Event Handler needs the Hadoop client in order to run; the Hadoop client must be downloaded separately. Relevant Handlers/Capture: Parquet Event Handler. Versions Supported: 1.x. Dependency Link: https://search.maven.org/artifact/org.apache.parquet/parquet-hadoop

  • Apache Velocity (velocity.sh): Downloads the libraries required for formatting using Velocity. The Velocity libraries were removed from the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) installation starting with the 21.1 release. Relevant Handlers/Capture: Velocity Formatter. Versions Supported: 1.x. Dependency Link: https://search.maven.org/artifact/org.apache.velocity/velocity

  • Google Cloud Storage Java SDK (gcs.sh): Downloads the required client libraries for Google Cloud Storage. Relevant Handlers/Capture: GCS Event Handler. Versions Supported: 2.x. Dependency Link: https://search.maven.org/artifact/com.google.cloud/google-cloud-storage

  • MongoDB Capture (mongodb_capture.sh): Downloads the required client libraries for MongoDB capture. Relevant Handlers/Capture: MongoDB Capture. Versions Supported: 5.x. Dependency Link: https://search.maven.org/artifact/org.mongodb/mongodb-driver-reactivestreams

  • Synapse JDBC Driver (synapse.sh): Downloads the Synapse JDBC driver. The Hadoop client is also required to stage data to Azure Data Lake. Relevant Handlers/Capture: Synapse Stage and Merge. Versions Supported: 12.6.1.jre8. Dependency Link: https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc/12.6.1.jre8

  • Snowflake JDBC Driver (snowflake.sh): Downloads the Snowflake JDBC driver. Other client libraries are likely required to stage the data to the AWS or Azure cloud. Relevant Handlers/Capture: Snowflake Stage and Merge. Versions Supported: 3.15.1. Dependency Link: https://search.maven.org/artifact/net.snowflake/snowflake-jdbc/3.15.1/jar

  • Jedis Client for Redis (redis.sh): Downloads Jedis, a Redis client. Relevant Handlers/Capture: Redis Handler. Versions Supported: 4.x. Dependency Link: https://search.maven.org/artifact/redis.clients/jedis

  • Google Pub/Sub Client (googlepubsub.sh): Downloads the Java client for Google Pub/Sub messaging. Relevant Handlers/Capture: Google Pub/Sub Handler. Versions Supported: 1.x. Dependency Link: https://search.maven.org/artifact/com.google.cloud/google-cloud-pubsub

  • Databricks JDBC Driver (databricks.sh): Downloads the Databricks JDBC driver. Relevant Handlers/Capture: Databricks Stage and Merge. Versions Supported: 2.6.36. Dependency Link: https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.36

  • Azure Blob Storage Client (azure_blob_storage.sh): Downloads the Microsoft Azure Blob Storage client. Relevant Handlers/Capture: Azure Blob Storage Event Handler; Data Warehouse Stage and Merge implementations can also use this client to upload to Azure Data Lake. Versions Supported: 12.25.3. Dependency Link: https://search.maven.org/artifact/com.azure/azure-storage-blob

  • Snowflake Streaming (snowflakestreaming.sh): Downloads the dependencies for Snowflake Streaming. Relevant Handlers/Capture: NA. Versions Supported: NA. Script Location: <OGGDIR>/DependencyDownloader/snowflakestreaming.sh