10.2.3 Amazon Kinesis
The Kinesis Streams Handler streams data to applications hosted on the Amazon Cloud or in your environment.
This chapter describes how to use the Kinesis Streams Handler.
10.2.3.1 Overview
Amazon Kinesis is a messaging system that is hosted in the Amazon Cloud. Kinesis streams can be used to stream data to other Amazon Cloud applications such as Amazon S3 and Amazon Redshift. Using the Kinesis Streams Handler, you can also stream data to applications hosted on the Amazon Cloud or at your site. Amazon Kinesis Streams provides functionality similar to Apache Kafka.
The logical concepts map as follows:
- Kafka Topics = Kinesis Streams
- Kafka Partitions = Kinesis Shards
A Kinesis stream must have at least one shard.
10.2.3.2 Detailed Functionality
10.2.3.2.1 Amazon Kinesis Java SDK
The Oracle GoldenGate Kinesis Streams Handler uses the AWS Kinesis Java SDK to push data to Amazon Kinesis. See the Amazon Kinesis Streams Developer Guide at:
http://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html.
The Kinesis Streams Handler was designed and tested with the latest AWS Kinesis Java SDK version 2.28.11. These are the dependencies:
- Group ID: software.amazon.awssdk
- Artifact ID: kinesis
- Version: 2.28.11
Note:
Moving to the latest AWS Kinesis Java SDK assumes that its interface has no changes that break compatibility with the Kinesis Streams Handler. You can download the AWS Java SDK, including Kinesis, from:
10.2.3.2.2 Kinesis Streams Input Limits
A Kinesis stream with a single shard accepts at most 1000 messages per second, up to a total data size of 1MB per second. Adding streams or shards increases the potential throughput, for example:
- 1 stream with 2 shards = 2000 messages per second up to a total data size of 2MB per second
- 3 streams of 1 shard each = 3000 messages per second up to a total data size of 3MB per second
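The arithmetic behind these examples can be sketched in a few lines of Python. This is an illustration only: the function name input_capacity is hypothetical, and the constants are the per-shard limits quoted in this section, not values read from AWS.

```python
# Per-shard input limits as quoted in this section.
MESSAGES_PER_SECOND_PER_SHARD = 1000
MEGABYTES_PER_SECOND_PER_SHARD = 1


def input_capacity(shards_per_stream):
    """Return (messages per second, MB per second) for a list of shard counts.

    shards_per_stream holds one entry per Kinesis stream, giving the number
    of shards in that stream.
    """
    total_shards = sum(shards_per_stream)
    return (
        total_shards * MESSAGES_PER_SECOND_PER_SHARD,
        total_shards * MEGABYTES_PER_SECOND_PER_SHARD,
    )


print(input_capacity([2]))        # one stream with two shards
print(input_capacity([1, 1, 1]))  # three streams with one shard each
```

These figures are upper bounds. They are only reachable when messages are spread evenly across all streams and shards.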
The scaling that you can achieve with the Kinesis Streams Handler depends on how you configure the handler. Kinesis stream names are resolved at runtime based on the configuration of the Kinesis Streams Handler.
Shards are selected by the hash of the partition key. The partition key for a Kinesis message cannot be null or an empty string (""). A null or empty string partition key causes a Kinesis error, which in turn causes the Replicat process to abend.
Maximizing throughput requires that the Kinesis Streams Handler configuration evenly distributes messages across streams and shards.
To achieve the best distribution across shards in a Kinesis stream, select a partition key that changes rapidly. You can select ${primaryKeys} because it is unique per row in the source database. Additionally, operations for the same row are sent to the same Kinesis stream and shard.
When DEBUG logging is enabled, the Kinesis stream name, sequence number, and shard number are written to the log file for successfully sent messages.
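To illustrate why a rapidly changing partition key spreads the load, the following Python sketch mimics shard selection. Kinesis maps each partition key to a 128-bit integer with an MD5 hash and routes the record to the shard whose hash key range contains that integer. The sketch assumes that the shards split the hash key space into equal ranges, which may not hold after resharding. The function names and sample keys are hypothetical.

```python
import hashlib
from collections import Counter

HASH_KEY_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def shard_index(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash key ranges."""
    if not partition_key:
        # Mirrors the rule that the partition key cannot be null or empty.
        raise ValueError("partition key cannot be null or an empty string")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_KEY_SPACE


def shard_distribution(partition_keys, shard_count):
    """Count how many messages land on each shard index."""
    return Counter(shard_index(key, shard_count) for key in partition_keys)


changing_keys = [str(row_id) for row_id in range(10000)]  # like ${primaryKeys}
static_keys = ["fixedKey"] * 10000                        # one constant key

print(shard_distribution(changing_keys, 4))  # spread across the shards
print(shard_distribution(static_keys, 4))    # every message on one shard
```

Because the hash is deterministic, two operations with the same partition key always resolve to the same shard, which is why operations for the same row stay together.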
10.2.3.3 Kinesis Handler Performance Considerations
10.2.3.3.1 Kinesis Streams Input Limitations
The maximum write rate to a Kinesis stream with a single shard is 1000 messages per second, up to a maximum of 1MB of data per second. You can scale input to Kinesis by adding Kinesis streams or by adding shards to streams. Both approaches can linearly increase the Kinesis input capacity and thereby improve the performance of the Oracle GoldenGate Kinesis Streams Handler.
Adding streams or shards can linearly increase the potential throughput as follows:
- 1 stream with 2 shards = 2000 messages per second up to a total data size of 2MB per second.
- 3 streams of 1 shard each = 3000 messages per second up to a total data size of 3MB per second.
To fully take advantage of streams and shards, you must configure the Oracle GoldenGate Kinesis Streams Handler to distribute messages as evenly as possible across streams and shards.
Adding Kinesis streams or shards does nothing to scale Kinesis input if all data is sent to a single Kinesis stream using a static partition key.
Kinesis streams are resolved at runtime using the selected mapping methodology. For example, mapping the source table name as the Kinesis stream name may provide good distribution of messages across Kinesis streams if operations from the source trail file are evenly distributed across tables.
Shards are selected by a hash of the partition key. Partition keys are also resolved at runtime using the selected mapping methodology. Therefore, it is best to map the partition key to a value that changes rapidly, which ensures a good distribution of messages across shards.
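The effect of the mapping choice can be explored with a small Python sketch. It is not the handler's template resolver: the resolve function, the sample table names, and the operation counts are hypothetical. Only the ${fullyQualifiedTableName} and ${primaryKeys} keywords are taken from this chapter.

```python
from collections import Counter


def resolve(template, operation):
    """Substitute ${keyword} placeholders with values from the operation."""
    resolved = template
    for keyword, value in operation.items():
        resolved = resolved.replace("${" + keyword + "}", value)
    return resolved


# Hypothetical trail content: 90% of operations hit a single table.
operations = [
    {"fullyQualifiedTableName": "SALES.ORDERS", "primaryKeys": str(n)}
    for n in range(900)
] + [
    {"fullyQualifiedTableName": "SALES.CUSTOMERS", "primaryKeys": str(n)}
    for n in range(100)
]


def messages_per_stream(stream_template):
    """Count messages per resolved Kinesis stream name."""
    return Counter(resolve(stream_template, op) for op in operations)


def distinct_partition_keys(partition_template):
    """Count the distinct partition keys a template produces."""
    return len({resolve(partition_template, op) for op in operations})


print(messages_per_stream("${fullyQualifiedTableName}"))
print(distinct_partition_keys("${primaryKeys}"))  # changes from row to row
print(distinct_partition_keys("fixedKey"))        # static key, a single value
```

In this sample the table-name mapping sends most messages to one stream because the source operations are skewed toward one table, while the static partition key collapses every message onto a single hash value.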
10.2.3.3.2 Transaction Batching
The Oracle GoldenGate Kinesis Streams Handler receives messages and then batches together messages by Kinesis stream before sending them via synchronous HTTPS calls to Kinesis. At transaction commit all outstanding messages are flushed to Kinesis. The flush call to Kinesis impacts performance. Therefore, deferring the flush call can dramatically improve performance.
The recommended way to defer the flush call is to use the GROUPTRANSOPS parameter in the Replicat configuration. GROUPTRANSOPS groups multiple small transactions into a single larger transaction, deferring the transaction commit call until the larger transaction is complete. The parameter works by counting the database operations (inserts, updates, and deletes) and committing the transaction group only when the number of operations equals or exceeds the GROUPTRANSOPS setting. The default GROUPTRANSOPS setting for Replicat is 1000.
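The counting rule can be sketched as follows. This is a simplified model, not Replicat's implementation: the function name is hypothetical, source transactions are assumed to stay whole within a group, and a trailing partial group is assumed to commit when the input ends.

```python
def group_transactions(transaction_sizes, grouptransops=1000):
    """Return the operation count of each grouped (committed) transaction.

    transaction_sizes lists the number of operations in each source
    transaction. A group commits once its running operation count equals
    or exceeds grouptransops.
    """
    groups = []
    pending = 0
    for size in transaction_sizes:
        pending += size
        if pending >= grouptransops:
            groups.append(pending)
            pending = 0
    if pending:
        groups.append(pending)  # assumption: leftover operations still commit
    return groups


source = [3] * 2500  # 2500 small source transactions of 3 operations each

print(len(group_transactions(source, grouptransops=1)))     # one commit per transaction
print(len(group_transactions(source, grouptransops=1000)))  # far fewer commits
```

Each commit triggers a flush to Kinesis, so fewer commits means fewer flush calls for the same number of operations.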
Interim flushes to Kinesis may be required when GROUPTRANSOPS is set to a large value. An individual call to send a batch of messages to a Kinesis stream cannot exceed 500 messages or 5MB. If the pending messages for a stream exceed 500 messages or 5MB, the Kinesis Streams Handler must perform an interim flush.
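The following Python sketch shows how pending messages for one stream would be divided so that no single call exceeds either limit. It is a model of the limits stated above, not the handler's code: the function name is hypothetical, and 5MB is assumed to mean 5 x 1024 x 1024 bytes.

```python
MAX_MESSAGES_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # assumption: 5MB counted as 5 MiB


def split_into_calls(message_sizes):
    """Split message sizes (in bytes) for one stream into valid batches."""
    batches = []
    current, current_bytes = [], 0
    for size in message_sizes:
        if size > MAX_BYTES_PER_CALL:
            raise ValueError("a single message exceeds the per-call size limit")
        batch_is_full = len(current) == MAX_MESSAGES_PER_CALL
        would_overflow = current_bytes + size > MAX_BYTES_PER_CALL
        if current and (batch_is_full or would_overflow):
            batches.append(current)  # this is where an interim flush happens
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)  # final flush at transaction commit
    return batches


small = split_into_calls([1024] * 1200)        # limited by message count
large = split_into_calls([1024 * 1024] * 10)   # limited by total bytes
print([len(batch) for batch in small])
print([len(batch) for batch in large])
```

With many small messages the 500-message limit applies first. With large messages the 5MB limit applies first.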
10.2.3.3.3 Deferring Flush at Transaction Commit
By default, messages are flushed to Kinesis at transaction commit to ensure write durability. However, it is possible to defer the flush beyond transaction commit. This is only advisable when messages are grouped and sent to Kinesis at the transaction level (that is, one transaction = one Kinesis message, or a transaction chunked into a small number of Kinesis messages), which applies when you are trying to capture the transaction as a single messaging unit.
This may require setting the GROUPTRANSOPS Replicat parameter to 1 so that multiple smaller transactions from the source trail file are not grouped into a larger output transaction. This can impact performance because only one or a few messages are sent per transaction before the transaction commit call is invoked, which in turn triggers the flush call to Kinesis.
To maintain good performance, the Oracle GoldenGate Kinesis Streams Handler allows you to defer the Kinesis flush call beyond the transaction commit call. The Oracle GoldenGate Replicat process maintains its checkpoint in the .cpr file in the {GoldenGate Home}/dirchk directory. The Java Adapter also maintains a checkpoint file, named .cpj, in this directory. When the flush is deferred, the Replicat checkpoint moves beyond the point up to which the Kinesis Streams Handler can guarantee that no message loss will occur. However, in this mode of operation the Kinesis Streams Handler maintains the correct checkpoint in the .cpj file. Running in this mode does not result in message loss even after a crash, because on restart the checkpoint in the .cpj file is used if it is before the checkpoint in the .cpr file.
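The restart rule can be expressed as a short Python sketch. The checkpoint model here is an assumption made for illustration: a checkpoint is represented as a trail file sequence number plus a relative byte address (RBA), and the actual .cpr and .cpj file formats are not modeled.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    """Hypothetical position in the source trail."""
    trail_sequence: int  # trail file sequence number
    rba: int             # relative byte address within that trail file


def restart_position(cpr: Checkpoint, cpj: Optional[Checkpoint]) -> Checkpoint:
    """Pick the position to resume from after a restart.

    The .cpj checkpoint wins only when it is before the .cpr checkpoint,
    so operations that were committed but not yet flushed are replayed.
    """
    if cpj is not None and cpj < cpr:  # tuples compare field by field
        return cpj
    return cpr


# Replicat has moved ahead of what was flushed to Kinesis.
print(restart_position(Checkpoint(12, 9000), Checkpoint(12, 4000)))
# No deferred flush outstanding: both checkpoints agree.
print(restart_position(Checkpoint(12, 9000), Checkpoint(12, 9000)))
```

Resuming from the earlier position may resend messages that had already reached Kinesis, which trades possible duplicates for the guarantee that no message is lost.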
10.2.3.4 Troubleshooting
10.2.3.4.1 Java Classpath
The most common initial error is an incorrect classpath that does not include all the required AWS Kinesis Java SDK client libraries, which results in a ClassNotFound exception in the log file.
You can troubleshoot by setting the Java Adapter logging to DEBUG and then rerunning the process. At the debug level, the logging includes information about which JARs were added to the classpath from the gg.classpath configuration variable.
The gg.classpath variable supports the wildcard asterisk (*) character to select all JARs in a configured directory. For example, /usr/kinesis/sdk/*. See Setting Up and Running the Kinesis Streams Handler.
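Before rerunning the process, it can help to check which JARs a wildcard entry would select. The Python sketch below is a standalone diagnostic aid, not part of Oracle GoldenGate. It assumes a colon-separated classpath on a Unix-like system and that a trailing /* selects only files ending in .jar directly inside the directory. The function name and the sample file names are hypothetical.

```python
import glob
import os
import tempfile


def expand_classpath(classpath, separator=":"):
    """List the files a gg.classpath-style value would select."""
    selected = []
    for entry in classpath.split(separator):
        if entry.endswith("/*"):
            directory = entry[:-2]
            candidates = glob.glob(os.path.join(directory, "*"))
            # Keep JAR files only; the wildcard does not recurse.
            selected.extend(sorted(
                path for path in candidates if path.lower().endswith(".jar")
            ))
        elif entry:
            selected.append(entry)  # explicit file or directory entry
    return selected


# Demonstration against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as sdk_dir:
    for name in ("kinesis.jar", "sdk-core.jar", "README.txt"):
        open(os.path.join(sdk_dir, name), "w").close()
    jar_names = [
        os.path.basename(path) for path in expand_classpath(sdk_dir + "/*")
    ]

print(jar_names)
```

If a required client library does not appear in the list for your own directory, the wildcard entry points at the wrong location or the JAR is missing.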
10.2.3.4.2 Kinesis Handler Connectivity Issues
If the Kinesis Streams Handler is unable to connect to Kinesis when running on premises, the cause may be that connectivity to the public Internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public Internet. Contact your network administrator to get the URLs of your proxy server, and then follow the directions in Configuring the Proxy Server for Kinesis Streams Handler.
10.2.3.4.3 Logging
The Kinesis Streams Handler logs the state of its configuration to the Java log file.
This is helpful because you can review the configuration values for the handler. Following is a sample of the logging of the state of the configuration:
**** Begin Kinesis Streams Handler - Configuration Summary ****
Mode of operation is set to op.
The AWS region name is set to [us-west-2].
A proxy server has been set to [www-proxy.us.oracle.com] using port [80].
The Kinesis Streams Handler will flush to Kinesis at transaction commit.
Messages from the GoldenGate source trail file will be sent at the operation level.
One operation = One Kinesis Message
The stream mapping template of [${fullyQualifiedTableName}] resolves to [fully qualified table name].
The partition mapping template of [${primaryKeys}] resolves to [primary keys].
**** End Kinesis Streams Handler - Configuration Summary ****