9.1.3 Apache Cassandra
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) capture process (Extract) for Cassandra is used to capture changes from Apache Cassandra databases.
You can select the Extract starting position in the UI from Extract Options, under Begin. You can either select Now or define a Custom Time.
- Overview
- Setting Up Cassandra Extract Change Data Capture
- Deduplication
- Topology Changes
- Data Availability in the CDC Logs
- Using Initial Load Extract
- Using Change Data Capture Extract
- Replicating to RDBMS Targets
- Partition Update or Insert of Static Columns
- Partition Delete
- Security and Authentication
- Cleanup of CDC Commit Log Files
You can use the Cassandra CDC commit log purger program to purge the CDC commit log files that are not in use.
- Multiple Extract Support
- CDC Configuration Reference
- Troubleshooting
- Cassandra Capture Client Dependencies
This section lists the dependencies required for the Cassandra Capture (Extract) to connect to Apache Cassandra databases.
Parent topic: Source
9.1.3.1 Overview
Apache Cassandra is a NoSQL Database Management System designed to store large amounts of data. A Cassandra cluster configuration provides horizontal scaling and replication of data across multiple machines. It can provide high availability and eliminate a single point of failure by replicating data to multiple nodes within a Cassandra cluster. Apache Cassandra is open source and designed to run on low-cost commodity hardware.
Cassandra relaxes the axioms of a traditional relational database management system (RDBMS) regarding atomicity, consistency, isolation, and durability. When considering implementing Cassandra, it is important to understand its differences from a traditional RDBMS and how those differences affect your specific use case.
Cassandra provides eventual consistency. Under the eventual consistency model, accessing the state of data for a specific row eventually returns the latest state of the data for that row as defined by the most recent change. However, there may be a latency period between the creation or modification of the state of a row and what is returned when the state of that row is queried. The benefit of eventual consistency is that the latency period can be predicted based on your Cassandra configuration and the workload that your Cassandra cluster is currently under. For more information, see http://cassandra.apache.org/.
For information about data type support, see About the Cassandra Data Types.
Parent topic: Apache Cassandra
9.1.3.2 Setting Up Cassandra Extract Change Data Capture
Prerequisites
- Apache Cassandra cluster must have at least one node up and running.
- Read and write access to CDC commit log files on every live node in the cluster is done through SFTP or NFS. For more information, see Setup SSH Connection to the Cassandra Nodes.
- Every node in the Cassandra cluster must have the cdc_enabled parameter set to true in the cassandra.yaml configuration file.
- Virtual nodes must be enabled on every Cassandra node by setting the num_tokens parameter in cassandra.yaml.
- You must download the third party libraries using the Dependency Downloader scripts. For more information, see Cassandra Capture Client Dependencies.
- New tables can be created with Change Data Capture (CDC) enabled using the WITH cdc=true clause in the CREATE TABLE command. For example:
CREATE TABLE ks_demo_rep1.mytable (col1 int, col2 text, col3 text, col4 text, PRIMARY KEY (col1)) WITH cdc=true;
You can enable CDC on existing tables as follows:
ALTER TABLE ks_demo_rep1.mytable WITH cdc=true;
- Setup SSH Connection to the Cassandra Nodes
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) transfers Cassandra commit log files from all the Cassandra nodes. To allow Oracle GoldenGate to transfer commit log files using secure shell protocol (SFTP), generate a known_hosts SSH file.
- Data Types
- Cassandra Database Operations
- Set up Credential Store Entry to Detect Source Type
Parent topic: Apache Cassandra
9.1.3.2.1 Setup SSH Connection to the Cassandra Nodes
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) transfers Cassandra commit log files from all the Cassandra nodes. To allow Oracle GoldenGate to transfer commit log files using secure shell protocol (SFTP), generate a known_hosts SSH file.
Parent topic: Setting Up Cassandra Extract Change Data Capture
9.1.3.2.2 Data Types
Supported Cassandra Source Data Types
The following are the supported source data types:
- ASCII
- BIGINT
- BLOB
- BOOLEAN
- DATE
- DECIMAL
- DOUBLE
- DURATION
- FLOAT
- INET
- INT
- SMALLINT
- TEXT
- TIME
- TIMESTAMP
- TIMEUUID
- TINYINT
- UUID
- VARCHAR
- VARINT
Unsupported Source Data Types
The following are the unsupported source data types:
- COUNTER
- MAP
- SET
- LIST
- UDT (user defined type)
- TUPLE
- CUSTOM_TYPE
Parent topic: Setting Up Cassandra Extract Change Data Capture
9.1.3.2.3 Cassandra Database Operations
Supported Operations
The following are the supported operations:
- INSERT
- UPDATE (captured as INSERT)
- DELETE
Unsupported Operations
The TRUNCATE and DDL (CREATE, ALTER, and DROP) operations are not supported. The Cassandra commit log files do not record any before images for the UPDATE or DELETE operations, so the captured operations can never have a before image. Oracle GoldenGate features that rely on before image records, such as Conflict Detection and Resolution, are not available.
Parent topic: Setting Up Cassandra Extract Change Data Capture
9.1.3.2.4 Set up Credential Store Entry to Detect Source Type
The source database type is detected from the prefix of the userid. The generic format for userid is as follows:
<dbtype>://<db-user>@<comma separated list of server addresses>:<port>
The userid can have multiple server/node addresses.
In the Administration Service, you can create the credential store entry under DB Connections. To add Trandata, go to DB Connections in the Administration Service, connect to your database from the credential entry, and select Add Trandata. For example:
alter credentialstore add user cassandra://db-user@127.0.0.1,127.0.0.2:9042 password db-passwd alias cass
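As an illustration of the generic userid format above, the following minimal Python sketch splits a userid into its parts (a hypothetical parser for illustration only, not the GoldenGate implementation):

```python
import re

def parse_userid(userid):
    """Split a <dbtype>://<db-user>@<addresses>:<port> userid string.

    Illustrative only; GoldenGate performs its own parsing internally.
    """
    m = re.match(r"(?P<dbtype>\w+)://(?P<user>[^@]+)@(?P<addrs>[^:]+):(?P<port>\d+)", userid)
    if not m:
        raise ValueError("unrecognized userid format: " + userid)
    return {
        "dbtype": m.group("dbtype"),          # used to detect the source type
        "user": m.group("user"),
        "addresses": m.group("addrs").split(","),  # one or more node addresses
        "port": int(m.group("port")),
    }

print(parse_userid("cassandra://db-user@127.0.0.1,127.0.0.2:9042"))
```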
Parent topic: Setting Up Cassandra Extract Change Data Capture
9.1.3.3 Deduplication
One of the features of a Cassandra cluster is its high availability. To support high availability, multiple redundant copies of table data are stored on different nodes in the cluster. Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Cassandra Capture automatically filters out duplicate rows (deduplication). Deduplication is active by default. Oracle recommends using it if your data is captured and applied to targets where duplicate records are discouraged (for example, RDBMS targets).
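The filtering behavior can be pictured with a minimal Python sketch (the record shape and function are hypothetical illustrations, not the actual Extract implementation): redundant copies of the same row change, identified here by primary key and write timestamp, are dropped.

```python
def deduplicate(changes):
    """Drop duplicate row changes replicated across Cassandra nodes.

    Each change is a (primary_key, write_timestamp, payload) tuple;
    this record shape is hypothetical, for illustration only.
    """
    seen = set()
    unique = []
    for pk, ts, payload in changes:
        if (pk, ts) not in seen:       # first copy wins
            seen.add((pk, ts))
            unique.append((pk, ts, payload))
    return unique

changes = [
    (1, 100, "insert"),   # change observed on node A
    (1, 100, "insert"),   # same change observed on replica node B
    (2, 101, "insert"),
]
print(deduplicate(changes))
```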
Parent topic: Apache Cassandra
9.1.3.4 Topology Changes
Cassandra nodes can change their status (topology change) and the cluster can still be alive. GG for DAA Cassandra Capture can detect the node status changes and react to these changes when applicable. The Cassandra capture process can detect the following events happening in the cluster:
-
Node shutdown and boot.
-
Node decommission and commission.
-
New keyspace and table created.
Due to topology changes, if the capture process detects that an active producer node goes down, it tries to recover any missing rows from an available replica node. During this process, there is a possibility of data duplication for some rows. This is a transient data duplication due to the topology change. For more details about reacting to changes in topology, see Troubleshooting.
Parent topic: Apache Cassandra
9.1.3.5 Data Availability in the CDC Logs
The Cassandra CDC API can only read data from commit log files in the CDC directory. There is a latency for the data in the active commit log directory to be archived (moved) to the CDC commit log directory.
The input data source for the Cassandra capture process is the CDC commit log directory. There could be delays for the data to be captured mainly due to the commit log files not yet visible to the capture process.
On a production cluster with a lot of activity, this latency is very minimal as the data is archived from the active commit log directory to the CDC commit log directory in the order of microseconds.
Parent topic: Apache Cassandra
9.1.3.6 Using Initial Load Extract
Cassandra Extract supports the standard initial load capability to extract source table data to GG for DAA trail files.
Initial load for Cassandra can be performed to synchronize tables, either as a prerequisite step to replicating changes or as a standalone function.
Direct loading from a source Cassandra table to any target table is not supported.
Parent topic: Apache Cassandra
9.1.3.7 Using Change Data Capture Extract
Review the example .prm files from the GG for DAA installation directory under $HOME/AdapterExamples/big-data/cassandracapture.
- In Administration Service, click Add Extract.
- Enter a Process Name, select Source as Cassandra and select Change Data Capture Extract.
- Provide a name for Extract Trail and select Source Credentials you created for Cassandra.
- Update Managed Options if necessary.
- In the Parameter File, update the fields marked as TODO.
- Create & Run.
- Configure the Extract parameter file:
- Apache Cassandra 4x SDK, compatible with Apache Cassandra 4.0 version
-
Extract parameter file:
-- ggsci> alter credentialstore add user cassandra://db-user@127.0.0.1 password db-passwd alias cass
EXTRACT groupname
JVMOPTIONS CLASSPATH ggjava/ggjava.jar:DependencyDownloader/dependencies/cassandra_capture_4x/*
JVMOPTIONS BOOTOPTIONS -Dcassandra.config=file://{/path/to/apache-cassandra-4.x}/config/cassandra.yaml -Dcassandra.datacenter={datacenter-name}
TRANLOGOPTIONS CDCREADERSDKVERSION 4x
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB USERIDALIAS cass
EXTTRAIL trailprefix
TABLE source.*;
- Provide the cassandra.yaml file path using JVMOPTIONS BOOTOPTIONS:
JVMOPTIONS BOOTOPTIONS -Dcassandra.config=file://{/path/to/apache-cassandra-4.x}/config/cassandra.yaml -Dcassandra.datacenter={datacenter-name}
Note: For a remote capture (when the Cassandra server is not on the same machine as Oracle GoldenGate), copy a sample cassandra.yaml for your Apache Cassandra version onto your GoldenGate machine and use that path for the cassandra.yaml file path configuration.
- Configure the Cassandra datacenter name under JVMOPTIONS BOOTOPTIONS. If you do not provide a value, then datacenter1 is used by default.
- Apache Cassandra 3x SDK, compatible with Apache Cassandra 3.9, 3.10, 3.11
-
Extract parameter file:
-- ggsci> alter credentialstore add user cassandra://db-user@127.0.0.1 password db-passwd alias cass
EXTRACT groupname
JVMOPTIONS CLASSPATH ggjava/ggjava.jar:DependencyDownloader/dependencies/cassandra_capture_3x/*
TRANLOGOPTIONS CDCREADERSDKVERSION 3x
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB USERIDALIAS cass
EXTTRAIL trailprefix
TABLE source.*;
- DSE Cassandra SDK, compatible with DSE Cassandra 6.x versions
-
Extract parameter file
-- ggsci> alter credentialstore add user cassandra://db-user@127.0.0.1 password db-passwd alias cass
EXTRACT groupname
JVMOPTIONS CLASSPATH ggjava/ggjava.jar:{/path/to/dse-6.x}/resources/cassandra/lib/*:{/path/to/dse-6.x}/lib/*:{/path/to/dse-6.x}/resources/dse/lib/*:DependencyDownloader/dependencies/cassandra_capture_dse/*
JVMOPTIONS BOOTOPTIONS -Dcassandra.config=file://{/path/to/dse-6.x}/resources/cassandra/conf/cassandra.yaml -Dcassandra.datacenter={datacenter-name}
TRANLOGOPTIONS CDCREADERSDKVERSION dse
TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
SOURCEDB USERIDALIAS cass
EXTTRAIL trailprefix
TABLE source.*;
- Provide the cassandra.yaml file path using JVMOPTIONS BOOTOPTIONS:
JVMOPTIONS BOOTOPTIONS -Dcassandra.config=file://{/path/to/dse-6.x}/resources/cassandra/conf/cassandra.yaml -Dcassandra.datacenter={datacenter-name}
Note: For a remote capture (when the Cassandra server is not on the same machine as Oracle GoldenGate), copy a sample cassandra.yaml for your DSE version onto your GoldenGate machine and use that path for the cassandra.yaml file path configuration.
- Configure the Cassandra datacenter name under JVMOPTIONS BOOTOPTIONS. If you do not provide a value, then Cassandra is used by default.
Note:
For DSE 5.x versions, configure the Extract with the Apache 3x SDK as explained in the Apache Cassandra 3x SDK section.
Parent topic: Apache Cassandra
9.1.3.7.1 Handling Schema Evolution
TRANLOGOPTIONS TRACKSCHEMACHANGES
This enables Extract to capture table-level DDL changes from the source at runtime. Enable it to ensure that the table metadata within the trail stays in sync with the source without any downtime.
When TRACKSCHEMACHANGES is disabled, the capture process will ABEND if a DDL change is detected at the source table.
Note:
This feature is disabled by default. To enable it, update the Extract .prm file as shown in the syntax above.
Parent topic: Using Change Data Capture Extract
9.1.3.8 Replicating to RDBMS Targets
You must take additional care when replicating source UPDATE operations from Cassandra trail files to RDBMS targets. Any source UPDATE operation appears as an INSERT record in the Oracle GoldenGate trail file. Replicat may abend when a source UPDATE operation is applied as an INSERT operation on the target database.
You have these options:
- OVERRIDEDUPS: If you expect that the source database contains mostly INSERT operations and very few UPDATE operations, then OVERRIDEDUPS is the recommended option. Replicat can recover from duplicate key errors while replicating the small number of source UPDATE operations.
- UPDATEINSERTS and INSERTMISSINGUPDATES: Use this configuration if the source database is expected to contain mostly UPDATE operations and very few INSERT operations. With this configuration, Replicat has fewer missing row errors to recover from, which leads to better throughput.
- No additional configuration is required if the target table can accept duplicate rows or you want to abend Replicat on duplicate rows.
If you configure Replicat to use BATCHSQL, then there may be duplicate row or missing row errors in batch mode. Although there is a reduction in the Replicat throughput due to these errors, Replicat automatically recovers from these errors. If the source operations are mostly INSERTS, then BATCHSQL is a good option.
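For illustration, a minimal Replicat parameter sketch for the mostly-UPDATE case described above might look like the following (the group name, credential alias, and MAP clause are hypothetical placeholders, not values from this guide):

```
REPLICAT rcass
USERIDALIAS targetdb
-- Apply trail INSERT records as UPDATEs, falling back to INSERT when the row is missing
UPDATEINSERTS
INSERTMISSINGUPDATES
MAP source.*, TARGET target.*;
```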
Parent topic: Apache Cassandra
9.1.3.9 Partition Update or Insert of Static Columns
When the source Cassandra table has static columns, the static column values can be modified by skipping any clustering key columns that are in the table.
For example:
create table ks_demo_rep1.nls_staticcol
(
teamname text,
manager text static,
location text static,
membername text,
nationality text,
position text,
PRIMARY KEY ((teamname), membername)
)
WITH cdc=true;
insert into ks_demo_rep1.nls_staticcol (teamname, manager, location) VALUES ('Red Bull', 'Christian Horner', '<unknown>');
The insert CQL is missing the clustering key membername. Such an operation is a partition insert.
Similarly, you could also update a static column with just the partition keys in the WHERE clause of the CQL; that is a partition update operation. Cassandra Extract cannot write an INSERT or UPDATE operation into the trail with missing key columns. It abends on detecting a partition INSERT or UPDATE operation.
Parent topic: Apache Cassandra
9.1.3.10 Partition Delete
A Cassandra table may have a primary key composed of one or more partition key columns and clustering key columns. When a DELETE operation is performed on a Cassandra table by skipping the clustering key columns from the WHERE clause, it results in a partition delete operation.
For example:
create table ks_demo_rep1.table1
(
col1 ascii, col2 bigint, col3 boolean, col4 int,
PRIMARY KEY((col1, col2), col4)
) with cdc=true;
delete from ks_demo_rep1.table1 where col1 = 'asciival' and col2 = 9876543210; /** skipped clustering key column col4 **/
Cassandra Extract cannot write a DELETE operation into the trail with missing key columns and abends on detecting a partition DELETE operation.
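The condition that makes an operation partition-level can be sketched with a small Python helper (hypothetical, not GoldenGate code): all partition key columns are present, but at least one clustering key column is missing.

```python
def is_partition_operation(partition_keys, clustering_keys, where_columns):
    """Detect a partition-level operation: every partition key is
    referenced, but one or more clustering keys are skipped.

    Illustrative sketch only; Extract performs this detection internally.
    """
    w = set(where_columns)
    return set(partition_keys) <= w and not set(clustering_keys) <= w

# Mirrors the ks_demo_rep1.table1 example: PRIMARY KEY((col1, col2), col4)
print(is_partition_operation(["col1", "col2"], ["col4"], ["col1", "col2"]))  # True
```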
Parent topic: Apache Cassandra
9.1.3.11 Security and Authentication
- Cassandra Extract can connect to a Cassandra cluster using username and password based authentication and SSL authentication.
- Connection to Kerberos-enabled Cassandra clusters is not supported in this release.
Parent topic: Apache Cassandra
9.1.3.11.1 Configuring SSL
To enable SSL, add the SSL parameter to your GLOBALS file or Extract parameter file. Additionally, a separate configuration is required for the Java and CPP drivers; see CDC Configuration Reference.
SSL configuration for Java driver (GLOBALS file)
JVMBOOTOPTIONS -Djavax.net.ssl.trustStore=/path/to/SSL/truststore.file -Djavax.net.ssl.trustStorePassword=password -Djavax.net.ssl.keyStore=/path/to/SSL/keystore.file -Djavax.net.ssl.keyStorePassword=password
SSL configuration for Java driver (Extract parameter file)
JVMOPTIONS BOOTOPTIONS -Djavax.net.ssl.trustStore=/path/to/SSL/truststore.file -Djavax.net.ssl.trustStorePassword=password -Djavax.net.ssl.keyStore=/path/to/SSL/keystore.file -Djavax.net.ssl.keyStorePassword=password
Note:
The Extract parameter file configuration has a higher precedence.
For more information, see https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/secureSSLIntro.html
Using Apache Cassandra 4x SDK / DSE Cassandra SDK
- Create the application.conf file with the following properties and override them with appropriate values:
datastax-java-driver {
  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    # Whether or not to require validation that the hostname of the server certificate's common
    # name matches the hostname of the server being connected to. If not set, defaults to true.
    hostname-validation = false
    # The locations and passwords used to access truststore and keystore contents.
    # These properties are optional. If either truststore-path or keystore-path are specified,
    # the driver builds an SSLContext from these files. If neither option is specified, the
    # default SSLContext is used, which is based on system property configuration.
    truststore-path = {path to truststore file}
    truststore-password = password
    keystore-path = {path to keystore file}
    keystore-password = cassandra
  }
}
- Provide the path of the directory containing the application.conf file under JVMCLASSPATH as follows:
JVMCLASSPATH ggjava/ggjava.jar:DependencyDownloader/dependencies/cassandra_capture_4x/*:/path/to/driver/config
Note: This is valid only in case of the GLOBALS file. In the Extract parameter file, use:
JVMOPTIONS CLASSPATH ggjava/ggjava.jar:DependencyDownloader/dependencies/cassandra_capture_4x/*:/path/to/driver/config/
For more information, see https://github.com/datastax/java-driver/blob/4.x/core/src/main/resources/reference.conf.
SSL configuration for Cassandra CPP driver
To operate with an SSL configuration, you must add the following parameters in the Oracle GoldenGate GLOBALS file or Extract parameter file:
CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /path/to/PEM/formatted/public/key/file/cassandra.pem
CPPDRIVEROPTIONS SSL PEERCERTVERIFICATIONFLAG 0
This configuration is required to connect to a Cassandra cluster with SSL enabled. Additionally, you need to add these settings to your cassandra.yaml file:
client_encryption_options:
  enabled: true
  # If enabled and optional is set to true, encrypted and unencrypted connections are handled.
  optional: false
  keystore: /path/to/keystore
  keystore_password: password
  require_client_auth: false
The PEM formatted certificates can be generated using these instructions:
https://docs.datastax.com/en/developer/cpp-driver/2.8/topics/security/ssl/
Parent topic: Security and Authentication
9.1.3.12 Cleanup of CDC Commit Log Files
You can use the Cassandra CDC commit log purger program to purge the CDC commit log files that are not in use.
For more information, see How to Run the Purge Utility.
- Cassandra CDC Commit Log Purger
A purge utility for the Cassandra Handler to purge the staged CDC commit log files. Cassandra Extract moves the CDC commit log files (located at $CASSANDRA/data/cdc_raw) on each node to a staging directory for processing.
Parent topic: Apache Cassandra
9.1.3.12.1 Cassandra CDC Commit Log Purger
A purge utility for the Cassandra Handler to purge the staged CDC commit log files. Cassandra Extract moves the CDC commit log files (located at $CASSANDRA/data/cdc_raw) on each node to a staging directory for processing.
For example, if the cdc_raw commit log directory is /path/to/cassandra/home/data/cdc_raw, the staging directory is /path/to/cassandra/home/data/cdc_raw/../cdc_raw_staged. The CDC commit log purger purges the files inside cdc_raw_staged based on the following logic.
The purge program scans the GoldenGate directory for the JSON checkpoint files under dirchk/<EXTGRP>_casschk.json. A sample JSON file under dirchk looks similar to the following:
{
"start_timestamp": -1,
"sequence_id": 34010434,
"updated_datetime": "2018-04-19 23:24:57.164-0700",
"nodes": [
{ "address": "10.247.136.146", "offset": 0, "id": 0 }
,
{ "address": "10.247.136.142", "file": "CommitLog-6-1524110205398.log", "offset": 33554405, "id": 1524110205398 }
,
{ "address": "10.248.10.24", "file": "CommitLog-6-1524110205399.log", "offset": 33554406, "id": 1524110205399 }
]
}
For each node address in the JSON checkpoint file, the purge program captures the CDC file name and ID. For each ID obtained from the JSON checkpoint file, the purge program looks into the staged CDC commit log directory and purges the commit log files with IDs less than the ID captured in the checkpoint JSON file.
Example:
In the JSON file, the ID is 1524110205398.
In the CDC staging directory, the files are CommitLog-6-1524110205396.log, CommitLog-6-1524110205397.log, and CommitLog-6-1524110205398.log.
The IDs derived from the CDC staging directory are 1524110205396, 1524110205397, and 1524110205398. The purge utility purges the files in the CDC staging directory whose IDs are less than the ID read from the JSON file, which is 1524110205398. The files associated with the IDs 1524110205396 and 1524110205397 are purged.
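The selection logic above can be sketched in Python as follows (a simplified illustration, not the shipped purger; it assumes the CommitLog-<version>-<id>.log naming shown in the example):

```python
import re

def files_to_purge(staged_files, checkpoint_id):
    """Select staged commit log files whose embedded ID is lower than
    the ID recorded in the checkpoint JSON (illustrative sketch only)."""
    pattern = re.compile(r"CommitLog-\d+-(\d+)\.log$")
    purge = []
    for name in staged_files:
        m = pattern.search(name)
        if m and int(m.group(1)) < checkpoint_id:
            purge.append(name)
    return purge

staged = [
    "CommitLog-6-1524110205396.log",
    "CommitLog-6-1524110205397.log",
    "CommitLog-6-1524110205398.log",
]
# Files with IDs below the checkpoint ID 1524110205398 are selected
print(files_to_purge(staged, 1524110205398))
```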
- How to Run the Purge Utility
- Sample config.properties for Local File System
- Argument cassCommitLogPurgerConfFile
- Argument purgeInterval
Setting the optional argument purgeInterval helps in configuring the process to run as a daemon.
- Argument cassUnProcessedFilesPurgeInterval
Setting the optional argument cassUnProcessedFilesPurgeInterval helps in purging historical commit logs for all the nodes that do not have a last processed file.
Parent topic: Cleanup of CDC Commit Log Files
9.1.3.12.1.1 How to Run the Purge Utility
Parent topic: Cassandra CDC Commit Log Purger
9.1.3.12.1.1.1 Third Party Libraries Needed to Run this Program
<dependency>
  <groupId>com.jcraft</groupId>
  <artifactId>jsch</artifactId>
  <version>0.1.54</version>
  <scope>provided</scope>
</dependency>
Parent topic: How to Run the Purge Utility
9.1.3.12.1.1.2 Command to Run the Program
java -Dlog4j.configurationFile=log4j-purge.properties -Dgg.log.level=INFO -cp <OGG_HOME>/ggjava/resources/lib/*:<OGG_HOME>/thirdparty/cass/jsch-0.1.54.jar oracle.goldengate.cassandra.commitlogpurger.CassandraCommitLogPurger --cassCommitLogPurgerConfFile <OGG_HOME>/cassandraPurgeUtil/commitlogpurger.properties --purgeInterval 1 --cassUnProcessedFilesPurgeInterval 3
- <OGG_HOME>/ggjava/resources/lib/* is the directory where the purger utility is located.
- <OGG_HOME>/thirdparty/cass/jsch-0.1.54.jar is the dependent jar needed to execute the purger program.
- --cassCommitLogPurgerConfFile, --purgeInterval, and --cassUnProcessedFilesPurgeInterval are runtime arguments.
Sample script to run the commit log purger utility:
#!/bin/bash
echo "fileSystemType=remote" > commitlogpurger.properties
echo "chkDir=dirchk" >> commitlogpurger.properties
echo "cdcStagingDir=data/cdc_raw_staged" >> commitlogpurger.properties
echo "userName=username" >> commitlogpurger.properties
echo "password=password" >> commitlogpurger.properties
java -cp ogghome/ggjava/resources/lib/*:ogghome/thirdparty/cass/jsch-0.1.54.jar oracle.goldengate.cassandra.commitlogpurger.CassandraCommitLogPurger \
  --cassCommitLogPurgerConfFile commitlogpurger.properties \
  --purgeInterval 1 \
  --cassUnProcessedFilesPurgeInterval 3
Parent topic: How to Run the Purge Utility
9.1.3.12.1.1.3 Runtime Arguments
To execute, the utility class CassandraCommitLogPurger requires a mandatory run-time argument cassCommitLogPurgerConfFile.
Available runtime arguments to the CassandraCommitLogPurger class are:
[required] --cassCommitLogPurgerConfFile path to config.properties
[optional] --purgeInterval
[optional] --cassUnProcessedFilesPurgeInterval
Parent topic: How to Run the Purge Utility
9.1.3.12.1.2 Sample config.properties for Local File System
fileSystemType=local
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/
Parent topic: Cassandra CDC Commit Log Purger
9.1.3.12.1.3 Argument cassCommitLogPurgerConfFile
The cassCommitLogPurgerConfFile argument takes the config file with the following mandatory fields.
Table 9-1 Argument cassCommitLogPurgerConfFile

Parameters | Description |
---|---|
fileSystemType | Default: local. Mandatory: Yes. Legal values: remote/local. Description: In every live node in the cluster, CDC staged commit logs can be accessed through SFTP or NFS. |
chkDir | Default: None. Mandatory: Yes. Legal values: checkpoint directory path. Description: Location of the Cassandra checkpoint directory where the JSON checkpoint files are present. |
cdcStagingDir | Default: None. Mandatory: Yes. Legal values: staging directory path. Description: Location of the Cassandra staging directory where the CDC commit logs are present. |
userName | Default: None. Mandatory: No. Legal values: valid SFTP auth username. Description: SFTP user name to connect to the server. |
password | Default: None. Mandatory: No. Legal values: valid SFTP auth password. Description: SFTP password to connect to the server. |
port | Default: 22. Mandatory: No. Legal values: valid SFTP auth port. Description: SFTP port number. |
privateKey | Default: None. Mandatory: No. Legal values: valid path to the privateKey file. Description: The private key is used to perform the authentication, allowing you to log in without having to specify a password. |
passPhase | Default: None. Mandatory: No. Legal values: valid password for privateKey. Description: The private key is typically password protected; if provided, the passPhase is used to decrypt it. |
Parent topic: Cassandra CDC Commit Log Purger
9.1.3.12.1.3.1 Sample config.properties for Local File System
fileSystemType=local
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/
Parent topic: Argument cassCommitLogPurgerConfFile
9.1.3.12.1.3.2 Sample config.properties for Remote File System
fileSystemType=remote
chkDir=apache-cassandra-3.11.2/data/chkdir/
cdcStagingDir=apache-cassandra-3.11.2/data/$nodeAddress/commitlog/
username=username
password=@@@@@
port=22
Parent topic: Argument cassCommitLogPurgerConfFile
9.1.3.12.1.4 Argument purgeInterval
Setting the optional argument purgeInterval helps in configuring the process to run as a daemon.
This argument is an integer value representing the interval at which clean-up happens. For example, if purgeInterval is set to 1, then the process runs every day at the time the process started.
Parent topic: Cassandra CDC Commit Log Purger
9.1.3.12.1.5 Argument cassUnProcessedFilesPurgeInterval
Setting the optional argument cassUnProcessedFilesPurgeInterval helps in purging historical commit logs for all the nodes that do not have a last processed file.
If cassUnProcessedFilesPurgeInterval is not set, then the default value is 2 days; the commit log files older than 2 days (or the configured number of days) are purged. The CassandraCommitLogPurger utility cannot purge files that are only a day old; the interval should be either the default 2 days or more than that.
For example, consider the following checkpoint file:
{
  "start_timestamp": -1,
  "sequence_id": 34010434,
  "updated_datetime": "2018-04-19 23:24:57.164-0700",
  "nodes": [
    { "address": "10.247.136.146", "offset": 0, "id": 0 },
    { "address": "10.247.136.142", "file": "CommitLog-6-1524110205398.log", "offset": 33554405, "id": 1524110205398 },
    { "address": "10.248.10.24", "file": "CommitLog-6-1524110205399.log", "offset": 33554406, "id": 1524110205399 },
    { "address": "10.248.10.25", "offset": 0, "id": 0 },
    { "address": "10.248.10.26", "offset": 0, "id": 0 }
  ]
}
In this example, the Cassandra node addresses 10.248.10.25 and 10.248.10.26 do not have a last processed file. The commit log files in those nodes will be purged as per the configured days of the cassUnProcessedFilesPurgeInterval argument value.
Note:
The last processed file may not be available due to the following reasons:
- A new node was added into the cluster and no commit log files have been processed through Cassandra Extract yet.
- All the commit log files processed from this node do not contain operation data as per the table wildcard match.
- All the commit log files processed from this node contain operation records that were not written to the trail file due to deduplication.
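The age-based purge for such nodes can be sketched as follows (a hypothetical helper working on file modification times; the real utility's internals may differ):

```python
def old_enough_to_purge(file_mtimes, now, interval_days=2):
    """Pick files older than interval_days (default 2, mirroring the
    documented default of cassUnProcessedFilesPurgeInterval).

    file_mtimes maps file name -> modification time in epoch seconds.
    Illustrative sketch only, not the shipped purger logic.
    """
    cutoff = now - interval_days * 86400
    return [name for name, mtime in file_mtimes.items() if mtime < cutoff]

now = 10 * 86400  # a fixed "current time" for the illustration
files = {
    "CommitLog-6-100.log": now - 3 * 86400,  # 3 days old -> purge
    "CommitLog-6-200.log": now - 86400,      # 1 day old -> keep
}
print(old_enough_to_purge(files, now))
```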
Parent topic: Cassandra CDC Commit Log Purger
9.1.3.13 Multiple Extract Support
Multiple Extract groups in a single Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) installation can be configured to connect to the same Cassandra cluster.
To run multiple Extract groups:
- One (and only one) Extract group can be configured to move the commit log files in the cdc_raw directory on the Cassandra nodes to a staging directory. The movecommitlogstostagingdir parameter is enabled by default and no additional configuration is required for this Extract group.
- All the other Extract groups should be configured with the nomovecommitlogstostagingdir parameter in the Extract parameter (.prm) file.
Parent topic: Apache Cassandra
9.1.3.14 CDC Configuration Reference
The following properties are used with Cassandra change data capture.
| Properties | Required/Optional | Location | Default | Explanation |
|---|---|---|---|---|
| | Optional | Extract parameter (`.prm`) file | | Use only during the initial load process. When set to |
| `DBOPTIONS FETCHBATCHSIZE` | Optional | Extract parameter (`.prm`) file | Minimum is Maximum is | Use only during the initial load process. Specifies the number of rows of data the driver attempts to fetch on each request submitted to the database server. The parameter value should be lower than the database configuration parameter `tombstone_warn_threshold`. Oracle recommends that you set this parameter value to 5000 for optimum initial load Extract performance. |
| | Required | Extract parameter (`.prm`) file | None | The CDC commit log directory path template. The template can optionally have the |
| | Optional | Extract parameter (`.prm`) file | None | The secure file transfer protocol (SFTP) connection details used to pull and transfer the commit log files. You can use one or more of these options: |
| `CLUSTERCONTACTPOINTS` | Optional | GLOBALS file. Note: Starting from Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 23ai, this parameter will be deprecated. | | A comma-separated list of nodes to be used for a connection to the Cassandra cluster. You should provide at least one node address. The parameter options are: |
| `TRANLOGOPTIONS CDCREADERSDKVERSION` | Optional | Extract parameter (`.prm`) file | | The SDK version for the CDC reader capture API. |
| | Optional | Extract parameter (`.prm`) file | | When set to |
| | Optional | Extract parameter (`.prm`) file | | Purge CDC commit log files after Extract processing. When the value is set to |
| `JVMOPTIONS [CLASSPATH <classpath> \| BOOTOPTIONS <options>]` | Mandatory | Extract parameter (`.prm`) file | None | Sets the classpath and boot options for the Java Virtual Machine. |
| `JVMBOOTOPTIONS` | Optional | GLOBALS file. Note: Starting from Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 23ai, this parameter will be deprecated. | None | The boot options for the Java Virtual Machine. Multiple options are delimited by a space character. |
| `JVMCLASSPATH` | Required | GLOBALS file. Note: Starting from Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 23ai, this parameter will be deprecated. | None | The classpath for the Java Virtual Machine. You can include an asterisk (`*`) wildcard. |
| `OGGSOURCE` | Required | GLOBALS file. Note: Starting from Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 23ai, this parameter will be deprecated. | None | The source database for CDC capture or database queries. The valid value is `CASSANDRA`. |
| | Required | Extract parameter (`.prm`) file | None | A single Cassandra node address that is used for a connection to the Cassandra cluster and to query the metadata for the captured tables. |
| | Optional | Extract parameter (`.prm`) file | | If this value is |
| | Optional | Extract parameter (`.prm`) file | | If this value is |
| | Optional | Extract parameter (`.prm`) file | | Enabled by default; instructs the Extract group to move the commit log files in the |
| `CPPDRIVEROPTIONS SSL` | Optional | GLOBALS or Extract parameter (`.prm`) file | | Use for basic SSL support during connection. Additional JSSE configuration through Java system properties is expected when this is enabled. Note: The following SSL properties are in |
| `CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE` | Optional | GLOBALS or Extract parameter (`.prm`) file. A string that indicates the absolute path with the fully qualified file name. This file is mandatory for the SSL connection. | None, unless the | A PEM formatted public key file used to verify the peer's certificate. This property is needed for a one-way handshake or basic SSL connection. |
| `CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH` | Optional | GLOBALS or Extract parameter (`.prm`) file | | When enabled, indicates two-way SSL encryption between client and server. It is required to authenticate both the client and the server through PEM formatted certificates. This property also needs the |
| `CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE` | Optional | GLOBALS or Extract parameter (`.prm`) file. A string that indicates the absolute path with the fully qualified file name. This file is mandatory for the SSL connection. | None, unless the | A PEM formatted public key file used to verify the client's certificate. Mandatory when `ENABLECLIENTAUTH` is set. |
| `CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE` | Optional | GLOBALS or Extract parameter (`.prm`) file. A string that indicates the absolute path with the fully qualified file name. This file is mandatory for the SSL connection. | None, unless the | A PEM formatted private key file used to verify the client's certificate. Mandatory when `ENABLECLIENTAUTH` is set. |
| `CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD` | Optional | GLOBALS or Extract parameter (`.prm`) file. A string. | None, unless the | Sets the password for the PEM formatted private key file used to verify the client's certificate. Mandatory if the private key file is protected with a password. |
| | Optional | GLOBALS or Extract parameter (`.prm`) file. An integer. | | Sets the verification required on the peer's certificate. The range is 0–4: 0 disables certificate identity verification; 1 verifies the peer certificate; 2 verifies the peer identity; 3 is not used (equivalent to disabling certificate identity verification); 4 verifies the peer identity by its domain name. |
| | Optional | GLOBALS or Extract parameter (`.prm`) file | | Enables retrieving host names for IP addresses using reverse IP lookup. |
| `TRANLOGOPTIONS TRACKSCHEMACHANGES` | Optional | Extract parameter (`.prm`) file | Disabled by default. | Enables Extract to capture table-level DDL changes from the source at runtime. Enable this to ensure that the table metadata within the trail stays in sync with the source without any downtime. |
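As a sketch of how the Extract-file parameters in this reference fit together, a minimal Cassandra Extract parameter file might look like the following. The Extract name, trail, keyspace, table wildcard, and classpath entries are hypothetical examples, not a definitive configuration:

```
-- Hypothetical Extract parameter file (dirprm/cass.prm); all names
-- and paths are examples and must be adapted to your installation
EXTRACT cass
TRANLOGOPTIONS CDCREADERSDKVERSION 3.11
TRANLOGOPTIONS TRACKSCHEMACHANGES
JVMOPTIONS CLASSPATH ggjava/ggjava.jar:/path/to/cassandra/lib/*
EXTTRAIL ./dirdat/z1
TABLE ks1.*;
```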
Parent topic: Apache Cassandra
9.1.3.15 Troubleshooting
No data captured by the Cassandra Extract process.
- The Cassandra database has not flushed the data from the active commit log files to the CDC commit log files. The flush is dependent on the load of the Cassandra cluster.
- The Cassandra Extract captures data from the CDC commit log files only.
- Check the CDC property of the source table. The CDC property of the source table should be set to true.
- Data is not captured if the TRANLOGOPTIONS CDCREADERSDKVERSION 3.9 parameter is in use and the JVMCLASSPATH is configured to point to Cassandra 3.10 or 3.11 JAR files.
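The table-level CDC property mentioned above can be checked and enabled from cqlsh; the keyspace and table names below are hypothetical:

```
-- Enable the CDC property on a hypothetical source table
ALTER TABLE ks1.orders WITH cdc = true;

-- Confirm the property is set
SELECT table_name, cdc FROM system_schema.tables WHERE keyspace_name = 'ks1';
```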
Error: OGG-01115 Function getInstance not implemented.
- The following line is missing from the GLOBALS file:
  OGGSOURCE CASSANDRA
Error: Unable to connect to Cassandra cluster, Exception: com.datastax.driver.core.exceptions.NoHostAvailableException
This indicates that the connection to the Cassandra cluster was unsuccessful.
Check the following parameters:
CLUSTERCONTACTPOINTS
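As a sketch, a GLOBALS file covering the parameters named in the two errors above might look like this; the node addresses and classpath are hypothetical placeholders:

```
-- Hypothetical GLOBALS entries for Cassandra capture
OGGSOURCE CASSANDRA
CLUSTERCONTACTPOINTS 10.0.0.11,10.0.0.12
JVMCLASSPATH ggjava/ggjava.jar:/path/to/cassandra/lib/*
```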
Error: Exception in thread "main" java.lang.NoClassDefFoundError: oracle/goldengate/capture/cassandra/CassandraCDCProcessManager
Check the JVMOPTIONS CLASSPATH parameter in the GLOBALS file.
Error: oracle.goldengate.util.Util - Unable to invoke method while constructing object. Unable to create object of class "oracle.goldengate.capture.cassandracapture311.SchemaLoader3DOT11" Caused by: java.lang.NoSuchMethodError: org.apache.cassandra.config.DatabaseDescriptor.clientInitialization()V
There is a mismatch in the Cassandra SDK version configuration. The TRANLOGOPTIONS CDCREADERSDKVERSION 3.11 parameter is in use and the JVMCLASSPATH may have the Cassandra 3.9 JAR file path.
Error: OGG-25171 Trail file '/path/to/trail/gg' is remote. Only local trail allowed for this extract.
A Cassandra Extract should only be configured to write to local trail files. When adding trail files for Cassandra Extract, use the EXTTRAIL option. For example:
ADD EXTTRAIL ./dirdat/z1, EXTRACT cass
Errors: OGG-868 error message or OGG-4510 error message
The cause could be any of the following:
- Unknown user or invalid password
- Unknown node address
- Insufficient memory
Another cause could be that the connection to the Cassandra database is broken. The error message indicates the database error that has occurred.
Error: OGG-251712 Keyspace keyspacename does not exist in the database.
The issue could be due to these conditions:
- During the Extract initial load process, you may have deleted the KEYSPACE keyspacename from the Cassandra database.
- The KEYSPACE keyspacename does not exist in the Cassandra database.
Error: OGG-25175 Unexpected error while fetching row.
This can occur if the connection to the Cassandra database is broken during the initial load process.
Error: "Server-side warning: Read 915936 live rows and 12823104 tombstone cells for query SELECT * FROM keyspace.table (see tombstone_warn_threshold)".
This is likely to occur when the value of the initial load DBOPTIONS FETCHBATCHSIZE parameter is greater than the Cassandra database configuration parameter tombstone_warn_threshold.
Increase the value of tombstone_warn_threshold or reduce the DBOPTIONS FETCHBATCHSIZE value to get around this issue.
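The Extract-side half of the workaround is a one-line change in the parameter file. The value 5000 below is the one this section's configuration reference recommends for initial load performance; treat it as a starting point, not a definitive setting:

```
-- Reduce the number of rows fetched per request during initial load
DBOPTIONS FETCHBATCHSIZE 5000
```

Alternatively, raise tombstone_warn_threshold in cassandra.yaml on the database side.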
Duplicate records in the Cassandra Extract trail.
Internal tests on a multi-node Cassandra cluster have revealed that there is a possibility of duplicate records in the Cassandra CDC commit log files. The duplication in the Cassandra commit log files is more common when there is heavy write parallelism, write errors on nodes, and multiple retry attempts on the Cassandra nodes. In these cases, it is expected that the Cassandra trail file will have duplicate records.
JSchException or SftpException in the Extract Report File
Verify that the SFTP credentials (user, password, and privatekey) are correct. Check that the SFTP user has read and write permissions for the cdc_raw directory on each of the nodes in the Cassandra cluster.
ERROR o.g.c.c.CassandraCDCProcessManager - Exception during creation of CDC staging directory [{}]java.nio.file.AccessDeniedException
The Extract process does not have permission to create the CDC commit log staging directory. For example, if the cdc_raw commit log directory is /path/to/cassandra/home/data/cdc_raw, then the staging directory would be /path/to/cassandra/home/data/cdc_raw/../cdc_raw_staged.
Extract report file shows a lot of DEBUG log statements
On a production system, you do not need to enable debug logging. To use INFO level logging, make sure that the Extract parameter file includes this:
JVMBOOTOPTIONS -Dlogback.configurationFile=AdapterExamples/big-data/cassandracapture/logback.xml
To enable SSL in Oracle GoldenGate Cassandra Extract, you must enable SSL in the GLOBALS file or in the Extract parameter file.
If the SSL keyword is missing, then Extract assumes that you want to connect without SSL. Consequently, if the cassandra.yaml file has an SSL configuration entry, the connection fails.
SSL is enabled and it is a one-way handshake
You must specify the CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/cassandra.pem property.
If this property is missing, then Extract generates this error:
2018-06-09 01:55:37 ERROR OGG-25180 The PEM formatted public key file used to verify the peer's certificate is missing.If SSL is enabled, then it is must to set PEMPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file
SSL is enabled and it is a two-way handshake
You must specify these properties for the SSL two-way handshake:
CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH
CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver-private.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD cassandra
Additionally, consider the following:
- If ENABLECLIENTAUTH is missing, then Extract assumes a one-way handshake, so it ignores PEMCLIENTPRIVATEKEYFILE and PEMCLIENTPUBLICKEYFILE. The following error occurs because the cassandra.yaml file has require_client_auth set to true:
  2018-06-09 02:00:35 ERROR OGG-00868 No hosts available for the control connection.
- If ENABLECLIENTAUTH is used and PEMCLIENTPRIVATEKEYFILE is missing, then this error occurs:
  2018-06-09 02:04:46 ERROR OGG-25178 The PEM formatted private key file used to verify the client's certificate is missing. For two way handshake or if ENABLECLIENTAUTH is set, then it is mandatory to set PEMCLIENTPRIVATEKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file.
- If ENABLECLIENTAUTH is used and PEMCLIENTPUBLICKEYFILE is missing, then this error occurs:
  2018-06-09 02:06:20 ERROR OGG-25179 The PEM formatted public key file used to verify the client's certificate is missing. For two way handshake or if ENABLECLIENTAUTH is set, then it is mandatory to set PEMCLIENTPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file.
- If a password was set while generating the client private key file, then you must add PEMCLIENTPRIVATEKEYPASSWD to avoid this error:
  2018-06-09 02:09:48 ERROR OGG-25177 The SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/datastax-cppdriver-private.pem can not be loaded. Unable to load private key.
- If any of the PEM files is missing from the specified absolute path, then this error occurs:
  2018-06-09 02:12:39 ERROR OGG-25176 Can not open the SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/cassandra.pem.
com.jcraft.jsch.JSchException: UnknownHostKey
If the Extract process abends with this issue, then it is likely that some or all of the Cassandra node addresses are missing in the SSH known_hosts file. For more information, see Setup SSH Connection to the Cassandra Nodes.
General SSL Errors
Consider these general errors:
- The SSL connection may fail if you have enabled all required SSL parameters in the Extract or GLOBALS file but SSL is not configured in the cassandra.yaml file.
- The absolute path or the qualified name of the PEM file may not be correct, or there could be an access issue at the PEM file's stored location.
- The password added while generating the client private key file may not be correct, or you may not have enabled it in the Extract parameter or GLOBALS file.
Parent topic: Apache Cassandra
9.1.3.16 Cassandra Capture Client Dependencies
What are the dependencies for the Cassandra Capture (Extract) to connect to Apache Cassandra databases?
The following third party libraries are needed to run Cassandra Change Data Capture.
Capturing from Apache Cassandra 3.x versions:
- cassandra-driver-core (com.datastax.cassandra) version 3.3.1
- cassandra-all (org.apache.cassandra) version 3.11.0
- gson (com.google.code.gson) version 2.8.0
- jsch (com.jcraft) version 0.1.54
Capturing from Apache Cassandra 4.x versions:
- java-driver-core (com.datastax.oss) version 4.14.1
- cassandra-all (org.apache.cassandra) version 4.0.5
- gson (com.google.code.gson) version 2.8.0
- jsch (com.jcraft) version 0.1.54
You can use the Dependency Downloader scripts to download the Datastax Java Driver and its associated dependencies. For more information, see Dependency Downloader.
Parent topic: Apache Cassandra