7.3 Google Cloud Storage Replication
Google Cloud Storage (GCS) is a service for storing objects in Google Cloud Platform.
You can use GoldenGate for Big Data to ingest different file formats into GCS. Oracle GoldenGate for Big Data supports the following file formats:
- delimitedtext
- json
- json_row
- json_op
- avro_row
- avro_op
- avro_row_ocf
- avro_op_ocf
- parquet
Ensure that the files are in a closed state to load them to GCS. For more information about how to control the File Writer behaviour, see the File Writer Behaviour blog.
This quickstart loads data using the default settings.
7.3.1 Install Dependency Files
Oracle GoldenGate for Big Data uses client libraries in the replication process. Before setting up replication, download these libraries by using the Dependency Downloader utility available in Oracle GoldenGate for Big Data. The Dependency Downloader is a set of shell scripts that download dependency JAR files from Maven and other repositories.
To install the required dependency files:
- Go to the installation location of the Dependency Downloader: GG_HOME/opt/DependencyDownloader/
- Run gcs.sh and bigquery.sh with the required version.
Figure 7-10 Run gcs.sh and bigquery.sh with the required versions
The dependency files are downloaded to a directory under GG_HOME/opt/DependencyDownloader/dependencies. For example, /u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9.
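The downloaded directories are later referenced from gg.classpath in the Replicat properties. As a minimal sketch, the following Python helper joins dependency directories into a classpath value with one wildcard entry per directory. The function name build_classpath is hypothetical, and the BigQuery directory name is a placeholder because only the gcs_1.113.9 directory is shown in this quickstart; check the names that the scripts actually created.

```python
from pathlib import PurePosixPath


def build_classpath(dependencies_dir, dependency_names):
    """Return a ':'-separated classpath with one '<dir>/*' entry per dependency."""
    base = PurePosixPath(dependencies_dir)
    return ":".join(f"{base / name}/*" for name in dependency_names)


if __name__ == "__main__":
    deps = "/u01/app/ogg/opt/DependencyDownloader/dependencies"
    # "gcs_1.113.9" comes from the example path in this section. The BigQuery
    # directory name is a hypothetical placeholder, not a real version.
    names = ["gcs_1.113.9", "bigquery_<version>"]
    print("gg.classpath=" + build_classpath(deps, names))
```

The printed line has the same shape as the gg.classpath property that you edit when you create the Replicat.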
7.3.2 Create a Replicat in Oracle GoldenGate for Big Data
To create a replicat in Oracle GoldenGate for Big Data:
- In the Oracle GoldenGate for Big Data UI, in the Administration Service tab, click the + sign to add a replicat.
Figure 7-11 Click + in the Administration Service tab.
Figure 7-12 Click + sign to add a replicat
- Select the Replicat Type and click Next.
There are two different Replicat types here: Classic and Coordinated. Classic Replicat is a single-threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel.
Figure 7-13 Select the Replicat Type and click Next.
- Enter the basic information, and click Next:
- Process Name: Name of the Replicat
- Trail Name: Name of the required trail file. You can use the sample trail file tr, which is shipped with Oracle GoldenGate for Big Data.
- Trail Subdirectory: Path to the trail file. The sample trail file tr is located at OGG_HOME/opt/AdapterExamples/trail.
- Target: Google Cloud Storage
Figure 7-14 Process Name, Trail Name, and Target Names
- Enter Parameter File details and click Next. In the Parameter File, you can either specify source-to-target mapping or leave it as-is with a wildcard selection. If Coordinated Replicat is selected as the Replicat Type, then you need to provide an additional parameter:
TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties
Figure 7-15 Provide Parameter File details and click Next.
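To make the two placeholders in the TARGETDB parameter concrete, here is a small Python sketch that fills them in. The function name targetdb_parameter and the sample values /u01/app/ggbd_deploy and RGCS are assumptions for illustration only; substitute your own deployment home and Replicat name.

```python
def targetdb_parameter(deployment_home, replicat_name):
    """Build the TARGETDB line for a Coordinated Replicat parameter file."""
    # The properties file is expected under <deployment_home>/etc/conf/ogg/.
    home = deployment_home.rstrip("/")
    properties_path = f"{home}/etc/conf/ogg/{replicat_name}.properties"
    return f"TARGETDB LIBFILE libggjava.so SET property={properties_path}"


if __name__ == "__main__":
    # Hypothetical deployment home and Replicat name.
    print(targetdb_parameter("/u01/app/ggbd_deploy", "RGCS"))
```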
- In the next screen, update only the properties tagged as TODO. They are as follows:
Provide your GCS bucket name:
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
Provide the path to your GCP service account key:
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile
Provide the path to the dependency jar files that you downloaded in the prerequisites:
#TODO: Edit to include the GCS Java SDK and BQ Java SDK.
gg.classpath=/path/to/gcs-deps/*:/path/to/bq-deps/*
Without these properties, your replicat will fail. There are also some optional properties that you can modify:
gg.handler.filewriter.format controls the format of the output files. By default, it is set to avro_row_ocf. You can change it to json, delimitedtext, or one of the other formats described in Configuring the File Writer Handler.
gg.handler.filewriter.fileRollInterval and gg.handler.filewriter.inactivityRollInterval control the file roll behaviour. A file must be in a closed state to be loaded into GCS buckets.
fileRollInterval starts a timer when a file is created; when the interval is reached, the file is moved to a closed state and loaded to the GCS bucket. In the Replicat properties, it is set to 0, which means that it is off. You can set it to 5s (5 seconds) for this quickstart.
inactivityRollInterval tracks the inactivity period. Here, inactivity means that no operations are coming from the source system. You can set it to 5s (5 seconds) for this quickstart.
Figure 7-16 Add Replicat
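The interaction of the two roll timers can be pictured with a simplified Python model. This is only a sketch of the behaviour described in this step, not the File Writer Handler's implementation: the names RollSettings and roll_reason are hypothetical, times are plain seconds, and treating 0 as "off" for the inactivity timer is an assumption carried over from the fileRollInterval description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollSettings:
    file_roll_interval: float = 0.0        # seconds since file creation; 0 = off
    inactivity_roll_interval: float = 0.0  # seconds since last operation; 0 = off (assumed)


def roll_reason(settings: RollSettings, created_at: float,
                last_write_at: float, now: float) -> Optional[str]:
    """Return which timer would close the file at time `now`, or None."""
    age = now - created_at
    idle = now - last_write_at
    if settings.file_roll_interval > 0 and age >= settings.file_roll_interval:
        return "fileRollInterval"
    if settings.inactivity_roll_interval > 0 and idle >= settings.inactivity_roll_interval:
        return "inactivityRollInterval"
    return None  # file stays open, so it is not loaded to GCS yet


if __name__ == "__main__":
    quickstart = RollSettings(file_roll_interval=5, inactivity_roll_interval=5)
    # File created at t=0, last operation at t=3, checked at t=4 and t=5.
    print(roll_reason(quickstart, created_at=0, last_write_at=3, now=4))
    print(roll_reason(quickstart, created_at=0, last_write_at=3, now=5))
```

In this model, a file with both timers off never closes on its own, which is why the quickstart suggests setting at least one of them.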
- If the Replicat starts successfully, it is in the running state. Go to action/details/statistics to see the replication statistics:
Figure 7-17 Replication Statistics
- Go to the GCS bucket in GCP and check for the loaded files.
Figure 7-18 Bucket Details
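If the Replicat does not reach the running state, one thing to rule out is a required property that is missing or still holds its placeholder value. The following Python sketch checks a properties text for the three TODO properties from this quickstart. The helper names are hypothetical, and the parser is deliberately simplified: it handles only key=value lines and # comments, not the full Java properties syntax.

```python
REQUIRED_KEYS = (
    "gg.eventhandler.gcs.bucketMappingTemplate",
    "gg.eventhandler.gcs.credentialsFile",
    "gg.classpath",
)
# Placeholder fragments as they appear in the quickstart template.
PLACEHOLDERS = ("<gcs-bucket-name>", "/path/to/")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_unedited(text):
    """List required properties that are missing or still placeholders."""
    props = parse_properties(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value:
            problems.append(f"{key}: missing")
        elif any(p in value for p in PLACEHOLDERS):
            problems.append(f"{key}: still a placeholder")
    return problems


if __name__ == "__main__":
    sample = (
        "#TODO: Edit the GCS bucket name\n"
        "gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>\n"
        "gg.eventhandler.gcs.credentialsFile=/etc/keys/sa.json\n"
    )
    for problem in find_unedited(sample):
        print(problem)
```

An empty result only means that the three properties were edited; it does not confirm that the bucket exists or that the service account key is valid.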