9.2.22 Google Cloud Storage
Topics:
- Overview
- Prerequisites
- Buckets and Objects
- Authentication and Authorization
- Configuration
- Troubleshooting and Diagnostics
9.2.22.1 Overview
You can use the GCS Event handler to load files generated by the File Writer handler into GCS.
9.2.22.2 Prerequisites
- Google Cloud Platform (GCP) account set up.
- Google service account key with the relevant permissions.
- GCS Java Software Development Kit (SDK)
9.2.22.3 Buckets and Objects
9.2.22.4 Authentication and Authorization
You need to create a service account key with the relevant Identity and Access Management (IAM) permissions.
Use the JSON key type to generate the service account key file.
You can set the path to the service account key file either in the environment variable GOOGLE_APPLICATION_CREDENTIALS or in the GCS Event handler property gg.eventhandler.name.credentialsFile. Alternatively, instead of specifying the credentials file path, you can set the individual keys from the credentials file (clientId, clientEmail, privateKeyId, and privateKey) in the corresponding handler properties. This enables the credential keys to be encrypted using Oracle wallet.
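The two ways of supplying credentials can be sketched as the following properties fragment. It is a minimal illustration, not a definitive configuration: the handler name gcs and every value are placeholders, and it assumes the individual key properties correspond to the client_id, client_email, private_key_id, and private_key fields of the JSON key file.

```properties
# Option 1: point the handler at the service account key file.
# (Not needed if GOOGLE_APPLICATION_CREDENTIALS already holds this path.)
gg.eventhandler.gcs.credentialsFile=/path/to/gcs/credentials-file

# Option 2: supply the individual keys instead of the file path.
# All values below are placeholders, assumed to be copied from the JSON key file.
#gg.eventhandler.gcs.clientId=<client-id>
#gg.eventhandler.gcs.clientEmail=<client-email>
#gg.eventhandler.gcs.privateKeyId=<private-key-id>
#gg.eventhandler.gcs.privateKey=<private-key>
```

Use one option or the other. The second option is the one that allows the individual values to be encrypted using Oracle wallet.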
Add the following IAM permissions to the service account used to run the GCS Event handler.
9.2.22.4.1 Bucket Permissions
Table 9-28 Bucket Permissions
Bucket Permission Name | Description |
---|---|
storage.buckets.create | Create new buckets in a project. |
storage.buckets.delete | Delete buckets. |
storage.buckets.get | Read bucket metadata, excluding IAM policies. |
storage.buckets.list | List buckets in a project. Also read bucket metadata, excluding IAM policies, when listing. |
storage.buckets.update | Update bucket metadata, excluding IAM policies. |
9.2.22.4.2 Object Permissions
Table 9-29 Object Permissions
Object Permission Name | Description |
---|---|
storage.objects.create | Add new objects to a bucket. |
storage.objects.delete | Delete objects. |
storage.objects.get | Read object data and metadata, excluding ACLs. |
storage.objects.list | List objects in a bucket. Also read object metadata, excluding ACLs, when listing. |
storage.objects.update | Update object metadata, excluding ACLs. |
9.2.22.5 Configuration
Table 9-30 GCS Event Handler Configuration Properties
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.name.type | Required | gcs | None | Selects the GCS Event handler for use with the File Writer handler. |
gg.eventhandler.name.location | Optional | A valid GCS location. | None | If the GCS bucket does not exist, a new bucket is created in this GCS location. If the location is not specified, new bucket creation fails. For valid values, see GCS locations. |
gg.eventhandler.name.bucketMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate a GCS bucket name. | None | If the bucket does not exist, the GCS Event handler creates it using this name. See Bucket Naming Guidelines. For more information about supported keywords, see Template Keywords. |
gg.eventhandler.name.pathMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the path in the GCS bucket to write the file. | None | Use keywords interlaced with constants to dynamically generate unique GCS path names at runtime. Example path name: ogg/data/${groupName}/${fullyQualifiedTableName}. For more information about supported keywords, see Template Keywords. |
gg.eventhandler.name.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate a file name for the GCS object. | None | Use resolvable keywords and constants to dynamically generate the GCS object file name. If not set, the upstream file name is used. For more information about supported keywords, see Template Keywords. |
gg.eventhandler.name.finalizeAction | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | Sets the downstream event handler that is invoked on the file roll event. A typical example is using a downstream BigQuery Event handler to load the GCS data into Google BigQuery. |
gg.eventhandler.name.credentialsFile | Optional | Relative or absolute path to the service account key file. | None | Sets the path to the service account key file. Alternatively, if the environment variable GOOGLE_APPLICATION_CREDENTIALS is set to the path to the service account key file, then you need not set this parameter. |
gg.eventhandler.name.storageClass | Optional | STANDARD, NEARLINE, COLDLINE, ARCHIVE, REGIONAL, MULTI_REGIONAL, or DURABLE_REDUCED_AVAILABILITY | None | The storage class you set for an object affects the object’s availability and pricing model. If this property is not set, then the storage class for the file is set to the default storage class for the respective bucket. If the bucket does not exist and storage class is specified, then a new bucket is created with this storage class as its default. |
gg.eventhandler.name.kmsKey | Optional | Key names in the format: projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>. <PROJECT>: Google project-id. <LOCATION>: Location of the GCS bucket. <RING_NAME>: Google Cloud KMS key ring name. <KEY_NAME>: Google Cloud KMS key name. | None | Google Cloud Storage always encrypts your data on the server side, before it is written to disk, using Google-managed encryption keys. As an additional layer of security, customers may choose to use keys generated by Google Cloud Key Management Service (KMS). Use this property to set a customer-managed Cloud KMS key to encrypt GCS objects. When using customer-managed keys, the gg.eventhandler.name.concurrency property cannot be set to a value greater than one, because with customer-managed keys GCP does not allow multi-part uploads using object composition. |
gg.eventhandler.name.concurrency | Optional | Any number in the range 1 to 32. | 10 | If concurrency is set to a value greater than one, then the GCS Event handler performs multi-part uploads using composition. The multi-part uploads spawn concurrent threads to upload each part. The individual parts are uploaded to the directory <bucketMappingTemplate>/oggtmp. This directory is reserved for use by Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA). This provides better throughput rates for uploading large files. Multi-part uploads are used for files larger than 10 megabytes. |
gg.eventhandler.name.clientId | Optional | Valid client ID from the service account credentials. | NA | Provides the client ID key from the credentials file for connecting with the Google service account. |
gg.eventhandler.name.clientEmail | Optional | Valid client email from the service account credentials. | NA | Provides the client email key from the credentials file for connecting with the Google service account. |
gg.eventhandler.name.privateKeyId | Optional | Valid private key ID from the service account credentials. | NA | Provides the private key ID from the credentials file for connecting with the Google service account. |
gg.eventhandler.name.privateKey | Optional | Valid private key from the service account credentials. | NA | Provides the private key from the credentials file for connecting with the Google service account. |
gg.eventhandler.name.projectId | Optional | The Google project-id associated with the service account. | NA | Sets the project-id of the Google Cloud project that houses the storage bucket. Auto configure sets this property automatically by reading the service account key file, unless you explicitly override it. |
gg.eventhandler.name.url | Optional | A legal URL to connect to Google Cloud Storage, including scheme, server name, and port (if not the default port). | https://storage.googleapis.com | Allows the user to set a URL for a private endpoint to connect to GCS. |
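As an illustration of how the mapping templates and bucket-creation properties combine, the following fragment is a minimal sketch. The handler name gcs, the bucket name, and the location value are placeholders. The path template reuses the example keywords from the table, and NEARLINE is just one of the listed storage classes.

```properties
# Bucket to write to; created by the handler if it does not exist.
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
# Required for bucket creation; placeholder for a valid GCS location.
gg.eventhandler.gcs.location=<gcs-location>
# Default storage class applied if the handler creates the bucket.
gg.eventhandler.gcs.storageClass=NEARLINE
# One path per Replicat group and source table, resolved at runtime.
gg.eventhandler.gcs.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
```

Because fileNameMappingTemplate is not set in this sketch, the upstream file name is used for the GCS object.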
Note:
To connect to GCS with the Google Cloud service account, ensure that one of the following is configured: the credentials file property, set to the relative or absolute path of the credentials JSON file, or the properties for the individual credential keys. Setting the service account credential keys individually enables them to be encrypted using Oracle wallet.
9.2.22.5.1 Classpath Configuration
The GCS Event handler uses the Java SDK for Google Cloud Storage. The classpath must include the path to the GCS SDK.
9.2.22.5.1.1 Dependencies
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.113.9</version>
</dependency>
Alternatively, you can download the GCS dependencies by running the script: <OGGDIR>/DependencyDownloader/gcs.sh.
Edit the gg.classpath configuration parameter to include the path to the GCS SDK.
9.2.22.5.2 Proxy Configuration
Use the jvm.bootoptions property to set the proxy server configuration.
For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80
9.2.22.5.3 Sample Configuration
#The GCS Event handler
gg.eventhandler.gcs.type=gcs
gg.eventhandler.gcs.pathMappingTemplate=${fullyQualifiedTableName}
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/path/to/gcs/credentials-file
gg.eventhandler.gcs.finalizeAction=none
gg.classpath=/path/to/gcs-deps/*
jvm.bootoptions=-Xmx8g -Xms8g
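A variant of the sample for customer-managed encryption keys might look like the following sketch. It is not an Oracle-supplied sample: the key name components are the placeholders from the kmsKey property description, and concurrency is set to 1 because the configuration table states that multi-part uploads are not allowed with customer-managed keys.

```properties
# Customer-managed Cloud KMS key used to encrypt GCS objects (placeholders).
gg.eventhandler.gcs.kmsKey=projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>
# Must not exceed 1 with customer-managed keys; the default is 10.
gg.eventhandler.gcs.concurrency=1
```

These two lines would be added to the handler properties shown in the sample; the remaining settings stay unchanged.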
9.2.22.6 Troubleshooting and Diagnostics
Duplicate records after Replicat Recovery
Google Cloud Storage (GCS) handler replication uses the File Writer handler and the GCS Event handler in the Replicat. Oracle GoldenGate prioritizes no data loss and guarantees it in case of failures through at-least-once semantics for GCS (json, csv, delimitedtext, avro_orc, parquet) delivery. If the Replicat runs normally and shuts down cleanly, exactly-once delivery is supported. After a failure (for example, a network failure), several conditions can lead to duplicates during recovery.
Two cases where duplicates can occur:
- If a failure occurs after the data is written but before the checkpoint is moved, then upon restart the Replicat backs up to the previous checkpoint and the data can be replayed.
- Data files roll based on customer-configured triggers. A trigger can be file size, time, inactivity, or time of day. The roll does not necessarily happen on a transaction commit boundary. The trigger causes writing to the current file to complete, the post-processing transformation and movement to complete, and any state on that file to be deleted. If a Replicat abend occurs between when the roll is processed and when the checkpoint is moved, then upon restart the Replicat can replay those messages.
If you observe duplicate records after a GCS Replicat recovery, this is expected behavior. If you observe duplicates while the Replicat is running normally, file a support ticket.