9.2.21 Google BigQuery
Topics:
Parent topic: Target
9.2.21.1 BigQuery Streaming Handler
- BigQuery Streaming Handler with Legacy Streaming API
Learn how to use the Google BigQuery Handler, which streams change data capture data from source trail files into Google BigQuery.
- BigQuery Streaming API with Storage Write API
Parent topic: Google BigQuery
9.2.21.1.1 BigQuery Streaming Handler with Legacy Streaming API
Learn how to use the Google BigQuery Handler, which streams change data capture data from source trail files into Google BigQuery.
BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage, see https://cloud.google.com/bigquery/.
- Detailing the Functionality
- Setting Up and Running the BigQuery Handler
The Google BigQuery Handler uses the Java BigQuery client libraries to connect to BigQuery.
- Google BigQuery Dependencies
The Google BigQuery client libraries are required for integration with BigQuery.
Parent topic: BigQuery Streaming Handler
9.2.21.1.1.1 Detailing the Functionality
- Data Types
- Metadata Support
- Operation Modes
- Operation Processing Support
- Proxy Settings
- Mapping to Google Datasets
A dataset is contained within a specific Google cloud project. Datasets are top-level containers that are used to organize and control access to your tables and views.
Parent topic: BigQuery Streaming Handler with Legacy Streaming API
9.2.21.1.1.1.1 Data Types
The BigQuery Handler supports most of the standard BigQuery SQL data types. A data type conversion from the column value in the trail file to the corresponding Java type representing the BigQuery column type is performed by the BigQuery Handler.
The following data types are supported:
BIGNUMERIC
BOOLEAN
BYTES
DATE
DATETIME
FLOAT
INTEGER
JSON
NUMERIC
STRING
TIME
TIMESTAMP
The BigQuery Handler does not support complex data types, such as ARRAY and STRUCT.
Parent topic: Detailing the Functionality
9.2.21.1.1.1.2 Metadata Support
The BigQuery Handler creates tables in BigQuery if the tables do not exist.
The BigQuery Handler alters tables to add columns that exist in the source metadata, or configured metacolumns, but do not yet exist in the target metadata. The BigQuery Handler also adds columns dynamically at runtime if it detects a metadata change.
The BigQuery Handler does not drop columns in the BigQuery table that do not exist in the source table definition. BigQuery neither supports dropping existing columns nor changing the data type of existing columns. Once a column is created in BigQuery, it is immutable.
Truncate operations are not supported.
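Table alteration is controlled by the gg.handler.name.enableAlter property listed in the configuration table later in this section. A minimal sketch, assuming the handler is named bigquery as in the sample configuration:
# Allow the handler to add missing columns or metacolumns to existing target tables
gg.handler.bigquery.enableAlter=true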
Parent topic: Detailing the Functionality
9.2.21.1.1.1.3 Operation Modes
You can configure the BigQuery Handler in one of these two modes:
- auditLogMode = true
gg.handler.name.auditLogMode=true
When the handler is configured to run with audit log mode true, the data is pushed into Google BigQuery without a unique row identification key. As a result, Google BigQuery is not able to merge different operations on the same row. For example, a source row with an insert operation, two update operations, and then a delete operation would show up in BigQuery as four rows, one for each operation.
Also, the order in which the audit log is displayed in the BigQuery data set is not deterministic.
To overcome these limitations, users should specify optype and position in the meta columns template for the handler. This adds two columns of the same names to the schema for the table in Google BigQuery. For example:
gg.handler.bigquery.metaColumnsTemplate = ${optype}, ${position}
The optype is important to determine the operation type for the row in the audit log. To view the audit log in the order of the operations processed in the trail file, specify position, which can be used in the ORDER BY clause while querying the table in Google BigQuery. For example:
SELECT * FROM [projectId:datasetId.tableId] ORDER BY position
- auditLogMode = false
gg.handler.name.auditLogMode=false
When the handler is configured to run with audit log mode false, the data is pushed into Google BigQuery using a unique row identification key. Google BigQuery is then able to merge different operations for the same row. However, the behavior is complex. Google BigQuery maintains a finite deduplication period in which it merges operations for a given row. Therefore, the results can be somewhat non-deterministic. The trail source needs to have a full image of the records in order to merge correctly.
Example 1
An insert operation is sent to BigQuery and before the deduplication period expires, an update operation for the same row is sent to BigQuery. The resultant is a single row in BigQuery for the update operation.
Example 2
An insert operation is sent to BigQuery and after the deduplication period expires, an update operation for the same row is sent to BigQuery. The resultant is that both the insert and the update operations show up in BigQuery.
This behavior has confounded many users; it is the documented behavior of the BigQuery SDK and is a feature rather than a defect. The documented length of the deduplication period is at least one minute. However, Oracle testing has shown that the period is significantly longer. Therefore, unless users can guarantee that all operations for a given row occur within a very short period, it is likely there will be multiple entries for a given row in BigQuery. It is therefore just as important for users to configure meta columns with the optype and position so they can determine the latest state for a given row. To read more about audit log mode, see the Google BigQuery documentation: Streaming data into BigQuery.
Parent topic: Detailing the Functionality
9.2.21.1.1.1.4 Operation Processing Support
The BigQuery Handler pushes operations to Google BigQuery using the synchronous API. Insert, update, and delete operations are processed differently in BigQuery than in a traditional RDBMS.
The following explains how insert, update, and delete operations are interpreted by the handler depending on the mode of operation:
- auditLogMode = true
  - insert – Inserts the record with optype as an insert operation in the BigQuery table.
  - update – Inserts the record with optype as an update operation in the BigQuery table.
  - delete – Inserts the record with optype as a delete operation in the BigQuery table.
  - pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sends out a delete operation followed by an insert operation. Both these rows have the same position in the BigQuery table, which helps to identify it as a primary key operation and not a separate delete and insert operation.
- auditLogMode = false
  - insert – If the row does not already exist in Google BigQuery, then an insert operation is processed as an insert. If the row already exists in Google BigQuery, then an insert operation is processed as an update. The handler sets the deleted column to false.
  - update – If the row does not exist in Google BigQuery, then an update operation is processed as an insert. If the row already exists in Google BigQuery, then an update operation is processed as an update. The handler sets the deleted column to false.
  - delete – If the row does not exist in Google BigQuery, then a delete operation is added. If the row exists in Google BigQuery, then a delete operation is processed as a delete. The handler sets the deleted column to true.
  - pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sets the deleted column to true for the row whose primary key is updated. It is followed by a separate insert operation with the new primary key and the deleted column set to false for this row.
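The following is a minimal sketch of the Java Adapter properties related to these behaviors, assuming the handler is named bigquery as in the sample configuration later in this section; adjust the values for your environment:
gg.handler.bigquery.auditLogMode=false
gg.handler.bigquery.pkUpdateHandling=delete-insert
gg.handler.bigquery.includeDeletedColumn=true
gg.handler.bigquery.metaColumnsTemplate=${optype}, ${position}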
Do not toggle the audit log mode, because doing so forces the BigQuery Handler to abend as Google BigQuery cannot alter the schema of an existing table. The existing table needs to be deleted before switching audit log modes.
Note:
The BigQuery Handler does not support the truncate operation. It abends when it encounters a truncate operation.
Parent topic: Detailing the Functionality
9.2.21.1.1.1.5 Proxy Settings
To connect to BigQuery using a proxy server, you must configure the proxy host and the proxy port in the properties file as follows:
jvm.bootoptions= -Dhttps.proxyHost=proxy_host_name -Dhttps.proxyPort=proxy_port_number
Parent topic: Detailing the Functionality
9.2.21.1.1.1.6 Mapping to Google Datasets
A dataset is contained within a specific Google cloud project. Datasets are top-level containers that are used to organize and control access to your tables and views.
A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery.
The BigQuery handler can use existing datasets or create datasets if not found.
The BigQuery handler maps the table's schema name to the dataset name. For three-part table names, the dataset is constructed by concatenating catalog and schema.
Parent topic: Detailing the Functionality
9.2.21.1.1.2 Setting Up and Running the BigQuery Handler
The Google BigQuery Handler uses the Java BigQuery client libraries to connect to BigQuery.
- Group ID: com.google.cloud
- Artifact ID: google-cloud-bigquery
- Version: 2.7.1
The BigQuery client libraries do not ship with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA). Additionally, Google appears to have removed the link to download the BigQuery client libraries directly. You can download the BigQuery client libraries using Maven and the Maven coordinates listed above; however, this requires proficiency with Maven. Alternatively, the Google BigQuery client libraries can be downloaded using the dependency downloading scripts. For more information, see Google BigQuery Dependencies.
For more information about Dependency Downloader, see Dependency Downloader.
- Schema Mapping for BigQuery
- Understanding the BigQuery Handler Configuration
- Review a Sample Configuration
- Configuring Handler Authentication
Parent topic: BigQuery Streaming Handler with Legacy Streaming API
9.2.21.1.1.2.1 Schema Mapping for BigQuery
The table schema name specified in the Replicat MAP statement is mapped to the BigQuery dataset name. For example: map QASOURCE.*, target "dataset_US".*;
This map statement replicates tables to the BigQuery dataset "dataset_US". Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) normalizes schema and table names to uppercase. Lowercase and mixed-case dataset and table names are supported, but need to be quoted in the Replicat mapping statement.
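For example, a hypothetical mapping to a lowercase dataset name quotes the target name in the Replicat MAP statement:
map QASOURCE.*, target "dataset_us".*;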
Parent topic: Setting Up and Running the BigQuery Handler
9.2.21.1.1.2.2 Understanding the BigQuery Handler Configuration
The following are the configurable values for the BigQuery Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the BigQuery Handler, you must first configure the handler type by specifying gg.handler.name.type=bigquery and then set the other BigQuery properties as follows:
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handlerlist | Required | Any string | None | Provides a name for the BigQuery Handler. The BigQuery Handler name then becomes part of the property names listed in this table. |
gg.handler.name.type | Required | bigquery | None | Selects the BigQuery Handler for streaming change data capture into Google BigQuery. |
gg.handler.name.createDataset | Optional | true or false | true | Set to true to automatically create the BigQuery dataset if it does not exist. |
gg.handler.name.credentialsFile | Optional | Relative or absolute path to the credentials file | None | The credentials file downloaded from Google BigQuery for authentication. If you do not specify the path to the credentials file, you need to set it as an environment variable, see Configuring Handler Authentication. |
gg.handler.name.projectId | Required | Any string | None | The name of the project in Google BigQuery. The handler needs the project ID to connect to the Google BigQuery store. |
gg.handler.name.batchSize | Optional | Any number | | The maximum number of operations to be batched together. This is applicable for all target table batches. |
gg.handler.name.batchFlushFrequency | Optional | Any number | | The maximum amount of time in milliseconds to wait before executing the next batch of operations. This is applicable for all target table batches. |
gg.handler.name.skipInvalidRows | Optional | true or false | false | Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set, the entire insert request fails if it contains an invalid row. |
gg.handler.name.ignoreUnknownValues | Optional | true or false | false | Sets whether to accept rows that contain values that do not match the schema. If not set, rows with unknown values are considered to be invalid. |
gg.handler.name.connectionTimeout | Optional | Positive integer | | The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery. |
gg.handler.name.readTimeout | Optional | Positive integer | | The maximum amount of time, in milliseconds, to wait for the handler to read data from an established connection. |
gg.handler.name.metaColumnsTemplate | Optional | A legal string | None | A legal string specifying the metacolumns to be included. For example: ${optype}, ${position}. |
gg.handler.name.auditLogMode | Optional | true or false | | Set to true to operate in audit log mode, where every source operation is written to BigQuery as a separate row without a unique row identification key. Set to false to write data with a unique row identification key so that Google BigQuery can merge operations for the same row. See Operation Modes. |
gg.handler.name.pkUpdateHandling | Optional | abend or delete-insert | abend | Sets how the handler handles update operations that change a primary key. Primary key operations can be problematic for the BigQuery Handler and require special consideration. Setting delete-insert causes the handler to process a primary key update as a delete operation followed by an insert operation, as described in Operation Processing Support. |
gg.handler.name.adjustScale | Optional | true or false | false | The BigQuery numeric data type supports a maximum scale of 9 digits. If a field is mapped into a BigQuery numeric data type, then it fails if the scale is larger than 9 digits. Set this property to true to round fields mapped to BigQuery numeric data types to a scale of 9 digits. Enabling this property results in a loss of precision for source data values with a scale larger than 9. |
gg.handler.name.includeDeletedColumn | Optional | true or false | false | Set to true to include a boolean column in the output called deleted. The value of this column is set to false for insert and update operations, and is set to true for delete operations. |
gg.handler.name.enableAlter | Optional | true or false | false | Set to true to enable altering the target BigQuery table. This allows the BigQuery Handler to add columns or metacolumns configured on the source, which are not currently in the target BigQuery table. |
gg.handler.name.clientId | Optional | String | None | Use to set the client id if the configuration property gg.handler.name.credentialsFile to resolve the Google BigQuery credentials is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials. |
gg.handler.name.clientEmail | Optional | String | None | Use to set the client email if the configuration property gg.handler.name.credentialsFile to resolve the Google BigQuery credentials is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials. |
gg.handler.name.privateKey | Optional | String | None | Use to set the private key if the configuration property gg.handler.name.credentialsFile to resolve the Google BigQuery credentials is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials. |
gg.handler.name.privateKeyId | Optional | String | None | Use to set the private key id if the configuration property gg.handler.name.credentialsFile to resolve the Google BigQuery credentials is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials. |
gg.handler.name.url | Optional | A legal URL to connect to BigQuery including scheme, server name, and port (if not the default port). | https://www.googleapis.com | Allows the user to set a URL for a private endpoint to connect to BigQuery. The default is https://www.googleapis.com. |
To connect to the Google Cloud service account, ensure that one of the following is configured: the credentials file property with the relative or absolute path to the credentials JSON file, or the properties for the individual credential keys. The configuration properties used to individually set the Google service account credential keys allow those values to be encrypted using the Oracle wallet.
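A minimal sketch of the individual credential key properties, assuming the handler name bigquery and placeholder values (these are the values you would typically secure with Oracle Wallet):
gg.handler.bigquery.clientId=<client-id>
gg.handler.bigquery.clientEmail=<service-account-name>@<project-id>.iam.gserviceaccount.com
gg.handler.bigquery.privateKeyId=<private-key-id>
gg.handler.bigquery.privateKey=<private-key>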
Parent topic: Setting Up and Running the BigQuery Handler
9.2.21.1.1.2.3 Review a Sample Configuration
The following is a sample configuration for the BigQuery Handler:
gg.handlerlist = bigquery
#The handler properties
gg.handler.bigquery.type = bigquery
gg.handler.bigquery.projectId = festive-athlete-201315
gg.handler.bigquery.credentialsFile = credentials.json
gg.handler.bigquery.auditLogMode = true
gg.handler.bigquery.pkUpdateHandling = delete-insert
gg.handler.bigquery.metaColumnsTemplate =${optype}, ${position}
Parent topic: Setting Up and Running the BigQuery Handler
9.2.21.1.1.2.4 Configuring Handler Authentication
You must configure the BigQuery Handler authentication using the credentials in the JSON file downloaded from Google BigQuery.
Download the credentials file:
- Log in to your Google account at cloud.google.com.
- Click Console, and then go to the Dashboard where you can select your project.
- From the navigation menu, click APIs & Services, and then select Credentials.
- From the Create Credentials menu, choose Service account key.
- Choose the JSON key type to download the JSON credentials file for your system.
After you have the credentials file, you can authenticate the handler in one of the following ways:
- Specify the path to the credentials file in the properties file with the gg.handler.name.credentialsFile configuration property.
The path to the credentials file must not contain a wildcard. If you include the * wildcard in the path to the credentials file, the file is not recognized.
Or
- Set the credentials file keys (clientId, clientEmail, privateKeyId, and privateKey) in the corresponding handler properties.
Or
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable on your system. For example:
export GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Then restart the Oracle GoldenGate manager process.
Parent topic: Setting Up and Running the BigQuery Handler
9.2.21.1.1.3 Google BigQuery Dependencies
The Google BigQuery client libraries are required for integration with BigQuery.
The Maven coordinates are as follows:
Maven groupId: com.google.cloud
Maven artifactId: google-cloud-bigquery
Version: 2.7.1
9.2.21.1.1.3.1 BigQuery 2.7.1
The required BigQuery Client libraries for the 2.7.1 version are as follows:
api-common-2.1.3.jar
checker-compat-qual-2.5.5.jar
checker-qual-3.21.1.jar
commons-codec-1.15.jar
commons-logging-1.2.jar
error_prone_annotations-2.11.0.jar
failureaccess-1.0.1.jar
gax-2.11.0.jar
gax-httpjson-0.96.0.jar
google-api-client-1.33.1.jar
google-api-services-bigquery-v2-rev20211129-1.32.1.jar
google-auth-library-credentials-1.4.0.jar
google-auth-library-oauth2-http-1.4.0.jar
google-cloud-bigquery-2.7.1.jar
google-cloud-core-2.4.0.jar
google-cloud-core-http-2.4.0.jar
google-http-client-1.41.2.jar
google-http-client-apache-v2-1.41.2.jar
google-http-client-appengine-1.41.2.jar
google-http-client-gson-1.41.2.jar
google-http-client-jackson2-1.41.2.jar
google-oauth-client-1.33.0.jar
grpc-context-1.44.0.jar
gson-2.8.9.jar
guava-31.0.1-jre.jar
httpclient-4.5.13.jar
httpcore-4.4.15.jar
j2objc-annotations-1.3.jar
jackson-core-2.13.1.jar
javax.annotation-api-1.3.2.jar
jsr305-3.0.2.jar
listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
opencensus-api-0.31.0.jar
opencensus-contrib-http-util-0.31.0.jar
protobuf-java-3.19.3.jar
protobuf-java-util-3.19.3.jar
proto-google-common-protos-2.7.2.jar
proto-google-iam-v1-1.2.1.jar
Parent topic: Google BigQuery Dependencies
9.2.21.1.2 BigQuery Streaming API with Storage Write API
Overview
BigQuery is a fully managed, AI-ready data platform hosted on Google Cloud Platform (GCP). This handler uses the recommended Storage Write API for low-latency streaming of insert records without using any staging.
- Detailed Functionality
- Database User Privileges
- Prerequisites
- Configuration
- Classpath Configuration
- Proxy Configuration
- Sample Configuration
- Troubleshooting and Diagnostics
Parent topic: BigQuery Streaming Handler
9.2.21.1.2.1 Detailed Functionality
The Storage Write API of Google BigQuery has many advantages over the legacy API and is the recommended technology for streaming insert-only records to the target at low latency without any staging. It also provides exactly-once apply semantics, ensuring no duplicate or missing records. For more information, see https://cloud.google.com/java/docs/reference/google-cloud-bigquerystorage/latest/overview.
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.2 Database User Privileges
The service account used for replicating insert records into BigQuery must be granted one of the IAM roles bigquery.dataEditor, bigquery.dataOwner, or bigquery.admin for inserting records into the target and for automatic creation of tables. For more information, see https://cloud.google.com/bigquery/docs/access-control.
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.3 Prerequisites
- Oracle GoldenGate trails must be configured to generate INSERT operations only.
- Oracle GoldenGate must be configured with auto-creation of tables enabled if the target table does not exist on the target.
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.4 Configuration
The configuration of the BigQuery replication properties is stored in the Replicat properties file.
Note:
Specify the path to the properties file in the parameter file only when using Coordinated Replicat. Add the following line to the parameter file:
TARGETDB LIBFILE libggjava.so SET property=<parameter file directory>/<properties file name>
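For example, a minimal sketch of a Replicat parameter file that references the properties file; the group name, properties file path, and MAP statement are hypothetical:
REPLICAT bqs
TARGETDB LIBFILE libggjava.so SET property=dirprm/bqs.props
MAP QASOURCE.*, TARGET "dataset_US".*;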
The following are the configuration properties available for the BigQuery Streaming handler. The required ones must be set to match your BigQuery Streaming configuration:
Property Name | Required | Property Value | Default | Description |
---|---|---|---|---|
gg.handler.handler-name.type | Yes | bigquerystreaming | N.A. | Defines use of the BigQuery Streaming Replicat. |
gg.handler.handler-name.url | No | [private endpoint URL] | [public endpoint URL is added by default] | Defines the endpoint URL. For a private endpoint, the URL must be configured. |
gg.handler.handler-name.credentialsFile | No | [Credentials JSON file path for GCP service account] | N.A. | Path to the GCP credentials JSON file for the service account for securely connecting to GCP. |
gg.handler.handler-name.clientId | No | [Valid client Id] | N.A. | Individual credentials parameter for the GCP service account. |
gg.handler.handler-name.clientEmail | No | Valid service account email [<service-account-name>@<projectId>.iam.gserviceaccount.com] | N.A. | Individual credentials parameter for the GCP service account. |
gg.handler.handler-name.privateKeyId | No | [Valid service account privateKeyId] | N.A. | Individual credentials parameter for the GCP service account. |
gg.handler.handler-name.privateKey | No | [Valid service account private key] | N.A. | Individual credentials parameter for the GCP service account. |
gg.handler.handler-name.projectId | No | [Valid BigQuery Project Id] | N.A. | BigQuery project ID used for replication. |
gg.handler.handler-name.maxRetryDelayDuration | No | [duration in sec.] | 60 | Maximum duration, in seconds, for the stream writer to retry a data write (append). |
gg.handler.handler-name.maxRetryAttempt | No | Number of attempts | 5 | Maximum number of retries attempted by the stream writer before failing to write. |
gg.handler.handler-name.maxBatchCount | No | [number of records] | 5000 | Maximum number of records that can be buffered before flushing. |
gg.handler.handler-name.createTable | No | true/false | true | Controls whether auto-creation of the table and dataset on the target is enabled. Enabled by default. |
gg.handler.handler-name.createDataset | No | true/false | true | Controls whether auto-creation of the dataset is enabled. |
gg.handler.handler-name.metaColumnsTemplate | No | [valid template string] | N.A. | Defines the meta column template for the table. |
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.5 Classpath Configuration
The Google BigQuery Streaming Handler uses the Java SDK for BigQuery and the Storage API.
Ensure that the gg.classpath configuration parameter includes the path to the SDK.
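For example, assuming the SDK jars were downloaded into the bigquery-streaming-dep directory used in the sample configuration later in this section:
gg.classpath=./bigquery-streaming-dep/*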
9.2.21.1.2.5.1 Maven Coordinates
The BigQuery Streaming handler relies on the BigQuery Storage API Java SDK; the SDK jars need to be downloaded to the location provided in the classpath.
The Maven coordinates for the SDK dependency jars are:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquerystorage</artifactId>
  <version>3.9.2</version>
</dependency>
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquery</artifactId>
  <version>2.43.0</version>
</dependency>
Parent topic: Classpath Configuration
9.2.21.1.2.5.2 Dependencies
You can download the dependencies using the Dependency Downloader by running the following script:
<OGGDIR>/DependencyDownloader/bigquerystreaming.sh
For more information about Dependency Downloader, see Dependency Downloader.
Parent topic: Classpath Configuration
9.2.21.1.2.6 Proxy Configuration
When the Replicat process runs behind a proxy server, the jvm.bootoptions property can be used to set the proxy server configuration.
For example: jvm.bootoptions=-Dhttps.proxyHost=<some-proxy-address.com> -Dhttps.proxyPort=80
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.7 Sample Configuration
The sample properties file can also be found in the directory <OGGDIR>/AdapterExamples/big-data/bigquery_streaming/.
# Note: Recommended to only edit the configuration marked as TODO
gg.handlerlist=bqs
gg.handler.bqs.type=bigquerystreaming
#TODO: Set the credentials file with service account info for connection
gg.handler.bqs.credentialsFile=/path/to/creds.json
#TODO: Set the classpath to include Big Query storage SDK
gg.classpath=./bigquery-streaming-dep/*
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.1.2.8 Troubleshooting and Diagnostics
- Connectivity issues to BigQuery:
  - Validate the service account configuration parameters: project_id, client_id, client_email, private_key, private_key_id, and so on.
  - Check the HTTP(S) proxy configuration if the Replicat process runs behind a proxy.
- DDL not applied on the target table: Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) does not support DDL replication.
- Stream errors: If there are any errors while ingesting records with the stream, the stream information along with the associated table name is logged in the GG for DAA handler log file.
Parent topic: BigQuery Streaming API with Storage Write API
9.2.21.2 Google BigQuery Stage and Merge
Topics:
- Overview
BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.
- Detailed Functionality
- Prerequisites
- Differences between BigQuery Handler and Stage and Merge BigQuery Event Handler
- Authentication or Authorization
- Operation Aggregation
- Compressed Update Handling
- Configuration
- Troubleshooting and Diagnostics
Parent topic: Google BigQuery
9.2.21.2.1 Overview
BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.2 Detailed Functionality
The BigQuery Event handler uses the stage and merge data flow.
The change data is staged in a temporary location in microbatches and eventually merged into the target table. Google Cloud Storage (GCS) is used as the staging area for change data.
This Event handler is used as a downstream Event handler connected to the output of the GCS Event handler.
The GCS Event handler loads files generated by the File Writer Handler into Google Cloud Storage.
The Event handler runs BigQuery Query jobs to execute MERGE SQL. The SQL operations are performed in batches, providing better throughput.
Note:
The BigQuery Event handler doesn't use the Google BigQuery streaming API.
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.3 Prerequisites
Google Cloud Storage (GCS) bucket and dataset location: Ensure that the GCS bucket and the BigQuery dataset exist in the same location/region.
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.4 Differences between BigQuery Handler and Stage and Merge BigQuery Event Handler
Table 9-25 BigQuery Handler v/s Stage and Merge BigQuery Event Handler
Feature/Limitation | BigQuery Handler | Stage And Merge BigQuery Event Handler |
---|---|---|
Compressed update support | Partially supported with limitations. | YES |
Audit log mode | Processes all operations as INSERT. | No need to enable audit log mode. |
GCP Quotas/Limits | Maximum rows per second per table: 100000. See Google BigQuery Documentation. | Daily destination table update limit: 1500 updates per table per day. See Google BigQuery Documentation. |
Approximate pricing with 1TB storage (for exact pricing, refer to the GCP Pricing calculator) | Streaming inserts for 1TB cost ~72.71 USD per month. | Query job for 1TB costs ~20.28 USD per month. |
Duplicate rows replicated to BigQuery | YES | NO |
Replication of TRUNCATE operation | Not supported | Supported |
API used | BigQuery Streaming API | BigQuery Query job |
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.5 Authentication or Authorization
For more information about using the Google service account key, see Authentication and Authorization in the Google Cloud Storage (GCS) Event Handler topic. In addition to the permissions needed to access GCS, the service account also needs permissions to access BigQuery. You may choose to use a pre-defined IAM role, such as roles/bigquery.dataEditor or roles/bigquery.dataOwner. When creating a custom role, the following are the IAM permissions used to run the BigQuery Event handler. For more information, see Configuring Handler Authentication.
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.5.1 BigQuery Permissions
Table 9-26 BigQuery Permissions
Permission | Description |
---|---|
bigquery.connections.create |
Create new connections in a project. |
bigquery.connections.delete |
Delete a connection. |
bigquery.connections.get |
Gets connection metadata. Credentials are excluded. |
bigquery.connections.list |
List connections in a project. |
bigquery.connections.update |
Update a connection and its credentials. |
bigquery.connections.use |
Use a connection configuration to connect to a remote data source. |
bigquery.datasets.create |
Create new datasets. |
bigquery.datasets.get |
Get metadata about a dataset. |
bigquery.datasets.getIamPolicy |
Reserved for future use. |
bigquery.datasets.update |
Update metadata for a dataset. |
bigquery.datasets.updateTag |
Update tags for a dataset. |
bigquery.jobs.create |
Run jobs (including queries) within the project. |
bigquery.jobs.get |
Get data and metadata on any job. |
bigquery.jobs.list |
List all jobs and retrieve metadata on any job submitted by any user. For jobs submitted by other users, details and metadata are redacted. |
bigquery.jobs.listAll |
List all jobs and retrieve metadata on any job submitted by any user. |
bigquery.jobs.update |
Cancel any job. |
bigquery.readsessions.create |
Create a new read session via the BigQuery Storage API. |
bigquery.readsessions.getData |
Read data from a read session via the BigQuery Storage API. |
bigquery.readsessions.update |
Update a read session via the BigQuery Storage API. |
bigquery.reservations.create |
Create a reservation in a project. |
bigquery.reservations.delete |
Delete a reservation. |
bigquery.reservations.get |
Retrieve details about a reservation. |
bigquery.reservations.list |
List all reservations in a project. |
bigquery.reservations.update |
Update a reservation’s properties. |
bigquery.reservationAssignments.create |
Create a reservation assignment. This permission is required on the owner project and assignee resource. To move a reservation assignment, you need bigquery.reservationAssignments.create on the new owner project and assignee resource. |
bigquery.reservationAssignments.delete |
Delete a reservation assignment. This permission is required on the owner project and assignee resource. To move a reservation assignment, you need bigquery.reservationAssignments.delete on the old owner project and assignee resource. |
bigquery.reservationAssignments.list |
List all reservation assignments in a project. |
bigquery.reservationAssignments.search |
Search for a reservation assignment for a given project, folder, or organization. |
bigquery.routines.create |
Create new routines (functions and stored procedures). |
bigquery.routines.delete |
Delete routines. |
bigquery.routines.list |
List routines and metadata on routines. |
bigquery.routines.update |
Update routine definitions and metadata. |
bigquery.savedqueries.create |
Create saved queries. |
bigquery.savedqueries.delete |
Delete saved queries. |
bigquery.savedqueries.get |
Get metadata on saved queries. |
bigquery.savedqueries.list |
Lists saved queries. |
bigquery.savedqueries.update |
Updates saved queries. |
bigquery.tables.create |
Create new tables. |
bigquery.tables.delete |
Delete tables |
bigquery.tables.export |
Export table data out of BigQuery. |
bigquery.tables.get |
Get table metadata. To get table data, you need bigquery.tables.getData. |
bigquery.tables.getData |
Get table data. This permission is required for querying table data. To get table metadata, you need bigquery.tables.get. |
bigquery.tables.getIamPolicy |
Read a table’s IAM policy. |
bigquery.tables.list |
List tables and metadata on tables. |
bigquery.tables.setCategory |
Set policy tags in table schema. |
bigquery.tables.setIamPolicy |
Changes a table’s IAM policy. |
bigquery.tables.update |
Update table metadata. To update table data, you need bigquery.tables.updateData. |
bigquery.tables.updateData |
Update table data. To update table metadata, you need bigquery.tables.update. |
bigquery.tables.updateTag |
Update tags for a table. |
In addition to these permissions, ensure that resourcemanager.projects.get/list is always granted as a pair.
Parent topic: Authentication or Authorization
9.2.21.2.6 Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.6.1 In-Memory Operation Aggregation
- Operation records are aggregated in-memory by default.
- The gg.aggregate.operations.flush.interval property has been deprecated and is no longer supported. If gg.aggregate.operations.flush.interval is used in GG for DAA 23ai, then the Replicat will run, but a warning is added to the log file about the property being deprecated and unsupported. To control the time window for aggregation, use the gg.handler.bq.fileRollInterval property (see the example following this list). By default, it is set to 3 minutes. Longer intervals increase latency and may increase memory usage. Shorter intervals increase overhead in Oracle GoldenGate and the target database.
- In-memory operation aggregation requires additional JVM memory configuration.
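A minimal properties sketch for tuning the aggregation window, assuming the auto-configured handler name bq; the 5m value is only an example:
# Roll and merge approximately every 5 minutes instead of the default 3 minutes
gg.handler.bq.fileRollInterval=5m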
Parent topic: Operation Aggregation
9.2.21.2.6.2 Operation Aggregation Using SQL
- To use SQL aggregation, it is mandatory that the trail files contain uncompressed UPDATE operation records, which means that the UPDATE operations contain the full image of the row being updated.
- Operation aggregation using SQL can provide better throughput if the trail files contain uncompressed update records.
- Replicat can aggregate operations using SQL statements by setting gg.aggregate.operations.using.sql=true (see the example following this list).
- You can tune the frequency of the merge interval using the File Writer gg.handler.bq.fileRollInterval property; the default value is set to 3m (three minutes).
- Operation aggregation using SQL does not require additional JVM memory configuration.
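A minimal properties sketch enabling SQL-based aggregation, assuming the auto-configured handler name bq; the merge interval shown is the default:
# Aggregate operations using SQL in the target (requires uncompressed updates in the trail)
gg.aggregate.operations.using.sql=true
# Optional: tune how often staged data is merged into BigQuery
gg.handler.bq.fileRollInterval=3m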
Parent topic: Operation Aggregation
9.2.21.2.7 Compressed Update Handling
A compressed update record contains values for the key columns and the modified columns.
An uncompressed update record contains values for all the columns.
Oracle GoldenGate trails may contain compressed or uncompressed update records. The default Extract configuration writes compressed updates to the trails.
The parameter gg.compressed.update can be set to true or false to indicate compressed or uncompressed update records.
9.2.21.2.7.1 MERGE Statement with Uncompressed Updates
In some use cases, if the trail contains uncompressed update records, then the MERGE SQL statement can be optimized for better performance by setting gg.compressed.update=false.
If you want to use DELETE+INSERT SQL statements instead of a MERGE SQL statement, then set gg.eventhandler.bq.deleteInsert=true.
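A minimal properties sketch of this optimization, assuming the auto-configured BigQuery Event handler name bq and a trail that is known to contain uncompressed (full-image) updates:
# Trail contains uncompressed (full image) update records
gg.compressed.update=false
# Optional: apply changes using DELETE+INSERT instead of MERGE
gg.eventhandler.bq.deleteInsert=true
The deleteInsert option is honored only when gg.compressed.update is set to false.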
Parent topic: Compressed Update Handling
9.2.21.2.8 Configuration
Recommended configuration when using the Google BigQuery Stage and Merge Event handler as a coordinated apply Replicat
The MERGE SQL is a mutating DML operation, and Google BigQuery runs up to 2 of them concurrently, after which up to 20 are queued as PENDING. Therefore, the recommended configuration for maximum threads in a coordinated apply Replicat is 2. Adding any more threads will not improve the performance of the handler, as they will be run serially rather than concurrently.
This topic contains the following:
- Automatic Configuration
- Classpath Configuration
The GCS Event handler and the BigQuery Event handler use the Java SDK provided by Google. Google does not provide a direct link to download the SDK. - Proxy Configuration
- INSERTALLRECORDS Support
- BigQuery Dataset and GCP ProjectId Mapping
- BigQuery Iceberg Tables
- End-to-End Configuration
Parent topic: Google BigQuery Stage and Merge
9.2.21.2.8.1 Automatic Configuration
Replication to BigQuery involves configuring multiple components, such as the File Writer handler, the Google Cloud Storage (GCS) Event handler, and the BigQuery Event handler.
The Automatic Configuration functionality helps to auto configure these components so that the user configuration is minimal.
The properties modified by auto configuration are also logged in the handler log file. To enable auto configuration to replicate to the BigQuery target, set the parameter gg.target=bq.
When replicating to BigQuery target, you cannot customize GCS Event handler name and BigQuery Event handler name.
- File Writer Handler Configuration
File Writer handler name is preset to the value bq. The following is an example to edit a property of the File Writer handler: gg.handler.bq.pathMappingTemplate=./dirout.
- GCS Event Handler Configuration
The GCS Event handler name is preset to the value gcs. The following is an example to edit a property of the GCS Event handler: gg.eventhandler.gcs.concurrency=5.
- BigQuery Event Handler Configuration
BigQuery Event handler name is preset to the value bq. There are no mandatory parameters required for the BigQuery Event handler. Mostly, auto configure derives the required parameters.
Parent topic: Configuration
9.2.21.2.8.1.1 File Writer Handler Configuration
File Writer handler name is preset to the value bq. The following is an example to edit a property of the File Writer handler: gg.handler.bq.pathMappingTemplate=./dirout.
Parent topic: Automatic Configuration
9.2.21.2.8.1.2 GCS Event Handler Configuration
The GCS Event handler name is preset to the value gcs. The following is an example to edit a property of the GCS Event handler: gg.eventhandler.gcs.concurrency=5.
Parent topic: Automatic Configuration
9.2.21.2.8.1.3 BigQuery Event Handler Configuration
BigQuery Event handler name is preset to the value bq. There are no mandatory parameters required for the BigQuery Event handler. Mostly, auto configure derives the required parameters.
The following are the BigQuery Event handler configurations:
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.bq.credentialsFile | Optional | Relative or absolute path to the service account key file. | Value from property gg.eventhandler.gcs.credentialsFile | Sets the path to the service account key file. Autoconfigure will automatically configure this property based on the configuration gg.eventhandler.gcs.credentialsFile, unless the user wants to use a different service account key file for BigQuery access. Alternatively, if the environment variable GOOGLE_APPLICATION_CREDENTIALS is set to the path to the service account key file, this parameter need not be set. |
gg.eventhandler.bq.projectId | Optional | The Google project-id | project-id associated with the service account. | Sets the project-id of the Google Cloud project that houses BigQuery. Autoconfigure will automatically configure this property by accessing the service account key file, unless the user wants to override this explicitly. |
gg.eventhandler.bq.kmsKey | Optional | Key names in the format: projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME> | Value from property gg.eventhandler.gcs.kmsKey | Set a customer-managed Cloud KMS key to encrypt data in BigQuery. Autoconfigure will automatically configure this property based on the configuration gg.eventhandler.gcs.kmsKey. |
gg.eventhandler.bq.connectionTimeout | Optional | Positive integer. | 20000 | The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery. |
gg.eventhandler.bq.readTimeout | Optional | Positive integer. | 30000 | The maximum amount of time, in milliseconds, to wait for the handler to read data from an established connection. |
gg.eventhandler.bq.totalTimeout | Optional | Positive integer. | 120000 | The total timeout parameter in seconds. The total timeout has the ultimate control over how long the logic should keep trying the remote call until it gives up completely. |
gg.eventhandler.bq.retries | Optional | Positive integer. | 3 | The maximum number of retry attempts to perform. |
gg.eventhandler.bq.createDataset | Optional | true or false | true | Set to true to automatically create the BigQuery dataset if it does not exist. |
gg.eventhandler.bq.createTable | Optional | true or false | true | Set to true to automatically create the BigQuery target table if it does not exist. |
gg.handler.bq.fileRollInterval | Optional | Integer. The default unit of measure is milliseconds. You can stipulate ms, s, m, or h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. | 3m | The parameter determines how often the data will be merged into Google BigQuery. Use with caution: the higher this value is, the more data will need to be stored in the memory of the Replicat process. Note: Increasing its default value (3m) increases the amount of data stored in the internal memory of the Replicat. This can cause out-of-memory errors and stop the Replicat if it runs out of memory. Note: Starting with the 23ai release, the gg.aggregate.operations.flush.interval property is deprecated and no longer supported. For more information, see In-Memory Operation Aggregation. |
gg.compressed.update | Optional | true or false | true | If set to true, this indicates that the source trail files contain compressed update operations. If set to false, the source trail files are expected to contain uncompressed update operations. |
gg.eventhandler.bq.connectionRetryIntervalSeconds | Optional | Integer value | 30 | Specifies the delay in seconds between connection retry attempts. |
gg.eventhandler.bq.connectionRetries | Optional | Integer value | 3 | Specifies the number of times connections to the target data warehouse will be retried. |
gg.eventhandler.bq.url | Optional | An absolute URL to connect to Google BigQuery. | https://googleapis.com | A legal URL to connect to Google BigQuery including scheme, server name, and port (if not the default port). The default is https://googleapis.com. |
gg.eventhandler.bq.clientId | Optional | Client Id param of the service account key file. | Value from property gg.eventhandler.gcs.clientId | Sets the clientId of the service account key file. Auto-configure will automatically configure this property based on the configuration gg.eventhandler.gcs.clientId, unless the user wants to use a different service account key file for BigQuery access. This enables the service account key to be encrypted using Oracle wallet. |
gg.eventhandler.bq.clientEmail | Optional | Client Email of the service account key file. | Value from property gg.eventhandler.gcs.clientEmail | Sets the client Email of the service account key file. Auto-configure will automatically configure this property based on the configuration gg.eventhandler.gcs.clientEmail, unless the user wants to use a different service account key file for BigQuery access. This enables the service account key to be encrypted using Oracle wallet. |
gg.eventhandler.bq.privateKeyId | Optional | Private Key ID of the service account key file. | Value from property gg.eventhandler.gcs.privateKeyId | Sets the private key Id of the service account key file. Auto-configure will automatically configure this property based on the configuration gg.eventhandler.gcs.privateKeyId, unless the user wants to use a different service account key file for BigQuery access. This enables the service account key to be encrypted using Oracle wallet. |
gg.eventhandler.bq.privateKey | Optional | Private Key of the service account key file. | Value from property gg.eventhandler.gcs.privateKey | Sets the private key of the service account key file. Auto-configure will automatically configure this property based on the configuration gg.eventhandler.gcs.privateKey, unless the user wants to use a different service account key file for BigQuery access. This enables the service account key to be encrypted using Oracle wallet. |
gg.validate.keyupdate | Optional | true or false | false | If set to true, Replicat will validate key update operations (optype 115) and correct them to normal updates if no key values have changed. Compressed key update operations do not qualify for merge. |
gg.eventhandler.bq.deleteInsert | Optional | true or false | false | If set to true, Replicat will merge records using SQL DELETE+INSERT statements instead of the SQL MERGE statement. Applicable only if gg.compressed.update is set to false. |
gg.eventhandler.bq.tableType | Optional | native or iceberg. For more information, see BigQuery Iceberg Tables. | native | Indicates the BigQuery table format for automatic table creation. Options are native (default): automatically create native BigQuery tables, and iceberg: automatically create Iceberg tables. For more information, see BigQuery Iceberg Tables. |
gg.eventhandler.bq.storageUri | Required | BigQuery Iceberg table’s Cloud Storage URI. | None | Required when the property tableType is set to iceberg. A fully qualified Cloud Storage URI for Iceberg tables. For more information, see BigQuery Iceberg Tables. |
gg.eventhandler.bq.storageConnectionId | Required | Storage Connection ID to access the BigQuery Iceberg table’s Cloud Storage URI. | None | Required when the property tableType is set to iceberg. Storage Connection ID to access the Cloud Storage URI for Iceberg tables. For more information, see BigQuery Iceberg Tables. |
Parent topic: Automatic Configuration
9.2.21.2.8.2 Classpath Configuration
The GCS Event handler and the BigQuery Event handler use the Java SDK provided by Google. Google does not provide a direct link to download the SDK.
You can download the SDKs using the following Maven coordinates:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-storage</artifactId>
  <version>1.113.9</version>
</dependency>
To download the GCS dependencies, execute the following script: <OGGDIR>/DependencyDownloader/gcs.sh.
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquery</artifactId>
  <version>1.111.1</version>
</dependency>
To download the BigQuery dependencies, execute the following script: <OGGDIR>/DependencyDownloader/bigquery.sh. For more information, see gcs.sh in Dependency Downloader Scripts.
Set the path to the GCS and BigQuery SDK in the gg.classpath configuration parameter. For example: gg.classpath=./gcs-deps/*:./bq-deps/*.
For more information, see Dependency Downloader Scripts.
Parent topic: Configuration
9.2.21.2.8.3 Proxy Configuration
When the Replicat process is run behind a proxy server, you can use the jvm.bootoptions property to set the proxy server configuration.
For example: jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80.
Parent topic: Configuration
9.2.21.2.8.4 INSERTALLRECORDS Support
Stage and merge targets support the INSERTALLRECORDS parameter. See INSERTALLRECORDS in Reference for Oracle GoldenGate.
To process initial load trail files, set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm). Setting this property directs the Replicat process to use bulk insert operations to load operation data into the target table.
You can tune the batch size of bulk inserts using the gg.handler.bq.maxFileSize File Writer property; the default value is set to 1GB. The frequency of bulk inserts can be tuned using the File Writer gg.handler.bq.fileRollInterval property; the default value is set to 3m (three minutes).
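A minimal sketch of a Replicat parameter file for an initial load; the Replicat group name, properties file path, and MAP statement are hypothetical:
REPLICAT bqload
TARGETDB LIBFILE libggjava.so SET property=dirprm/bq.props
INSERTALLRECORDS
MAP QASOURCE.*, TARGET "dataset_US".*;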
Parent topic: Configuration
9.2.21.2.8.5 BigQuery Dataset and GCP ProjectId Mapping
The BigQuery Event handler maps the table schema name to the BigQuery dataset. For three-part table names, the table catalog name is mapped to the GCP projectId.
If the tables use distinct catalog names, then the BigQuery datasets can reside in multiple GCP projects. The GCP service account key should have the required privileges to create datasets and tables in the respective GCP projects.
Parent topic: Configuration
9.2.21.2.8.5.1 Three-Part Table Names
Parent topic: BigQuery Dataset and GCP ProjectId Mapping
9.2.21.2.8.5.2 Mapping Table
Table 9-27 Mapping Table
MAP statement in the Replicat parameter file | BigQuery Dataset | GCP ProjectId |
---|---|---|
MAP SCHEMA1.*, TARGET "bq-project-1".*.*; | SCHEMA1 | bq-project-1 |
MAP "bq-project-2".SCHEMA2.*, TARGET *.*.*; | SCHEMA2 | bq-project-2 |
MAP SCHEMA3.*, TARGET *.*; | SCHEMA3 | The default projectId from the GCP service account key file or the configuration gg.eventhandler.bq.projectId. |
Parent topic: BigQuery Dataset and GCP ProjectId Mapping
9.2.21.2.8.6 BigQuery Iceberg Tables
This event handler supports automatic creation of Iceberg tables in BigQuery.
For more information, see https://cloud.google.com/bigquery/docs/iceberg-tables#api
Iceberg tables created by BigQuery use the Parquet file format; this cannot be modified.
A Cloud Storage URI and a Cloud Storage Connection are required to create Iceberg tables in BigQuery.
Parent topic: Configuration
9.2.21.2.8.6.1 Cloud Storage URI
The Cloud Storage URI is the location where the Iceberg table data is stored.
Every target Iceberg table requires a unique Cloud Storage URI.
For uniqueness, the value configured in the gg.eventhandler.bq.storageUri property is appended with the fully qualified table name to create the Iceberg table.
For example:
gg.eventhandler.bq.storageUri=gs://<bucket-name>/path/to/iceberg-tables/<project-id>/<dataset-name>/<short-table-name>
Parent topic: BigQuery Iceberg Tables
9.2.21.2.8.6.2 Cloud Storage Connection
Follow the steps here: https://cloud.google.com/bigquery/docs/create-cloud-resource-connection to create a Cloud Storage Connection.
Ensure that the service account associated with the Cloud Storage Connection has the permissions to access the Cloud Storage URI.
The Cloud Storage Connection and the Cloud Storage bucket must reside in the same GCP region.
Note the Connection ID of the Cloud Storage Connection; the Connection ID must be prefixed with the GCP region during configuration.
For example:
gg.eventhandler.bq.storageConnectionId=us-central1.<storage_connection_id>
Parent topic: BigQuery Iceberg Tables
9.2.21.2.8.7 End-to-End Configuration
The following is an end-to-end configuration example which uses auto configuration for the File Writer (FW) handler, GCS, and BigQuery Event handlers.
The sample properties file can also be found in the directory AdapterExamples/big-data/bigquery-via-gcs/bq.props.
# Configuration to load GoldenGate trail operation records
# into Google Big Query by chaining
# File writer handler -> GCS Event handler -> BQ Event handler.
# Note: Recommended to only edit the configuration marked as TODO
# The property gg.eventhandler.gcs.credentialsFile need not be set if
# the GOOGLE_APPLICATION_CREDENTIALS environment variable is set.
gg.target=bq
## The GCS Event handler
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile
## The BQ Event handler
## No mandatory configuration required.
#TODO: Edit to include the GCS Java SDK and BQ Java SDK.
gg.classpath=/path/to/gcs-deps/*:/path/to/bq-deps/*
#TODO: Edit to provide sufficient memory (at least 8GB).
jvm.bootoptions=-Xmx8g -Xms8g
#TODO: If running OGGBD behind a proxy server.
#jvm.bootoptions=-Xmx8g -Xms512m -Dhttps.proxyHost=<ip-address> -Dhttps.proxyPort=<port>
Parent topic: Configuration
9.2.21.2.9 Troubleshooting and Diagnostics
- DDL not applied on the target table: Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) does not support DDL replication.
- SQL Errors: In case there are any errors while executing any SQL, the entire SQL statement along with the bind parameter values are logged into the GG for DAA handler log file.
- Co-existence of the components: The location/region of the machine where the Replicat process is running and of the BigQuery dataset/GCS bucket impacts the overall throughput of the apply process.
The data flow is as follows: GoldenGate -> GCS bucket -> BigQuery. For best throughput, ensure that the components are located as close as possible.
- com.google.cloud.bigquery.BigQueryException: Access Denied: Project <any-gcp-project>: User does not have bigquery.datasets.create permission in project <any-gcp-project>: The service account key used by GG for DAA does not have permission to create datasets in this project. Grant the permission bigquery.datasets.create and restart the Replicat process. The privileges are listed in BigQuery Permissions.
Parent topic: Google BigQuery Stage and Merge