8.9 Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics
Overview
This Quickstart covers a step-by-step process showing how to ingest real-time data into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).
GCP BigQuery is a cloud-based data warehousing platform that provides a fully managed service for storing, processing, and analysing data.
GG for DAA supports stage and merge and streaming api designs. In stage and merge, the change data is staged in a temporary location in microbatches and eventually merged into to the target table. GCS is used as the staging location.
All replication process is automatically handled by GoldenGate for Distributed Applications and Analytics (GG for DAA) GCP BigQuery Stage and Merge Handler.
- Prerequisites for Google Cloud Platform BigQuery Stage and Merge
- Install Dependency Files
- Create a Replicat in Oracle GoldenGate for Distributed Applications and Analytics
Parent topic: Quickstarts
8.9.1 Prerequisites for Google Cloud Platform BigQuery Stage and Merge
- Google Cloud Platform (GCP) account set up.
- A Google Cloud Platform (GCP) service account key with relevant BigQuery Permissions. Copy your GCP service account key to a directory on your Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Server.
- A Google Cloud Storage bucket with relevant permissions. Ensure that the GCS bucket and the BigQuery dataset exist in the same location or region.
- Target BigQuery tables can be created before configuring the replicat. If necessary permissions are provided, then GG for DAA can auto create the target BigQuery tables.
In this quick start, we will use a sample trail file (named tr) which is
shipped with GG for DAA. If you want to continue with sample trail file, it is
located at GG_HOME/opt/AdapterExamples/trail/
in your GG for DAA
instance.
GG for DAA will create the tables automatically.
8.9.2 Install Dependency Files
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) uses Java SDK provided by Google. You can download the SDKs using Dependency Downloader utility shipped with GG for DAA. Dependency downloader is a set of shell scripts that downloads dependency jar files from Maven and other repositories.
- In your GG for DAA VM, go to dependency downloader utility. It is
located at
GG_HOME/opt/DependencyDownloader/
. - Run gcs.sh and bigquery.sh with the required versions.
Figure 8-59 Run gcs.sh and bigquery.sh with the required versions
- 2 directories are created in
GG_HOME/opt/DependencyDownloader/dependencies
. Make a note of the directories:/u01/app/ogg/opt/DependencyDownloader/dependencies/bigquery_1.111.1
/u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9
8.9.3 Create a Replicat in Oracle GoldenGate for Distributed Applications and Analytics
To create a replicat in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA):
- In the Administration Service tab, click the + sign
to add a replicat.
Figure 8-60 Click + in the Administration Service tab.
- Select the Classic Replicat Replicat Type and click
Next. There are two different Replicat types available: Classic and
Coordinated. Classic Replicat is a single threaded process whereas Coordinated
Replicat is a multithreaded one that applies transactions in parallel.
Figure 8-61 Add Replicat
- Enter the Replicat options, and click Next:
- Trail Name: Name of the required trail file (if using
sample trail, provide as
tr
) - Subdirectory: Provide as
GG_HOME/opt/AdapterExamples/trail/
if using the sample trail. - Target: Google BigQuery
Figure 8-62 Replicat Options
- Trail Name: Name of the required trail file (if using
sample trail, provide as
- Leave Managed Options as is and click Next.
Figure 8-63 Managed Options
- Enter Parameter File details and click Next. In the
Parameter File, you can specify source to target mapping. If you are using the
sample trail file (
tr
), then enter as follows:MAP QASOURCE.*, TARGET <your_schema_name>.*;
If Coordinated Replicat is selected as the Replicat Type, an additional parameter needs to be provided:
TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties
Figure 8-64 Parameter File
- In the next screen, you need to update the properties only tagged
as TODO and click Create and Run.
Provide your GCS bucket name:
#TODO: Edit the GCS bucket name gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
Provide path to your GCP service account key:#TODO: Edit the GCS credentialsFile gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile
Provide path to dependency jar files that you downloaded in prerequisites:#TODO: Edit to include the GCS Java SDK and BQ Java SDK. gg.classpath=/path/to/gcs_dependencies/*:/path/to/bq_dependencies/*
Figure 8-65 Properties File
For more information about replication configuration, see Google BigQuery Stage and Merge.
- If replicat starts successfully, it will be in running state. You
can go to action/details/statistics to see the replication statistics.
Figure 8-66 Replication Statistics
Figure 8-67 GCPBQ Statistics
- You can go to GCP Big Query console and check the tables. It may take a short
while for tables to be created and loaded.
Figure 8-68 GCPBQ Console
Note:
- You can run an initial load with BigQuery replicat. For more information see BigQuery Handler.