9.2.34 Snowflake
- Snowflake Stage and Merge
- Snowflake Streaming Handler
Table 9-38 Differences between Stage and Merge, and Streaming Handlers
Details | Stage and Merge | Streaming Handler |
---|---|---|
Workload | Any | Inserts-only |
Latency | Micro-batch | Real-time |
Resources | Customer-managed virtual warehouse | Snowflake-managed serverless compute |
API | JDBC and SQL commands (PUT, COPY INTO, MERGE) | Snowpipe Streaming |
Data load | Requires an intermediary loading step from an internal/external stage. | Direct loading into Snowflake |
Topics:
9.2.34.1 Snowflake Stage and Merge Handler
Topics:
- Overview
- Detailed Functionality
- Configuration
- Snowflake Iceberg Tables
- Troubleshooting and Diagnostics
Parent topic: Snowflake
9.2.34.1.1 Overview
Snowflake is a serverless data warehouse that runs on any of the following cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
The Snowflake Event Handler is used to replicate data into Snowflake.
Parent topic: Snowflake Stage and Merge Handler
9.2.34.1.2 Detailed Functionality
- The change data from the Oracle GoldenGate trails is staged in micro-batches at a temporary staging location (internal or external stage).
- The staged records are then merged into the Snowflake target tables using a merge SQL statement.
This topic contains the following:
Parent topic: Snowflake Stage and Merge Handler
9.2.34.1.2.1 Staging Location
The change data records from the Oracle GoldenGate trail files are formatted into Avro OCF (Object Container Format) and are then uploaded to the staging location.
Change data can be staged in one of the following object stores:
- Snowflake internal stage
- Snowflake external stage
- AWS Simple Storage Service (S3)
- Azure Data Lake Storage (ADLS) Gen2
- Google Cloud Storage (GCS)
Parent topic: Detailed Functionality
9.2.34.1.2.2 Database User Privileges
The database user used for replicating into Snowflake must be granted the following privileges:
- INSERT, UPDATE, DELETE, and TRUNCATE on the target tables.
- CREATE and DROP on the Snowflake named stage and external stage.
- If using an external stage (S3, ADLS, GCS): CREATE, ALTER, and DROP external table.
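The following is a minimal sketch of these grants; the role, schema, table, and stage names (GGADMIN_ROLE, TARGET_SCHEMA, TARGET_TABLE) are placeholders for illustration only:
-- Hypothetical names; adjust to your environment.
grant insert, update, delete, truncate on table TARGET_SCHEMA.TARGET_TABLE to role GGADMIN_ROLE;
grant create stage on schema TARGET_SCHEMA to role GGADMIN_ROLE;
-- Required only when an external stage (S3, ADLS Gen2, or GCS) is used:
grant create external table on schema TARGET_SCHEMA to role GGADMIN_ROLE;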
Parent topic: Detailed Functionality
9.2.34.1.2.3 Prerequisites
- You must have an Amazon Web Services, Google Cloud Platform, or Azure cloud account set up if you intend to use any of the external stage locations, such as S3, ADLS Gen2, or GCS.
- Snowflake JDBC driver
Parent topic: Detailed Functionality
9.2.34.1.3 Configuration
Note:
Ensure that you specify the path to the properties file in the parameter file only when using Coordinated Replicat. Add the following line to the parameter file:
TARGETDB LIBFILE libggjava.so SET property=<parameter file directory>/<properties file name>
- Automatic Configuration
- Snowflake Storage Integration
- Classpath Configuration
- Proxy Configuration
- INSERTALLRECORDS Support
- Snowflake Key Pair Authentication
- Mapping Source JSON/XML to Snowflake VARIANT
- Operation Aggregation
- Compressed Update Handling
- End-to-End Configuration
- Table Mapping
Parent topic: Snowflake Stage and Merge Handler
9.2.34.1.3.1 Automatic Configuration
Snowflake replication involves configuring multiple components, such as the File Writer Handler, the S3, HDFS, or GCS Event Handler, and the target Snowflake Event Handler.
The Automatic Configuration functionality helps you auto-configure these components so that manual configuration is minimal.
The properties modified by auto-configuration are also logged in the handler log file.
To enable auto-configuration to replicate to the Snowflake target, set the parameter gg.target=snowflake.
The parameter gg.stage determines the staging location. If gg.stage is unset, then the Snowflake internal stage is used. If gg.stage is set to s3, abs, or gcs, then AWS S3, ADLS Gen2, or GCS, respectively, is used as the staging location.
The JDBC Metadata Provider is also automatically enabled to retrieve target table metadata from Snowflake.
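For example, a minimal sketch of the top-level auto-configuration properties (the external stage choice shown is illustrative):
# Enable Snowflake auto-configuration.
gg.target=snowflake
# Optional: choose an external staging location (s3, abs, or gcs).
# If gg.stage is not set, the Snowflake internal stage is used.
#gg.stage=s3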
- File Writer Handler Configuration
- S3 Handler Configuration
- HDFS Event Handler Configuration
- Google Cloud Storage Event Handler Configuration
- Snowflake Event Handler Configuration
Parent topic: Configuration
9.2.34.1.3.1.1 File Writer Handler Configuration
The File Writer Handler name is pre-set to the value snowflake, and its properties are automatically set to the required values for Snowflake.
You can add or edit a property of the File Writer Handler. For example:
gg.handler.snowflake.pathMappingTemplate=./dirout
Parent topic: Automatic Configuration
9.2.34.1.3.1.2 S3 Handler Configuration
The S3 Event Handler name is pre-set to the value s3 and must be configured to match your S3 configuration.
The following is an example of editing a property of the S3 Event Handler:
gg.eventhandler.s3.bucketMappingTemplate=bucket1
For more information, see Amazon S3.
Parent topic: Automatic Configuration
9.2.34.1.3.1.3 HDFS Event Handler Configuration
The Hadoop Distributed File System (HDFS) Event Handler name is pre-set to the value hdfs and is auto-configured to write to HDFS.
Ensure that the Hadoop configuration file core-site.xml is configured to write data files to the respective container in the Azure Data Lake Storage (ADLS) Gen2 storage account. For more information, see Azure Data Lake Gen2 using Hadoop Client and ABFS.
The following is an example of editing a property of the HDFS Event Handler:
gg.eventhandler.hdfs.finalizeAction=delete
Parent topic: Automatic Configuration
9.2.34.1.3.1.4 Google Cloud Storage Event Handler Configuration
The Google Cloud Storage (GCS) Event Handler name is pre-set to the value gcs and must be configured to match your GCS configuration.
The following is an example of editing a GCS Event Handler property:
gg.eventhandler.gcs.bucketMappingTemplate=bucket1
Parent topic: Automatic Configuration
9.2.34.1.3.1.5 Snowflake Event Handler Configuration
The Snowflake Event Handler name is pre-set to the value snowflake.
The following configuration properties are available for the Snowflake Event Handler; the required ones must be changed to match your Snowflake configuration:
Table 9-39 Snowflake Event Handler Configuration
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.snowflake.connectionURL | Required | Supported connection URL. For example, jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name> | None | JDBC URL to connect to Snowflake. The Snowflake account name, warehouse, and database must be set in the JDBC URL. The warehouse can be set using warehouse=<warehouse name>, and the database can be set using db=<db name>. In some cases, for authorization, a role should be set using role=<rolename>. |
gg.eventhandler.snowflake.UserName | Required | Supported database user name string. | None | Snowflake database user. |
gg.eventhandler.snowflake.Password | Required | Supported database password string. | None | Snowflake database password. |
gg.eventhandler.snowflake.storageIntegration | Optional | Storage integration name. | None | This parameter is required when using an external stage such as ADLS Gen2, GCS, or S3. This is the credential for the Snowflake data warehouse to access the respective object store files. For more information, see Snowflake Storage Integration. |
gg.eventhandler.snowflake.maxConnections | Optional | Integer value | 10 | Use this parameter to control the number of concurrent JDBC database connections to the target Snowflake database. |
gg.eventhandler.snowflake.dropStagingTablesOnShutdown | Optional | true or false | false | If set to true, the temporary staging tables created by Oracle GoldenGate are dropped on graceful Replicat stop. |
gg.handler.snowflake.fileRollInterval | Optional | The default unit of measure is milliseconds. You can stipulate ms, s, m, h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values include 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or less indicate that file rolling on time is turned off. | 3m (three minutes) | Determines how often the data is merged into Snowflake. Use this parameter with caution: increasing its default value (3m) increases the amount of data stored in the internal memory of the Replicat process, which can cause out-of-memory errors and stop the Replicat. Note: Starting with the 23ai release, the gg.aggregate.operations.flush.interval property is deprecated and no longer supported. For more information, see In-Memory Operation Aggregation. |
gg.eventhandler.snowflake.putSQLThreads | Optional | Integer value | 4 | Specifies the number of threads (PARALLEL clause) to use for uploading files using PUT SQL. This is only relevant when the Snowflake internal stage (named stage) is used. |
gg.eventhandler.snowflake.putSQLAutoCompress | Optional | true or false | false | Specifies whether Snowflake uses gzip to compress files (AUTO_COMPRESS clause) during upload using PUT SQL. true: files are compressed (if they are not already compressed). false: files are not compressed (that is, the files are uploaded as is). This is only relevant when the Snowflake internal stage (named stage) is used. |
gg.validate.keyupdate | Optional | true or false | false | If set to true, Replicat validates key update operations (optype 115) and corrects them to normal updates if no key values have changed. Compressed key update operations do not qualify for merge. |
gg.eventhandler.snowflake.useCopyForInitialLoad | Optional | true or false | true | If set to true, the COPY SQL statement is used during initial load. If set to false, the INSERT SQL statement is used during initial load. |
gg.compressed.update | Optional | true or false | true | If set to true, the source trail files are expected to contain compressed update operations. If set to false, the source trail files are expected to contain uncompressed update operations. |
gg.eventhandler.snowflake.connectionRetries | Optional | Integer value | 3 | Specifies the number of times connections to the target data warehouse are retried. |
gg.eventhandler.snowflake.connectionRetryIntervalSeconds | Optional | Integer value | 30 | Specifies the delay in seconds between connection retry attempts. |
gg.eventhandler.snowflake.deleteInsert | Optional | true or false | false | If set to true, Replicat merges records using SQL DELETE+INSERT statements instead of the SQL MERGE statement. Note: Applicable only if gg.compressed.update is set to false. |
gg.eventhandler.snowflake.detectMissingBaseRow | Optional | true or false | false | Diagnostic parameter to find UPDATE operations without a base row. If set to true, Replicat ABENDs if there are UPDATE operations without a base row. These rows are collected into another table that can be investigated. |
gg.eventhandler.snowflake.createTable | Optional | true or false | true | If set to true, target tables are automatically created if missing. |
gg.eventhandler.snowflake.tableType | Optional | native or iceberg | native | Indicates the Snowflake table format for automatic table creation. Options are native (default): automatically create native Snowflake tables, and iceberg: automatically create Iceberg tables. |
gg.eventhandler.snowflake.icebergExternalVolume | Optional | External volume name | None | Required when the property tableType is set to iceberg. An external volume name to connect Snowflake to your external cloud storage for Iceberg tables. |
gg.eventhandler.snowflake.icebergBaseLocation | Optional | Base location path | Empty ("") | Applicable when the property tableType is set to iceberg. The base location path specifies a relative path from the Iceberg table's external volume location and can be empty. Oracle GoldenGate appends the qualified table name to the base location path to create the Iceberg table. |
Parent topic: Automatic Configuration
9.2.34.1.3.2 Snowflake Storage Integration
When you use an external staging location, ensure that you set up a Snowflake storage integration to grant the Snowflake database read permission to the files located in the cloud object store.
If the configuration property gg.stage is not set, then the storage integration is not required, and Oracle GoldenGate defaults to the internal stage.
- Azure Data Lake Storage (ADLS) Gen2 Storage Integration: For more information about creating the storage integration for Azure, see the Snowflake documentation to create the storage integration for Azure.
Example:
-- AS ACCOUNTADMIN
create storage integration azure_int
  type = external_stage
  storage_provider = azure
  enabled = true
  azure_tenant_id = '<azure tenant id>'
  storage_allowed_locations = ('azure://<azure-account-name>.blob.core.windows.net/<azure-container>/');
desc storage integration azure_int;
-- Read AZURE_CONSENT_URL and accept the terms and conditions specified in the link.
-- Read AZURE_MULTI_TENANT_APP_NAME to get the Snowflake app name to be granted Blob Read permission.
grant create stage on schema <schema name> to role <role name>;
grant usage on integration azure_int to role <role name>;
- Google Cloud Storage (GCS) Storage Integration: For more information about creating the storage integration for GCS, see the Snowflake Documentation.
Example:
create storage integration gcs_int
  type = external_stage
  storage_provider = gcs
  enabled = true
  storage_allowed_locations = ('gcs://<gcs-bucket-name>/');
desc storage integration gcs_int;
-- Read the column STORAGE_GCP_SERVICE_ACCOUNT to get the GCP Service Account email for Snowflake.
-- Create a GCP role with storage read permission and assign the role to the Snowflake Service account.
grant create stage on schema <schema name> to role <role name>;
grant usage on integration gcs_int to role <role name>;
- AWS S3 Storage Integration: For more information about creating the storage integration for S3, see the Snowflake Documentation.
Note:
When you use S3 as the external stage, you do not need to create a storage integration if you already have access to the following AWS credentials: AWS Access Key Id and Secret Key. You can set the AWS credentials in the jvm.bootoptions property.
- The storage integration name must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes, for example, "My object". Identifiers enclosed in double quotes are also case-sensitive.
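The following is a minimal sketch of the related Replicat properties; the integration name and AWS keys are placeholders:
# External stage with a storage integration (ADLS Gen2, GCS, or S3):
gg.eventhandler.snowflake.storageIntegration=<integration name>
# S3 external stage using AWS credentials instead of a storage integration:
#jvm.bootoptions=-Daws.accessKeyId=<AWS access key> -Daws.secretKey=<AWS secret key> -Xmx8g -Xms8g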
Parent topic: Configuration
9.2.34.1.3.3 Classpath Configuration
Snowflake Event Handler uses the Snowflake JDBC driver. Ensure that the classpath includes the path to the JDBC driver. If an external stage is used, then you need to also include the respective object store Event Handler’s dependencies in the classpath.
Parent topic: Configuration
9.2.34.1.3.3.1 Dependencies
Snowflake JDBC driver: You can use the Dependency Downloader tool to download the JDBC driver by running the following script: <OGGDIR>/DependencyDownloader/snowflake.sh.
See Dependency Downloader for more information.
Alternatively, you can download the JDBC driver from Maven Central using the following Maven coordinates:
<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>snowflake-jdbc</artifactId>
  <version>3.13.19</version>
</dependency>
- If staging location is set to S3, then the classpath should include the S3 Event handler dependencies. See S3 Handler Configuration.
- If staging location is set to HDFS, then the classpath should include the HDFS Event handler dependencies. See HDFS Event Handler Configuration.
- If staging location is set to Google Cloud Storage (GCS), then the classpath should include the GCS Event handler dependencies. See Google Cloud Storage Event Handler Configuration.
Edit the gg.classpath configuration parameter to include the path to the object store Event Handler dependencies (if an external stage is in use) and the Snowflake JDBC driver.
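For example, a minimal sketch assuming the dependencies were downloaded into directories under $THIRD_PARTY_DIR (the same paths used in the end-to-end sample later in this section):
# Internal stage: only the Snowflake JDBC driver is required.
gg.classpath=$THIRD_PARTY_DIR/snowflake/*
# S3 external stage: include the AWS SDK as well.
#gg.classpath=$THIRD_PARTY_DIR/s3/*:$THIRD_PARTY_DIR/snowflake/*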
Parent topic: Classpath Configuration
9.2.34.1.3.4 Proxy Configuration
When the Replicat process runs behind a proxy server, you can use the jvm.bootoptions property for proxy server configuration.
Example:
jvm.bootoptions=-Dhttp.useProxy=true -Dhttps.proxyHost=<some-proxy-address.com> -Dhttps.proxyPort=80 -Dhttp.proxyHost=<some-proxy-address.com> -Dhttp.proxyPort=80
Parent topic: Configuration
9.2.34.1.3.5 INSERTALLRECORDS Support
Stage and merge targets support the INSERTALLRECORDS parameter. See INSERTALLRECORDS in Reference for Oracle GoldenGate. Set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm), as shown in the sketch after the following note.
Setting this parameter directs the Replicat process to use bulk insert operations to load operation data into the target table. You can tune the batch size of bulk inserts using the File Writer property gg.handler.snowflake.maxFileSize; the default value is 1 GB. The frequency of bulk inserts can be tuned using the File Writer property gg.handler.snowflake.fileRollInterval; the default value is 3m (three minutes).
Note:
- When using the Snowflake internal stage, the staging files can be compressed by setting gg.eventhandler.snowflake.putSQLAutoCompress to true.
- Consider using the Snowflake Streaming Handler for INSERT-only workloads. See Snowflake Streaming Handler.
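The following is a minimal sketch of a Replicat parameter file that enables INSERTALLRECORDS; the Replicat name, properties file path, and MAP statement are placeholders:
REPLICAT sfrep
-- Placeholder properties file path; required only when using Coordinated Replicat.
TARGETDB LIBFILE libggjava.so SET property=dirprm/sf.props
INSERTALLRECORDS
MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1";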
Parent topic: Configuration
9.2.34.1.3.6 Snowflake Key Pair Authentication
Snowflake supports key pair authentication as an alternative to basic authentication using username and password.
The path to the private key file must be set in the JDBC connection URL using the property private_key_file. If the private key file is encrypted, then the connection URL must also include the property private_key_file_pwd. Additionally, the connection URL must include the Snowflake user that is assigned the respective public key by setting the property user.
jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name>&private_key_file=/path/to/private/key/rsa_key.p8&private_key_file_pwd=<private-key-password>&user=<db-user>
When key pair authentication is used, the UserName and Password properties are not set.
Note:
Oracle recommends that you upgrade Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) to version 21.10.0.0.0. If you cannot upgrade to 21.10.0.0.0, then modify the JDBC URL to replace '\' characters with '/'.
Parent topic: Configuration
9.2.34.1.3.7 Mapping Source JSON/XML to Snowflake VARIANT
JSON and XML source column types in the Oracle GoldenGate trail are automatically detected and mapped to Snowflake VARIANT.
You can inspect the metadata in the Oracle GoldenGate trail file for JSON and XML types using logdump.
logdump output showing JSON and XML types:
022/01/06 01:38:54.717.464 Metadata             Len 679 RBA 6032
Table Name: CDB1_PDB1.TKGGU1.JSON_TAB1
*
 1)Name   2)Data Type   3)External Length   4)Fetch Offset   5)Scale   6)Level
 7)Null   8)Bump if Odd   9)Internal Length   10)Binary Length   11)Table Length   12)Most Sig DT
13)Least Sig DT   14)High Precision   15)Low Precision   16)Elementary Item   17)Occurs   18)Key Column
19)Sub DataType   20)Native DataType   21)Character Set   22)Character Length   23)LOB Type   24)Partial Type
25)Remarks
*
TDR version: 11
Definition for table CDB1_PDB1.TKGGU1.JSON_TAB1
Record Length: 81624
Columns: 7
ID            64     50      0  0  0 0 0     50     50     50 0 0 0 0 1 0 1 2    2  -1 0 0 0
COL           64   4000     56  0  0 1 0   4000   8200      0 0 0 0 0 1 0 0 0  119   0 0 1 1 JSON
COL2          64   4000   4062  0  0 1 0   4000   8200      0 0 0 0 0 1 0 0 0  119   0 0 1 1 JSON
COL3          64   4000   8068  0  0 1 0   4000   4000      0 0 0 0 0 1 0 0 10 112  -1 0 1 1 XML
SYS_NC00005$  64   8000  12074  0  0 1 0   4000   4000      0 0 0 0 0 1 0 0 4  113  -1 0 1 1 Hidden
SYS_IME_OSON_CF27CFDF1CEB4FA2BF85A3D6239A433C  64  65534  16080  0  0 1 0  32767  32767  0 0 0 0 0 1 0 0 4  23  -1 0 0 0 Hidden
SYS_IME_OSON_CEE1B31BB4494F6ABF31AC002BEBE941  64  65534  48852  0  0 1 0  32767  32767  0 0 0 0 0 1 0 0 4  23  -1 0 0 0 Hidden
End of definition
In this example, COL and COL2 are JSON columns and COL3 is an XML column.
Additionally, mapping to Snowflake VARIANT is supported only if the source columns are stored as text.
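For illustration only, a hypothetical Snowflake target table for the JSON_TAB1 example could define the JSON and XML columns as VARIANT; the numeric type chosen for ID is an assumption:
-- Hypothetical target table; JSON and XML source columns map to VARIANT.
CREATE TABLE "TKGGU1"."JSON_TAB1" (
  "ID"   NUMBER(38,0) NOT NULL,
  "COL"  VARIANT,
  "COL2" VARIANT,
  "COL3" VARIANT,
  PRIMARY KEY ("ID")
);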
Parent topic: Configuration
9.2.34.1.3.8 Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
9.2.34.1.3.8.1 In-Memory Operation Aggregation
- Operation records are aggregated in-memory by default.
- The gg.aggregate.operations.flush.interval property is deprecated and no longer supported. If gg.aggregate.operations.flush.interval is used in GG for DAA 23ai, then Replicat runs but adds a warning to the log file about the property being deprecated and not supported. To control the time window for aggregation, use the gg.handler.snowflake.fileRollInterval property. By default, it is set to 3 minutes. Longer intervals increase latency and may increase memory usage; shorter intervals increase overhead in Oracle GoldenGate and the target database.
- In-memory operation aggregation requires additional JVM memory configuration.
Parent topic: Operation Aggregation
9.2.34.1.3.8.2 Operation Aggregation Using SQL
- To use SQL aggregation, it is mandatory that the trail files contain uncompressed UPDATE operation records, which means that the UPDATE operations contain the full image of the row being updated.
- Operation aggregation using SQL can provide better throughput if the trail files contain uncompressed update records.
- Replicat can aggregate operations using SQL statements by setting gg.aggregate.operations.using.sql=true, as shown in the sketch after this list.
- You can tune the frequency of the merge interval using the File Writer gg.handler.snowflake.fileRollInterval property; the default value is 3m (three minutes).
- Operation aggregation using SQL does not require additional JVM memory configuration.
Parent topic: Operation Aggregation
9.2.34.1.3.9 Compressed Update Handling
A compressed update record contains values for the key columns and the modified columns.
An uncompressed update record contains values for all the columns.
Oracle GoldenGate trails may contain compressed or uncompressed update
records. The default extract configuration writes compressed updates to the trails.
The parameter gg.compressed.update can be set to true or false to indicate compressed or uncompressed update records.
Parent topic: Configuration
9.2.34.1.3.9.1 MERGE Statement with Uncompressed Updates
In some use cases, if the trail contains uncompressed update records, then the MERGE SQL statement can be optimized for better performance by setting gg.compressed.update=false. If you want to use DELETE+INSERT SQL statements instead of a MERGE SQL statement, then also set gg.eventhandler.snowflake.deleteInsert=true.
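For example, a minimal sketch of these properties when the trail contains uncompressed updates:
# The trail contains uncompressed (full-image) update records.
gg.compressed.update=false
# Optional: apply changes using DELETE+INSERT instead of MERGE.
gg.eventhandler.snowflake.deleteInsert=true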
Parent topic: Compressed Update Handling
9.2.34.1.3.10 End-to-End Configuration
The following is an end-to-end configuration example that uses auto-configuration.
Location of the sample properties files: <OGGDIR>/AdapterExamples/big-data/snowflake/
- sf.props: Configuration using the internal stage.
- sf-s3.props: Configuration using the S3 stage.
- sf-az.props: Configuration using the ADLS Gen2 stage.
- sf-gcs.props: Configuration using the GCS stage.
# Note: Recommended to only edit the configuration marked as TODO
gg.target=snowflake

#The Snowflake Event Handler
#TODO: Edit JDBC ConnectionUrl
gg.eventhandler.snowflake.connectionURL=jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name>
#TODO: Edit JDBC user name
gg.eventhandler.snowflake.UserName=<db user name>
#TODO: Edit JDBC password
gg.eventhandler.snowflake.Password=<db password>

# Configuration to load GoldenGate trail operation records into Snowflake using Snowflake internal stage.
#TODO:Set the classpath to include Snowflake JDBC driver.
gg.classpath=$THIRD_PARTY_DIR/snowflake/*
#TODO:Provide sufficient memory.
jvm.bootoptions=-Xmx8g -Xms8g

# Configuration to load GoldenGate trail operation records into Snowflake using S3 stage.
#gg.stage=s3
#The S3 Event Handler
#TODO: Edit the AWS region
#gg.eventhandler.s3.region=<aws region>
#TODO: Edit the AWS S3 bucket
#gg.eventhandler.s3.bucketMappingTemplate=<s3 bucket>
#TODO:Set the classpath to include AWS Java SDK and Snowflake JDBC driver.
#gg.classpath=$THIRD_PARTY_DIR/s3/*:$THIRD_PARTY_DIR/snowflake/*
#TODO:Set the AWS access key and secret key. Provide sufficient memory.
#jvm.bootoptions=-Daws.accessKeyId=<AWS access key> -Daws.secretKey=<AWS secret key> -Xmx8g -Xms8g

# Configuration to load GoldenGate trail operation records into Snowflake using ADLS Gen2 stage.
#gg.stage=abs
# Azure Blob Event handler.
#gg.eventhandler.abs.bucketMappingTemplate=<azure_adls_gen2_container_name>
#gg.eventhandler.abs.accountName=<azure_storage_account_name>
#gg.eventhandler.abs.accountKey=<azure_storage_account_key>
#TODO: Edit snowflake storage integration to access Azure Blob Storage.
#gg.eventhandler.snowflake.storageIntegration=<azure_int>
#TODO: Edit the classpath to include HDFS Event Handler dependencies and Snowflake JDBC driver.
#gg.classpath=$THIRD_PARTY_DIR/abs/*:$THIRD_PARTY_DIR/snowflake/*
#TODO: Provide sufficient memory.
#jvm.bootoptions=-Xmx8g -Xms8g

# Configuration to load GoldenGate trail operation records into Snowflake using GCS stage.
#gg.stage=gcs
## The GCS Event handler
#TODO: Edit the GCS bucket name
#gg.eventhandler.gcs.bucketMappingTemplate=<gcs bucket>
#TODO: Edit the GCS credentialsFile
#gg.eventhandler.gcs.credentialsFile=<oggbd-project-credentials.json>
#TODO: Edit snowflake storage integration to access GCS.
#gg.eventhandler.snowflake.storageIntegration=<gcs_int>
#TODO: Edit the classpath to include GCS Java SDK and Snowflake JDBC driver.
#gg.classpath=$THIRD_PARTY_DIR/gcs/*:$THIRD_PARTY_DIR/snowflake/*
#TODO: Provide sufficient memory.
#jvm.bootoptions=-Xmx8g -Xms8g
Parent topic: Configuration
9.2.34.1.3.11 Table Mapping
If the MAP statement does not specify a target database, then the database set in the JDBC connection URL is used. The handler logs the default database during initialization. For example:
Connection catalog is set to [DB_1].
Parent topic: Configuration
9.2.34.1.3.11.1 Mapping Table
Table 9-40 Snowflake Mapping Table
MAP statement in the Replicat parameter file | Snowflake Database | Snowflake Schema | Snowflake Table |
---|---|---|---|
MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1"; | Default database | schema_1 | table_1 |
MAP DB_1.SCHEMA_1.TABLE_1, TARGET "db_1"."schema_1"."table_1"; | db_1 | schema_1 | table_1 |
Parent topic: Table Mapping
9.2.34.1.4 Snowflake Iceberg Tables
Apache Iceberg tables for Snowflake combine the performance and query semantics of typical Snowflake tables with external cloud storage that you manage.
They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake. For more information, see https://docs.snowflake.com/en/user-guide/tables-iceberg.
Oracle GoldenGate can automatically create Snowflake Iceberg tables by setting the property gg.eventhandler.snowflake.tableType to iceberg.
The property gg.eventhandler.snowflake.icebergExternalVolume must be set to the external volume name that connects Snowflake to your external cloud storage for Iceberg tables. The property gg.eventhandler.snowflake.icebergBaseLocation must be set to the base location path for the Snowflake Iceberg table.
Oracle GoldenGate will append the qualified table name to the base location path to create the Iceberg table.
For example:
CREATE ICEBERG TABLE "PUBLIC"."ICEBERG_TCUSTMER" (
  "CUST_CODE" VARCHAR,
  "NAME" VARCHAR,
  "CITY" VARCHAR,
  "STATE" VARCHAR,
  PRIMARY KEY ("CUST_CODE")
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_external_volume'
BASE_LOCATION = 'SCHEMA_1/ICEBERG_TCUSTMER'
CATALOG_SYNC = 'polaris_catalog_integration';
- External Volume
- External Iceberg Catalog Synchronization
- Co-existence of Snowflake Native and Iceberg Tables
- Limitations
Parent topic: Snowflake Stage and Merge Handler
9.2.34.1.4.1 External Volume
An external volume is a named, account-level Snowflake object that you use to connect Snowflake to your external cloud storage for Iceberg tables.
You can create an external volume in Snowflake by following the steps here: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume.
Parent topic: Snowflake Iceberg Tables
9.2.34.1.4.2 External Iceberg Catalog Synchronization
To query a Snowflake-managed Apache Iceberg table using a third-party engine such as Apache Spark, you can sync the table with an external Iceberg catalog. Currently, Polaris is the only external Iceberg catalog that Snowflake supports for synchronization.
This can be enabled by setting the property gg.eventhandler.snowflake.catalogIntegrationName to the catalog integration name pointing to the external catalog.
You can create a catalog integration for an external catalog by using the CREATE CATALOG INTEGRATION SQL statement.
See https://docs.snowflake.com/en/user-guide/tables-iceberg-polaris-sync for steps to set up catalog synchronization.
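Putting the Iceberg-related properties together, the following is a minimal sketch; the volume, base location, and integration names are placeholders:
# Automatically create Iceberg tables instead of native Snowflake tables.
gg.eventhandler.snowflake.tableType=iceberg
# External volume that connects Snowflake to your external cloud storage.
gg.eventhandler.snowflake.icebergExternalVolume=<external volume name>
# Optional relative base location; the qualified table name is appended to it.
gg.eventhandler.snowflake.icebergBaseLocation=<base location path>
# Optional: sync the Iceberg tables with an external catalog, for example, Polaris.
gg.eventhandler.snowflake.catalogIntegrationName=<catalog integration name>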
Parent topic: Snowflake Iceberg Tables
9.2.34.1.4.3 Co-existence of Snowflake Native and Iceberg Tables
Snowflake native and Iceberg tables can co-exist: provided the target tables exist in the same Snowflake warehouse, a single Replicat process can replicate to both Snowflake native and Iceberg tables.
Parent topic: Snowflake Iceberg Tables
9.2.34.1.4.4 Limitations
- A default Snowflake external volume can be created at the Snowflake account-level, database-level, or schema-level. This can be overridden by setting the property gg.eventhandler.snowflake.icebergExternalVolume. The external volume cannot be set at the table level; all the tables in the same Replicat process use one external volume.
- A single Replicat process cannot automatically create both Snowflake Iceberg tables and Snowflake native tables.
Parent topic: Snowflake Iceberg Tables
9.2.34.1.5 Troubleshooting and Diagnostics
- Connectivity issues to Snowflake:
- Validate JDBC connection URL, username, and password.
- Check HTTP(S) proxy configuration if running Replicat process behind a proxy.
- DDL not applied on the target table: Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) does not support DDL replication.
- SQL Errors: In case there are any errors while executing any SQL, the SQL statements along with the bind parameter values are logged into the GG for DAA handler log file.
- Co-existence of the components: When using an external stage location (S3, ADLS Gen2, or GCS), the location/region of the machine where the Replicat process runs and the object store's region have an impact on the overall throughput of the apply process. For the best possible throughput, the components should ideally be located in the same region, or as close to each other as possible.
- Replicat ABEND due to partial LOB records in the trail file: GG for DAA does not support replication of partial LOB data. The trail file needs to be regenerated by Oracle Integrated capture using the TRANLOGOPTIONS FETCHPARTIALLOB option in the Extract parameter file.
option in the Extract parameter file. - When replicating to more than ten target tables, the parameter
maxConnnections
can be increased to a higher value which can improve throughput.Note:
When tuning this, increasing the parameter value would create more JDBC connections on the Snowflake data warehouse.You can consult your Snowflake Database administrators so that the data warehouse health is not compromised. - The Snowflake JDBC driver uses the standard Java log utility. The log levels
of the JDBC driver can be set using the JDBC connection parameter tracing.
The tracing level can be set in the Snowflake Event handler property
gg.eventhandler.snowflake.connectionURL
.The following is an example of editing this property:
For more information, see https://docs.snowflake.com/en/user-guide/jdbc-parameters.html#tracing.jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name>&tracing=SEVERE
- Exception: net.snowflake.client.jdbc.SnowflakeReauthenticationRequest: Authentication token has expired. The user must authenticate again.
This error occurs when there are extended periods of inactivity. To resolve this, you can set the JDBC parameter CLIENT_SESSION_KEEP_ALIVE to keep the session alive during periods of inactivity. For example:
jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name>&CLIENT_SESSION_KEEP_ALIVE=true
- Replicat stops with an out-of-memory error: Decrease the gg.aggregate.operations.flush.interval value if you are not using its default value (30000).
- Performance issue while replicating Large Object (LOB) column values: LOB processing can lead to slowness. For every LOB column that exceeds the inline LOB threshold, an UPDATE SQL is executed. Look for the following message to tune throughput during LOB processing:
The current operation at position [<seqno>/<rba>] for table [<tablename>] contains a LOB column [<column name>] of length [<N>] bytes that exceeds the threshold of maximum inline LOB size [<N>]. Operation Aggregator will flush merged operations, which can degrade performance. The maximum inline LOB size in bytes can be tuned using the configuration gg.maxInlineLobSize.
Check the trail files that contain LOB data and get the maximum size of the BLOB/CLOB columns. Alternatively, check the source table definitions to determine the maximum size of LOB data. The default inline LOB size is 16000 bytes, which can be increased to a higher value so that all LOB column updates are processed in batches. The configuration property is gg.maxInlineLobSize. For example, with gg.maxInlineLobSize=24000000, all LOBs up to 24 MB are processed inline. You need to reposition the Replicat, purge the state files and data directory, and start over, so that bigger staging files are generated.
, all LOBs up to 24 MB are processed inline. You need to reposition the Replicat, purge the state files, data directory, and start over, so that bigger staging files are generated. - Error message: No database is set in the current session. Please set a
database in the JDBC connection url
[gg.eventhandler.snowflake.connectionURL] using the option
'db=<database name>'.`
Resolution: Set the database name in the configuration property
gg.eventhandler.snowflake.connectionURL
- Warning message: No role is set in the current session. Please set a custom role name in the JDBC connection url [gg.eventhandler.snowflake.connectionURL] using the option 'role=<role name>' if the warehouse [{}] requires a custom role to access it.
Resolution: In some cases, a custom role is required to access the Snowflake warehouse; set the role in the configuration property gg.eventhandler.snowflake.connectionURL.
- Error message: No active warehouse selected in the current session. Please set the warehouse name (and custom role name if required to access the respective warehouse) in the JDBC connection url [gg.eventhandler.snowflake.connectionURL] using the options 'warehouse=<warehouse name>' and 'role=<role name>'.
Resolution: Set the warehouse and role in the configuration property gg.eventhandler.snowflake.connectionURL.
. -
Error message: `ERROR 2024-06-07 05:52:23.000344 [main] - JDBCMDP-00034
Current attempt to connect failed with error: [Private key provided is
invalid or not supported: ./rsa_key_sanav2.p8: PBE parameter parsing
error: expecting the object identifier for AES cipher]`
Resolution: This is a recent issue in the Snowflake JDBC driver. The workaround is to upgrade to the Snowflake JDBC driver version 3.16.1, and add the Java system property
-Dnet.snowflake.jdbc.enableBouncyCastle=true
to thejvm.bootoptions
parameter in the Replicat properties file.
Parent topic: Snowflake Stage and Merge Handler
9.2.34.2 Snowflake Streaming Handler
- Overview
- Detailed Functionality
- Configuration
- Classpath Configuration
- Proxy Configuration
- Snowflake Streaming Handler Key Pair Authentication
- Sample Configuration
- Troubleshooting and Diagnostics
Parent topic: Snowflake
9.2.34.2.1 Overview
The Snowflake Streaming Handler replicates data into Snowflake using the Snowpipe Streaming API. It supports INSERT-only workloads and can achieve lower load latencies at a lower cost for loading data into Snowflake.
Note:
If your workload includes updates and deletes, consider using the Snowflake Stage and Merge Handler.
For more information, see the Snowpipe Streaming documentation.
Parent topic: Snowflake Streaming Handler
9.2.34.2.2 Detailed Functionality
Note:
The Snowflake Streaming Handler supports INSERT-only workloads. You can leverage INSERTALLRECORDS to apply update and delete operations as inserts if needed.
Parent topic: Snowflake Streaming Handler
9.2.34.2.2.1 Database User Privileges
The database user used for replicating into Snowflake must be granted the following privileges:
- INSERT on the target tables.
- Optionally, CREATE TABLE if setting gg.handler.snow.createTable to true.
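The following is a minimal sketch of these grants; the role, schema, and table names (GGADMIN_ROLE, TARGET_SCHEMA, TARGET_TABLE) are placeholders:
-- Hypothetical names; adjust to your environment.
grant insert on table TARGET_SCHEMA.TARGET_TABLE to role GGADMIN_ROLE;
-- Required only when gg.handler.snow.createTable is set to true:
grant create table on schema TARGET_SCHEMA to role GGADMIN_ROLE;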
Parent topic: Detailed Functionality
9.2.34.2.2.2 Prerequisites
- Oracle GoldenGate trails must be configured to generate INSERT operations only. If update and delete operations are converted to inserts by using parameters such as INSERTUPDATE, INSERTDELETE, or INSERTALLRECORDS, then those inserts are also supported by the handler.
Parent topic: Detailed Functionality
9.2.34.2.2.3 Staging using Snowpipe Streaming API
The Snowpipe Streaming API allows for a low-latency ingest into the target table.
Parent topic: Detailed Functionality
9.2.34.2.3 Configuration
The configuration of the Snowflake replication properties is stored in the Replicat properties file.
Note:
Ensure to specify the path to the properties file in the parameter file only when using Coordinated Replicat. Add the following line to the parameter file:TARGETDB LIBFILE libggjava.so SET property=<parameter file directory>/<properties file name>
The following configuration properties are available for the Snowflake Streaming Handler; the required ones must be changed to match your Snowflake configuration.
Table 9-41 Snowflake Streaming Handler Configuration
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handlerlist | Required | String value. For example, snow. | None | Choose the name snow for the handler. |
gg.handler.<name>.type | Required | snowflakestreaming | None | Type of handler to use. |
gg.handler.<name>.account | Required | String value. | None | Snowflake account name. |
gg.handler.<name>.user | Required | String value. | None | Snowflake data warehouse user. |
gg.handler.<name>.role | Optional | String value. | ACCOUNTADMIN | Snowflake data warehouse role. |
gg.handler.<name>.warehouse | Required | String value. | None | Snowflake data warehouse name. |
gg.handler.<name>.database | Required | String value. | None | Snowflake default database name used during connection. |
gg.handler.<name>.privateKeyFile | Required | String value. | None | Specifies the fully qualified path to the private key file for the user. This is used for key pair authentication. |
gg.handler.<name>.privateKeyFilePassword | Optional | String value. | None | Specifies the password for the private key file in case the private key file is encrypted. |
gg.handler.<name>.createTable | Optional | true or false | true | Set to true to automatically create the target table if it does not exist. |
gg.handler.<name>.flushTimeout | Optional | Numeric value | 30 seconds | Sets the flush timeout for streaming operations to commit to the target. The value is in seconds. |
Parent topic: Snowflake Streaming Handler
9.2.34.2.4 Classpath Configuration
The Snowflake Streaming Handler uses the Snowflake Ingest Java SDK. Ensure that the gg.classpath configuration parameter includes the path to the Ingest SDK.
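For example, assuming the Snowflake Ingest SDK and its dependencies were downloaded to a local directory (the path is a placeholder):
# Hypothetical path; include the Ingest SDK and its dependencies.
gg.classpath=/path/to/snowflake-ingest-sdk/*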
Parent topic: Snowflake Streaming Handler
9.2.34.2.4.1 Dependencies
You can use the Dependency Downloader tool to download the dependencies by running the following script: <OGGDIR>/DependencyDownloader/snowflake_streaming.sh.
For more information about Dependency Downloader, see Dependency Downloader.
Parent topic: Classpath Configuration
9.2.34.2.4.2 Maven Co-ordinates
<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>snowflake-ingest-sdk</artifactId>
  <version>2.1.0</version>
  <scope>provided</scope>
</dependency>
Parent topic: Classpath Configuration
9.2.34.2.5 Proxy Configuration
When the Replicat process runs behind a proxy server, you can use the jvm.bootoptions property to set the proxy server configuration.
Example:
jvm.bootoptions=-Dhttp.useProxy=true -Dhttp.proxyHost=<some-proxy-address.com> -Dhttp.proxyPort=<some-port-number>
Parent topic: Snowflake Streaming Handler
9.2.34.2.6 Snowflake Streaming Handler Key Pair Authentication
The Snowflake Streaming API requires key pair authentication. The path to the private key file must be set using the property gg.handler.snow.privateKeyFile.
If the private key file is encrypted, specify its password using the property gg.handler.snow.privateKeyFilePassword.
Additionally, include the Snowflake user that is assigned the respective public key by setting the property gg.handler.snow.user.
Parent topic: Snowflake Streaming Handler
9.2.34.2.7 Sample Configuration
Location of the sample properties file: <OGGDIR>/AdapterExamples/big-data/snowflake_streaming/
# Note: Recommended to only edit the configuration marked as TODO
gg.handlerlist=snow
gg.handler.snow.type=snowflakestreaming
#TODO: Edit database user.
gg.handler.snow.user=<db-user>
#TODO: Edit account name.
gg.handler.snow.account=<account-name>
#TODO: Edit role name.
gg.handler.snow.role=<role-name>
#TODO: Edit warehouse name.
gg.handler.snow.warehouse=<warehouse-name>
#TODO: Edit default database name.
gg.handler.snow.database=<default-db-name>
#TODO: Edit path to the private key file.
gg.handler.snow.privateKeyFile=/path/to/private/key/file/rsa_key.p8
#TODO: Edit password for the private key file.
gg.handler.snow.privateKeyFilePassword=<some-password>
#TODO:Set the classpath to include Snowflake ingest SDK and the Snowflake JDBC driver.
gg.classpath=.snowflake-ingest-sdk-2.1.1.jar
Parent topic: Snowflake Streaming Handler
9.2.34.2.8 Troubleshooting and Diagnostics
- Connectivity issues to Snowflake:
- Validate configuration parameters: account, user, role, warehouse, privateKeyFile, privateKeyFilePassword, and database.
- Check HTTP(S) proxy configuration if the Replicat process runs behind a proxy.
- DDL not applied on the target table: GG for DAA does not support DDL replication.
- SQL Errors: In case there are any errors while executing any SQL, the SQL statements along with the bind parameter values are logged into the GG for DAA handler log file.
Parent topic: Snowflake Streaming Handler