8.2.13 Azure Synapse Analytics Data Warehouse
Microsoft Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) analytics.
8.2.13.1 Detailed Functionality
Replication to Synapse uses stage and merge data flow.
The change data is staged in a temporary location in micro-batches and eventually merged into the target table.
Azure Data Lake Storage (ADLS) Gen 2 is used as the staging area for change data.
The Synapse Event handler is used as a downstream Event handler connected to the output of the Parquet Event handler.
The Parquet Event handler loads files generated by the File Writer Handler into ADLS Gen2.
The Synapse Event handler executes SQL statements to merge the operation records staged in ADLS Gen2.
The SQL operations are performed in batches providing better throughput.
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) uses
the MERGE SQL
statement or a combination of DELETE
and
INSERT
SQL statements to perform the merge operation.
Parent topic: Azure Synapse Analytics Data Warehouse
8.2.13.1.1 Database User Privileges
Database user used for replication has to be granted the following privileges:
INSERT
,UPDATE
,DELETE
, andTRUNCATE
on the target tables.CREATE
andDROP
Synapse external file format.CREATE
andDROP
Synapse external data source.CREATE
andDROP
Synapse external table.
Parent topic: Detailed Functionality
8.2.13.1.2 Merge SQL Statement
The merge SQL statement for Azure Synapse Analytics was
made generally available during the later part of the year 2022 and
therefore Oracle GoldenGate for Distributed Applications and
Analytics (GG for DAA) uses merge statement by default. To disable
merge SQL, ensure that a Java System property is set in the
jvm.bootoptions
parameter.
jvm.bootoptions=-Dsynapse.use.merge.sql=false
Parent topic: Detailed Functionality
8.2.13.1.3 Prerequisites
The following are the prerequisites:
- Uncompressed
UPDATE
records: If Oracle GoldenGate is configured to not use merge statement (see Merge SQL Statement), then it is mandatory that the trail files used to apply to Synapse contain uncompressedUPDATE
operation records, which means that theUPDATE
operations contain full image of the row being updated. IfUPDATE
records have missing columns, then replicat willABEND
on detecting a compressedUPDATE
trail record. - If Oracle GoldenGate is configured to use merge statement (see Merge SQL Statement), then the target table must be a hash distributed table.
- Target table existence: The target tables should exist on the Synapse database.
- Azure storage account: An Azure storage account and container should
exist.
Oracle recommends co-locating the Azure Synapse workspace, and the Azure storage account in the same azure region.
- If Oracle GoldenGate is configured to use merge statement, then the target
table cannot define
IDENTITY
columns because Synapse merge statement does not support inserting data intoIDENTITY
columns. For more information about merging SQL statement, see Merge SQL Statement.
Parent topic: Detailed Functionality
8.2.13.2 Configuration
- Automatic Configuration
- Synapse Database Credentials
- Classpath Configuration
- INSERTALLRECORDS Support
- Large Object (LOB) Performance
- End-to-End Configuration
- Compressed Update Handling
Parent topic: Azure Synapse Analytics Data Warehouse
8.2.13.2.1 Automatic Configuration
Synapse replication involves configuration of multiple components, such as File Writer handler, Parquet Event handler, and Synapse Event handler.
The Automatic Configuration functionality helps to auto configure these components so that the user configuration is minimal.
The properties modified by auto configuration will also be logged in the handler log file.
To enable auto-configuration to replicate to Synapse target we need to set the
parameter as follows: gg.target=synapse
.
When replicating to Synapse target, customization of Parquet Event handler name and Synapse Event handler name is not allowed.
8.2.13.2.1.1 File Writer Handler Configuration
synapse
. The following is an example to edit a property of File
Writer handler:
gg.handler.synapse.pathMappingTemplate=./dirout
Parent topic: Automatic Configuration
8.2.13.2.1.2 Synapse Event Handler Configuration
Synapse Event Handler name is pre-set to the value
synapse
.
Table 8-12 Synapse Event Handler Configuration
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.synapse.connectionURL |
Required |
jdbc:sqlserver://<synapse- workspace>.sql.azuresynapse.net:1433;database= <db-name>;encrypt=true; trustServerCertificate=false; hostNameInCertificate=*.sql.azuresynapse.net; loginTimeout=300; |
None | JDBC URL to connect to Synapse. |
gg.eventhandler.synapse.UserName |
Required | Database username. | None | Synapse database user in the Synapse workspace. The
username has to be qualified with the Synapse workspace name.
Example: sqladminuser@synapseworkspace .
|
gg.eventhandler.synapse.Password |
Required | Supported database string. | None | Synapse database password. |
gg.eventhandler.synapse.credential |
Required | Credential name. | None | Synapse database credential name to access Azure Data Lake Gen2 files. See Synapse Database Credentials for steps to create credential. |
gg.eventhandler.synapse.maxConnnections |
Optional | Integer value | 10 | Use this parameter to control the number of concurrent JDBC database connections to the target Synapse database. |
gg.eventhandler.synapse.dropStagingTablesOnShutdown |
Optional | true or
false |
false |
If set to true , the temporary
staging tables created by GoldenGate will be dropped on replicat
graceful stop.
|
gg.maxInlineLobSize |
Optional | Integer Value | 16000 | This parameter can be used to set the maximum inline size of large object (LOB) columns in bytes. For more information, see Large Object (LOB) Performance. |
gg.aggregate.operations.flush.interval |
Optional | Integer | 30000 | The flush interval parameter determines how often
the data gets merged into Synapse. The value is set in milliseconds.
Use with caution! The higher the value, larger data will have to be
stored in the memory of the Replicat process.
Use the flush interval parameter with caution. Increasing its default value increases the amount of data stored in the internal memory of the Replicat. This can cause out-of-memory errors and stop the Replicat if it runs out of memory. |
gg.operation.aggregator.validate.keyupdate
|
Optional | true or
false |
false |
If set to true , Operation
Aggregator will validate key update operations (optype 115) and
correct to normal update if no key values have changed. Compressed
key update operations do not qualify for merge.
|
Parent topic: Automatic Configuration
8.2.13.2.2 Synapse Database Credentials
- Connect to the respective Synapse SQL dedicated pool using the Azure Web SQL console
(
https://web.azuresynapse.net/en-us/
). - Create a DB master key if one does not already exist, using your own password.
- Create a database scoped credential. This credential allows Oracle GoldenGate replicat
process to access Azure Storage Account.
Provide the Azure Storage Account name and Access key when creating this credential.
Storage Account Access keys can be retrieved from the Azure cloud console.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Your own password' ; CREATE DATABASE SCOPED CREDENTIAL OGGBD_ADLS_credential WITH -- IDENTITY = '<storage_account_name>' , IDENTITY = 'sanavaccountuseast' , -- SECRET = '<storage_account_key>' SECRET = 'c8C0yR-this-is-a-fake-access-key-Gc9c5mENOJ1mLyxlO1vSRDlRG0/Ke+tbAvi6xe73HAAhLtdMFZRA==' ;
Parent topic: Configuration
8.2.13.2.3 Classpath Configuration
Synapse Event handler relies on the upstream File Writer handler and the Parquet Event handler.
Parent topic: Configuration
8.2.13.2.3.1 Dependencies
- Microsoft SQLServer JDBC driver: The JDBC driver can be
downloaded from Maven central using the following co-ordinates.
<dependency> <groupId>com.microsoft.sqlserver</groupId> <artifactId>mssql-jdbc</artifactId> <version>8.4.1.jre8</version> <scope>provided</scope> </dependency>
<OGGDIR>/DependencyDownloader/synapse.sh
.
-
Parquet Event handler dependencies: See unresolvable-reference.html#GUID-E62904D1-EBFD-4C84-B112-97D1F5D4BE6F to configure classpath to include Parquet dependencies.
- Hadoop Depedencies: Hadoop dependencies can be downloaded using dependency
downloader
<OGGDIR>/DependencyDownloader/hadoop.sh
- Azure Storage dependencies: Azure Storage dependencies can be downloaded from Maven using the following co-ordinates.
<dependencies> <dependency> <groupId>com.azure</groupId> <artifactId>azure-storage-blob</artifactId> <version>12.13.0</version> </dependency> <dependency> <groupId>com.azure</groupId> <artifactId>azure-identity</artifactId> <version>1.3.3</version> </dependency> </dependencies>
Parent topic: Classpath Configuration
8.2.13.2.3.2 Classpath
gg.classpath configuration
parameter to
include the path to the Parquet Event Handler, Synapse JDBC, Azure Storage and hadoop
dependencies.
gg.classpath=/path/to/parquet-deps/*:/path/to/abs-deps/*:path/to/synapse-deps/mssql-jdbc-8.4.1.jre8.jar:/path/to/hadoop-deps/*
Parent topic: Classpath Configuration
8.2.13.2.4 INSERTALLRECORDS Support
Stage and merge targets supports INSERTALLRECORDS
parameter.
See INSERTALLRECORDS in Reference for
Oracle GoldenGate. Set the INSERTALLRECORDS
parameter in
the Replicat parameter file (.prm
). Set the
INSERTALLRECORDS
parameter in the Replicat parameter file
(.prm
)
Setting this property directs the Replicat process to use bulk insert
operations to load operation data into the target table. You can tune the batch size
of bulk inserts using the File Writer property
gg.handler.synapse.maxFileSize
. The default value is set to
1GB. The frequency of bulk inserts can be tuned using the File Writer property
gg.handler.synapse.fileRollInterval
, the default value is set
to 3m (three minutes).
Note:
- When using the Synapse internal stage, the staging files can be
compressed by setting
gg.handler.synapse.putSQLAutoCompress
totrue
.
Parent topic: Configuration
8.2.13.2.5 Large Object (LOB) Performance
gg.maxInlineLobSize
does not qualify for batch processing and such
operations gets slower.
If the compute machine has sufficient RAM, you can increase this parameter to speed up processing.
Parent topic: Configuration
8.2.13.2.6 End-to-End Configuration
The following is an end-end configuration example which uses auto-configuration for FW handler, Parquet and Synapse Event handlers.
This sample properties file can also be found in the directory
AdapterExamples/big-data/synapse/synapse.props
:
# Azure Synapse Analytics Data Warehouse Template # Configuration to load GoldenGate trail operation records into Azure Synapse Analytics by chaining # File writer handler -> Parquet Event handler -> Synapse Event handler. # Note: Recommended to only edit the configuration marked as TODO gg.target=synapse #The Parquet Event Handler No properties are required for the Parquet Event handler. #gg.eventhandler.parquet.finalizeAction=delete ADLS Gen 2 stage (Using Azure Blob SDK). #Azure Blob Event handler #TODO: Edit the Azure Blob Storage container name gg.eventhandler.abs.bucketMappingTemplate=<abs-container-name> #TODO: Edit the Azure storage.account name. gg.eventhandler.abs.accountName=<storage-account-name> #TODO: Edit the Azure storage account key. gg.eventhandler.abs.accountKey=<storage-account-key> #The Synapse Event Handler #TODO: Edit JDBC ConnectionUrl gg.eventhandler.synapse.connectionURL=jdbc:sqlserver://<synapse-workspace>.sql.azuresynapse.net:1433;database=<db-name>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=300; #TODO: Edit JDBC user name gg.eventhandler.synapse.UserName=<db user name>@<synapse-workspace> #TODO: Edit JDBC password gg.eventhandler.synapse.Password=<db password> #TODO: Edit Credential to access Azure storage. gg.eventhandler.synapse.credential=OGGBD_ADLS_credential #TODO: Edit the classpath to include dependencies for Parquet Event Handler, ABS Event handler and the Synapse JDBC driver. gg.classpath=/path/to/parquet-deps/:/path/to/abs-deps/:path/to/synapse-deps/mssql-jdbc-8.4.1.jre8.jar:/path/to/hadoop-deps/* #TODO: Provide sufficient memory (at least 8GB). jvm.bootoptions=-Xmx8g -Xms8g -DSYNAPSE_STAGE=parquet,abs
Parent topic: Configuration
8.2.13.2.7 Compressed Update Handling
A compressed update record contains values for the key columns and the modified columns.
An uncompressed update record contains values for all the columns.
Oracle GoldenGate trails may contain compressed or uncompressed update records. The default extract configuration writes compressed updates to the trails.
The parameter gg.compressed.update
can be set to
true
or false
to indicate
compressed/uncompressed update records.
Parent topic: Configuration
8.2.13.2.7.1 MERGE Statement with Uncompressed Updates
In some use cases, if the trail contains uncompressed update records, then the
MERGE SQL
statement can be optimized for better performance by
setting gg.compressed.update=false
.
Note:
If you want to useDELETE+INSERT SQL
statements instead of a
MERGE SQL
statement, then set
gg.eventhandler.snowflake.deleteInsert=true
.
Parent topic: Compressed Update Handling
8.2.13.3 Troubleshooting and Diagnostics
- Connectivity Issues to Synapse:
- Validate JDBC connection URL, username and password.
- Check if http/https proxy is enabled. Synapse does not support connections over http(s) proxy.
- DDL not applied on the target table: Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) does not support DDL replication.
- Target table existence: It is expected that the Synapse target table exists before starting the replicat process. replicat process will ABEND if the target table is missing.
- SQL Errors: In case there are any errors while executing any SQL, the entire SQL statement along with the bind parameter values are logged into the GG for DAA handler log file.
- Co-existence of the components: The location/region of the machine where replicat process is running, Azure Data Lake Storage container region and the Synapse region would impact the overall throughput of the apply process. Data flow is as follows: Oracle GoldenGate -> Azure Data Lake Gen 2 -> Synapse. For best throughput, the components need to located as close as possible.
- Replicat ABEND due to partial LOB records in the trail file:
GG for DAA Synapse apply does not support replication of partial LOB. The trail
file needs to be regenerated by Oracle Integrated capture using
TRANLOGOPTIONS FETCHPARTIALLOB
option in the extract parameter file. - Error:
com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string
:This occurs when the source datetime column and target datetime column are incompatible.
For example: A case where the source column is a timestamp type, and the target column is Synapse time.
- If the Synapse table or column names contain double quotes, then GG for DAA replicat will ABEND.
- Error:
com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer
. This indicates that the data in the external table backed by Azure Data Lake file is not readable. Contact Oracle support. - IDENTITY column in the target table: The Synapse
MERGE
statement does not support inserting data intoIDENTITY
columns. Therefore, ifMERGE
statement is enabled usingjvm.bootoptions=-Dsynapse.use.merge.sql=true
, then Replicat will ABEND with following error message:Exception:com.microsoft.sqlserver.jdbc.SQLServerException: Cannot update identity column 'ORDER_ID'
- Error:
com.microsoft.sqlserver.jdbc.SQLServerException: Merge statements with a
:WHEN NOT MATCHED [BY TARGET]
clause must target a hash distributed tableThis indicates that merge SQL statement is on and Synapse target table is not a hash distributed table. You need to create the target table with a hash distribution.
Parent topic: Azure Synapse Analytics Data Warehouse