9.2.12 Azure Blob Storage
Topics:
- Overview
- Prerequisites
- Storage Account, Container, and Objects
- Configuration
- Troubleshooting and Diagnostics
9.2.12.1 Overview
Azure Blob Storage (ABS) is a service for storing objects in the Azure cloud. It provides highly scalable, secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning. You can use the Azure Blob Storage Event Handler to load files generated by the File Writer Handler into ABS.
9.2.12.2 Prerequisites
- An Azure cloud account.
- Java Software Development Kit (SDK) for Azure Blob Storage.
9.2.12.3 Storage Account, Container, and Objects
- Storage Account: An Azure storage account contains all of your Azure Storage data objects: blobs, file shares, queues, tables, and disks.
- Container: A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.
- Objects/blobs: Objects or blobs are the individual pieces of data that you store in a storage account container.
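As a rough illustration of how the three levels fit together, the sketch below composes the URL of a single blob from a storage account name, a container name, and a blob name. The account, container, and blob names are hypothetical, and the default host suffix assumes the public Azure cloud.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob: str,
             suffix: str = "blob.core.windows.net") -> str:
    """Compose the URL of one blob from the account, container, and blob name."""
    # Blob names may contain '/' to emulate folders, so '/' is left unescaped.
    return f"https://{account}.{suffix}/{container}/{quote(blob, safe='/')}"


# Hypothetical names, for illustration only.
print(blob_url("mystorageacct", "ogg-data", "QASOURCE.TCUSTMER/part-0001.json"))
```

The storage account determines the host name, the container is the first path segment, and the rest of the path is the blob name.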
9.2.12.4 Configuration
To enable the selection of the ABS Event Handler, you must first configure the Event Handler type by specifying gg.eventhandler.name.type=abs and the following ABS properties:
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.name.type | Required | abs | None | Selects the ABS Event Handler for use with the File Writer Handler. |
gg.eventhandler.name.bucketMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate an Azure storage account container name. | None | The ABS Event Handler creates a container with this name if it does not already exist. See https://docs.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#container-names. For supported keywords, see Template Keywords. |
gg.eventhandler.name.pathMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the path in the Azure storage account container to write the file. | None | Use keywords interlaced with constants to dynamically generate unique Azure storage account container path names at runtime. Sample path name: ogg/data/${groupName}/${fullyQualifiedTableName}. For supported keywords, see Template Keywords. |
gg.eventhandler.name.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate a file name for the Azure Blob object. | None | Use resolvable keywords and constants to dynamically generate the Azure Blob object file name. If not set, the upstream file name is used. For supported keywords, see Template Keywords. |
gg.eventhandler.name.finalizeAction | Optional | none or delete | none | Set to none to leave the Azure Blob data file in place on the finalize action. Set to delete to delete the Azure Blob data file with the finalize action. |
gg.eventhandler.name.eventHandler | Optional | A unique string identifier cross-referencing a child event handler. | No event handler configured. | Sets the downstream event handler that is invoked on the file roll event. |
gg.eventhandler.name.accountName | Required | String | None | Azure storage account name. |
gg.eventhandler.name.accountKey | Optional | String | None | Azure storage account key. |
gg.eventhandler.name.sasToken | Optional | String | None | Sets a credential that uses a shared access signature (SAS) to authenticate to an Azure Service. |
gg.eventhandler.name.tenantId | Optional | String | None | Sets the Azure tenant ID of the application. |
gg.eventhandler.name.clientId | Optional | String | None | Sets the Azure client ID of the application. |
gg.eventhandler.name.clientSecret | Optional | String | None | Sets the Azure client secret for the authentication. |
gg.eventhandler.name.accessTier | Optional | Hot, Cool, or Archive | None | Sets the tier on an Azure blob/object. Azure storage offers different access tiers, allowing you to store blob object data in the most cost-effective manner. Available access tiers include Hot, Cool, and Archive. For more information, see https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers. |
gg.eventhandler.name.endpoint | Optional | String | https://<accountName>.blob.core.windows.net | Sets the Azure Storage service endpoint. See Azure Government Cloud Configuration. |
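To make the mapping templates more concrete, the following sketch shows one way keyword substitution could work, and checks a resolved container name against the Azure container naming rules linked in the table. It is an illustration only, not the handler's implementation: the functions resolve_template and is_valid_container_name and the keyword values are hypothetical.

```python
import re

# Azure container-name rules: 3-63 characters; lowercase letters, digits, and
# hyphens only; starts and ends with a letter or digit; no consecutive hyphens.
_CONTAINER_RE = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")
_KEYWORD_RE = re.compile(r"\$\{(\w+)\}")


def resolve_template(template: str, values: dict) -> str:
    """Replace each ${keyword} with its value; an unknown keyword raises KeyError."""
    return _KEYWORD_RE.sub(lambda m: values[m.group(1)], template)


def is_valid_container_name(name: str) -> bool:
    """Return True if the name satisfies the container naming rules above."""
    return _CONTAINER_RE.fullmatch(name) is not None


# Hypothetical runtime values for two keywords used in the sample path name.
values = {"groupName": "RABS", "fullyQualifiedTableName": "QASOURCE.TCUSTMER"}

print(resolve_template("ogg/data/${groupName}/${fullyQualifiedTableName}", values))
print(is_valid_container_name(resolve_template("ogg-${groupName}", values)))
print(is_valid_container_name("ogg-rabs"))
```

The second check fails because the resolved name contains uppercase letters. When a container template uses keywords, it is worth confirming that every value they can take resolves to a legal container name.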
- Classpath Configuration
- Dependencies
- Authentication
- Proxy Configuration
- Sample Configuration
- Azure Government Cloud Configuration
9.2.12.4.1 Classpath Configuration
The ABS Event Handler uses the Java SDK for Azure Blob Storage.
Note:
Ensure that the classpath includes the path to the Azure Blob Storage Java SDK.
9.2.12.4.2 Dependencies
<dependencies>
  <dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-blob</artifactId>
    <version>12.13.0</version>
  </dependency>
  <dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.3.3</version>
  </dependency>
</dependencies>
9.2.12.4.3 Authentication
The ABS Event Handler can authenticate to Azure Storage using one of the following:
- accountKey
- sasToken
- tenantId, clientId, and clientSecret
accountKey has the highest precedence, followed by sasToken. If accountKey and sasToken are not set, then the tuple tenantId, clientId, and clientSecret is used.
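The precedence rule can be sketched as a small selection function. This is a minimal illustration and not the handler's code: the function name select_credential is hypothetical, and raising an error when nothing is configured is an assumption, because this section does not describe that case.

```python
def select_credential(props: dict, name: str = "abs") -> str:
    """Return which credential type would be chosen under the stated precedence."""
    prefix = f"gg.eventhandler.{name}."

    def is_set(key: str) -> bool:
        # A property counts as set only if it has a non-blank value.
        return bool(props.get(prefix + key, "").strip())

    if is_set("accountKey"):          # highest precedence
        return "accountKey"
    if is_set("sasToken"):            # second
        return "sasToken"
    if all(is_set(k) for k in ("tenantId", "clientId", "clientSecret")):
        return "clientSecret"         # tenantId + clientId + clientSecret tuple
    # Assumption: the behavior with no credentials is not described here.
    raise ValueError("no usable ABS credentials configured")


props = {
    "gg.eventhandler.abs.sasToken": "<sas-token>",
    "gg.eventhandler.abs.tenantId": "<azure-tenant-id>",
    "gg.eventhandler.abs.clientId": "<azure-client-id>",
    "gg.eventhandler.abs.clientSecret": "<azure-client-secret>",
}
print(select_credential(props))  # sasToken wins over the client secret tuple
```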
9.2.12.4.3.1 Azure Tenant ID, Client ID, and Client Secret
To obtain the Azure tenant ID:
- Go to the Microsoft Azure portal.
- Select Azure Active Directory from the list on the left to view the Azure Active Directory panel.
- Select Properties in the Azure Active Directory panel to view the Azure Active Directory properties, which include the tenant ID.
To obtain the Azure client ID and client secret:
- Go to the Microsoft Azure portal.
- Select All Services from the list on the left to view the Azure Services Listing.
- Enter App into the filter command box and select App Registrations from the listed services.
- Select the App Registration that you created to access Azure Storage.
9.2.12.4.4 Proxy Configuration
When the process runs behind a proxy server, you can use the jvm.bootoptions property to set the proxy server configuration using the well-known Java proxy properties.
For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
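If the proxy address is available as a URL, a small helper can produce the property line and catch a missing host or port early. The helper name proxy_bootoptions is hypothetical, and the sketch only formats the same three Java properties shown in the example above.

```python
from urllib.parse import urlsplit


def proxy_bootoptions(proxy_url: str) -> str:
    """Build the jvm.bootoptions line from a proxy URL such as http://host:port."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    return (f"jvm.bootoptions=-Dhttps.proxyHost={parts.hostname} "
            f"-Dhttps.proxyPort={parts.port} "
            "-Djava.net.useSystemProxies=true")


print(proxy_bootoptions("http://some-proxy-address.com:80"))
```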
9.2.12.4.5 Sample Configuration
#The ABS Event Handler
gg.eventhandler.abs.type=abs
gg.eventhandler.abs.pathMappingTemplate=${fullyQualifiedTableName}
#TODO: Edit the Azure Blob Storage container name.
gg.eventhandler.abs.bucketMappingTemplate=<abs-container-name>
gg.eventhandler.abs.finalizeAction=none
#TODO: Edit the Azure storage account name.
gg.eventhandler.abs.accountName=<storage-account-name>
#TODO: Edit the Azure storage account key.
#gg.eventhandler.abs.accountKey=<storage-account-key>
#TODO: Edit the Azure shared access signature (SAS) to authenticate to an Azure Service.
#gg.eventhandler.abs.sasToken=<sas-token>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.abs.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.abs.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.abs.clientSecret=<azure-client-secret>
gg.classpath=/path/to/abs-deps/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
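Before starting the process, it can help to confirm that the properties marked Required in the table are present. The sketch below is a simplified check and not part of the product: parse_properties handles only plain key=value lines and # comments, not the full Java properties syntax, and both function names are hypothetical.

```python
REQUIRED = ("type", "bucketMappingTemplate", "pathMappingTemplate", "accountName")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_required(props: dict, name: str = "abs") -> list:
    """List the required ABS properties that are absent or empty."""
    prefix = f"gg.eventhandler.{name}."
    return [key for key in REQUIRED if not props.get(prefix + key)]


sample = """
gg.eventhandler.abs.type=abs
gg.eventhandler.abs.pathMappingTemplate=${fullyQualifiedTableName}
gg.eventhandler.abs.bucketMappingTemplate=<abs-container-name>
#gg.eventhandler.abs.accountName=<storage-account-name>
"""
print(missing_required(parse_properties(sample)))  # accountName is commented out
```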
9.2.12.4.6 Azure Government Cloud Configuration
Additional configuration is required if Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) must replicate data to storage accounts that reside in an Azure Government cloud.
Set the environment variable AZURE_AUTHORITY_HOST and the property gg.eventhandler.{name}.endpoint as per the following table:
Government cloud | AZURE_AUTHORITY_HOST | gg.eventhandler.{name}.endpoint |
---|---|---|
Azure US Government Cloud | | |
Azure German Cloud | | https://<storage-account-name>.blob.core.cloudapi.de |
Azure China Cloud | https://login.chinacloudapi.cn | https://<storage-account-name>.blob.core.chinacloudapi.cn |
The environment variable can be set in the Replicat parameter (.prm) file using the Oracle GoldenGate setenv parameter.
Example:
setenv (AZURE_AUTHORITY_HOST = "https://login.microsoftonline.us")
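As an illustration, the two settings for one cloud can be generated together so that the authority host and the endpoint stay consistent. The function gov_cloud_settings is hypothetical. The example values are the ones listed for Azure China Cloud in this section, and the storage account name is made up.

```python
def gov_cloud_settings(account: str, authority_host: str, blob_suffix: str,
                       name: str = "abs") -> tuple:
    """Return the setenv line (for the .prm file) and the endpoint property line."""
    setenv_line = f'setenv (AZURE_AUTHORITY_HOST = "{authority_host}")'
    endpoint_line = (f"gg.eventhandler.{name}.endpoint="
                     f"https://{account}.{blob_suffix}")
    return setenv_line, endpoint_line


# Values taken from the Azure China Cloud entries; the account name is hypothetical.
for line in gov_cloud_settings("mystorageacct",
                               "https://login.chinacloudapi.cn",
                               "blob.core.chinacloudapi.cn"):
    print(line)
```

The first line belongs in the Replicat parameter file and the second in the handler properties file.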
9.2.12.5 Troubleshooting and Diagnostics
Error: Confidential Client is not supported in Cross Cloud request.
This indicates that the target Azure storage account resides in one of the Azure Government clouds. Set the required configuration as per Azure Government Cloud Configuration.
Duplicate records after Replicat Recovery
ABS replication uses the File Writer Handler and the ABS Event Handler in the Replicat. Oracle GoldenGate prioritizes avoiding data loss: in case of failures, it guarantees no data loss by using at-least-once semantics for ABS (json, csv, delimitedtext, avro_orc, parquet) delivery. If the Replicat runs normally and is shut down normally, then exactly-once delivery is supported. After a failure (for example, a network failure), several conditions can lead to duplicates during recovery.
Two cases where duplicates can occur:
- A failure occurs after the data is written but before the checkpoint is moved. Upon restart, the Replicat backs up to the previous checkpoint and the data can be replayed.
- The rolling of the data files occurs based on customer-configured triggers. A trigger can be file size, time, inactivity, or time of day. The rolling does not necessarily happen on a transaction commit boundary. The trigger causes writing to the current file to complete, the post-processing transformation and movement to complete, and any state on that file to be deleted. If a Replicat abend occurs between when the rolling is processed and when the checkpoint is moved, then upon restart the Replicat can replay those messages.
If you observe duplicate records after an ABS Replicat recovery, this is expected behavior. If you observe duplicates while the Replicat is running normally, then file a support ticket.
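The first case can be reproduced with a toy model of at-least-once delivery. This is a simplified simulation under stated assumptions, not Replicat's actual logic: one record is delivered per step, the checkpoint is an index into the source list, and the function name run_replicat is hypothetical.

```python
def run_replicat(source: list, fail_before_checkpoint_at=None) -> list:
    """Deliver records in order, optionally failing once after a write
    but before the checkpoint for that record is moved."""
    target = []
    checkpoint = 0          # index of the next record to deliver
    failed_once = False
    while checkpoint < len(source):
        target.append(source[checkpoint])        # step 1: the data is written
        if checkpoint == fail_before_checkpoint_at and not failed_once:
            failed_once = True                   # failure: checkpoint not moved
            continue                             # restart from the old checkpoint
        checkpoint += 1                          # step 2: the checkpoint is moved
    return target


print(run_replicat(["a", "b", "c"]))                               # clean run
print(run_replicat(["a", "b", "c"], fail_before_checkpoint_at=1))  # "b" is replayed
```

In the failing run every record still reaches the target, which is the no-data-loss guarantee, but the record written just before the failure appears twice.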