9.2.27 Microsoft Fabric OneLake
- Lakehouse in Microsoft Fabric
- Mirrored database in Microsoft Fabric
- OneLake Event Handler Prerequisites
- OneLake Mappings to Azure Data Lake Gen2
- OneLake Event Handler Configuration
- OneLake Event Handler Primary Key Update
- Ingesting into Fabric Mirrored Tables using Partitioning
- OneLake Event Handler Troubleshooting and Diagnostics
Parent topic: Target
9.2.27.1 OneLake Event Handler Prerequisites
- Azure cloud account set up.
- Microsoft Fabric set up.
- A Microsoft Fabric capacity and workspace should exist.
- A Microsoft Fabric Lakehouse or Mirrored database should exist for the Lakehouse or mirrored database target, respectively.
- Create a Microsoft Entra ID app to access the Microsoft Fabric workspace.
- The app must be granted at least the Contributor role on the workspace.
- Enable the app registration (service principal) to access Fabric APIs.
- Admin Portal -> Tenant Settings -> Service principals can use Fabric APIs -> Enabled for the entire organization
- Enable remote access to data stored in OneLake
- Admin Portal -> Users can access data stored in OneLake with apps external to Fabric.
- Java Software Development Kit (SDK) for Azure Storage File Data Lake.
Parent topic: Microsoft Fabric OneLake
9.2.27.2 OneLake Mappings to Azure Data Lake Gen2
- Storage Account: An Azure storage account contains all of your Azure Storage data
objects: blobs, file shares, queues, tables, and disks.
- The OneLake storage account name is always onelake.
- Container: A container organizes a set of blobs, similar to a directory in a file
system. A storage account can include an unlimited number of containers, and a container
can store an unlimited number of blobs.
- OneLake container name is mapped to OneLake workspace name.
- Endpoint: The Azure Storage service endpoint.
- The default OneLake endpoint is https://onelake.dfs.fabric.microsoft.com; this can be overridden.
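For illustration, assuming a hypothetical workspace named myworkspace and a lakehouse named mylakehouse, these mappings resolve to an Azure Data Lake Gen2 style path such as:
https://onelake.dfs.fabric.microsoft.com/myworkspace/mylakehouse.lakehouse/Files/ogg/
Here, onelake is the fixed storage account, the workspace takes the place of the container, and the rest of the path is resolved from the path mapping template described in OneLake Event Handler Configuration.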
Parent topic: Microsoft Fabric OneLake
9.2.27.3 OneLake Event Handler Configuration
- OneLake Event Handler Automatic Configuration
- File Writer Handler Configuration
- Autoconfiguration of Parquet/ORC Event Handler
- OneLake Event Handler Configuration
- File Format for the Lakehouse target
- OneLake Event Handler Classpath Configuration
- OneLake Event Handler Authentication
- OneLake Event Handler Proxy Configuration
- Sample Configuration for Lakehouse Target
- Sample Configuration for Mirrored Database Target
- Performance Considerations
Parent topic: Microsoft Fabric OneLake
9.2.27.3.1 OneLake Event Handler Automatic Configuration
OneLake replication involves configuring multiple components, such as the File Writer Handler, Avro formatter, Parquet Event Handler, ORC Event Handler, and the OneLake Event Handler. The Automatic Configuration functionality autoconfigures these components so that the required user configuration is minimal. The properties modified by autoconfiguration are logged in the handler log file.
To enable autoconfiguration to replicate data to the Lakehouse target, set the
parameter gg.target=fabric_lakehouse.
To enable autoconfiguration to replicate data to the mirrored database target,
set the parameter gg.target=fabric_mirrored_database.
Parent topic: OneLake Event Handler Configuration
9.2.27.3.2 File Writer Handler Configuration
The File Writer Handler name is pre-set based on the
gg.target configuration. For example, if
gg.target=fabric_lakehouse, then the File Writer Handler name is set to the
value fabric_lakehouse and its properties are automatically set to the
required values for Lakehouse. As per this example, you can add or edit a property of the File
Writer Handler as follows:
gg.handler.fabric_lakehouse.inactivityRollInterval=1m.
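For example, a minimal sketch of adding a File Writer Handler override for the Lakehouse target; only inactivityRollInterval is taken from the example above, and the value shown is illustrative:
gg.target=fabric_lakehouse
#The File Writer Handler name is pre-set to fabric_lakehouse by autoconfiguration.
gg.handler.fabric_lakehouse.inactivityRollInterval=1m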
Parent topic: OneLake Event Handler Configuration
9.2.27.3.3 Autoconfiguration of Parquet/ORC Event Handler
The Event Handler name is pre-set to the value parquet or orc
based on the file format configuration.
Parent topic: OneLake Event Handler Configuration
9.2.27.3.3.1 OneLake Event Handler File Format Configuration for Parquet/ORC
- For use cases that require Parquet files, such as Open Mirroring and vanilla Parquet format, autoconfiguration configures the Avro formatter and chains it with the Parquet Event Handler and the OneLake Event Handler. This is configured as follows: gg.format=parquet
  Note: For the Open Mirroring target (gg.target=fabric_mirrored_database), the file format configuration is internal and cannot be modified.
- For use cases that require ORC files, autoconfiguration configures the Avro formatter and chains it with the ORC Event Handler and the OneLake Event Handler. This is configured as follows: gg.format=orc
Parent topic: Autoconfiguration of Parquet/ORC Event Handler
9.2.27.3.4 OneLake Event Handler Configuration
The OneLake Event Handler name is pre-set to the value onelake.
gg.target must be set to one of the following values:
- fabric_lakehouse: To replicate to Lakehouse in Microsoft Fabric.
- fabric_mirrored_database: To replicate to Mirrored Database in Microsoft Fabric.
| Properties | Required/Optional | Legal Values | Default | Explanation |
|---|---|---|---|---|
| gg.eventhandler.onelake.workspace | Required | String | None | Sets the Microsoft Fabric workspace name. |
| gg.eventhandler.onelake.lakehouse | Required | String | None | Applicable only to the Lakehouse target. Sets the Microsoft Fabric lakehouse name. |
| gg.eventhandler.onelake.mirror | Required | String | None | Applicable only to the mirrored database target. Sets the mirrored database name in Fabric. |
| gg.eventhandler.onelake.tenantId | Optional | String | None | Sets the Azure tenant ID of the application. |
| gg.eventhandler.onelake.clientId | Optional | String | None | Sets the Azure client ID of the application. |
| gg.eventhandler.onelake.clientSecret | Optional | String | None | Sets the Azure client secret for authentication. |
| gg.eventhandler.onelake.pathMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the landing path for data files in OneLake. | If gg.target is set to fabric_mirrored_database, the default is ${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename} and cannot be modified. If gg.target=fabric_lakehouse, the default is ${catalogname}.lakehouse/Files/ogg/${groupName}/${schemaname}.schema/${tablename} and can be modified. | Use keywords interlaced with constants to dynamically generate path names at runtime. An example path name is ogg/data/${fullyQualifiedTableName}. For more information about the supported keywords, see Template Keywords. |
| gg.eventhandler.onelake.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate data file names at runtime. | If gg.target is set to fabric_mirrored_database, this value is set to ${custom[]} and cannot be edited. If gg.target=fabric_lakehouse, the default value is based on the upstream handler and can be modified. | Use keywords interlaced with constants to dynamically generate a unique file name at runtime. Typically, file names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt. |
| gg.eventhandler.onelake.endpoint | Optional | String | https://onelake.dfs.fabric.microsoft.com | Sets the Fabric OneLake endpoint. |
| gg.format | Optional | parquet, orc, or one of the GG for DAA pluggable formatter names. | parquet | Applicable only to the Lakehouse target. Sets the Fabric OneLake file format. For more information, see File Format for the Lakehouse target. |
| gg.eventhandler.onelake.replicatGroupPartitionName | Optional | String | ogg_group_name | Applicable only if gg.target is set to fabric_mirrored_database. A string to specify the Fabric partition column name. The value should be between 1 and 20 characters in length. |
| gg.eventhandler.onelake.maxRetries | Optional | Integer | 3 | The maximum number of retries to attempt if the file upload to OneLake fails. The value should be between 1 and 20. |
| gg.eventhandler.onelake.maxRetries | Optional | Integer | 10 | The initial delay between retries (in seconds). The retry delay is initially set to the specified value and doubles with each retry. The value should be between 1 and 120. |
| gg.keyupdate.threadrange.behavior | Optional | ABEND, WARN | If gg.target is set to fabric_mirrored_database, the default value is ABEND. For all other targets, the default is not set. | Controls Replicat's behavior when it encounters a key update operation for a table configured with the THREADRANGE clause. ABEND aborts processing, while WARN logs a warning. |
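For example, a sketch of overriding the Lakehouse path mapping template so that data files land under a custom folder; the folder name data is hypothetical, and the available keywords are listed in Template Keywords:
#Hypothetical override for the Lakehouse target: Files/data/<schema>/<table> inside the lakehouse.
gg.eventhandler.onelake.pathMappingTemplate=${catalogname}.lakehouse/Files/data/${schemaname}/${tablename}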
Parent topic: OneLake Event Handler Configuration
9.2.27.3.5 File Format for the Lakehouse target
The parameter gg.format can be configured to set the file
format.
It can be set to one of the following values:
- parquet: Generate Parquet format files.
- orc: Generate ORC format files.
- Any other pluggable format supported by Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).
Parent topic: OneLake Event Handler Configuration
9.2.27.3.6 OneLake Event Handler Classpath Configuration
Ensure that the classpath includes the path to the following dependencies:
- OneLake Event Handler dependencies.
- Parquet Event Handler dependencies, including Hadoop dependencies.
- Azure Storage File DataLake Java SDK.
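A minimal sketch of the resulting gg.classpath, assuming the dependencies are downloaded into per-component directories under a hypothetical $THIRD_PARTY_DIR location; the OneLake entry should appear first, as described in the troubleshooting section:
gg.classpath=$THIRD_PARTY_DIR/onelake/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*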
9.2.27.3.6.1 OneLake Event Handler Dependencies
The dependency downloader script onelake.sh can be used to download the OneLake dependencies. Alternatively, you can manually download the OneLake dependencies using the following Maven coordinates:
<dependencies>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-file-datalake</artifactId>
<version>12.20.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-identity</artifactId>
<version>1.13.1</version>
</dependency>
</dependencies>
Edit the gg.classpath configuration parameter to include the path to the
Azure Storage File Data Lake SDK.
Parent topic: OneLake Event Handler Classpath Configuration
9.2.27.3.7 OneLake Event Handler Authentication
You can authenticate access to Azure Storage by configuring the following parameters:
- tenantId
- clientId
- clientSecret
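These values map to the OneLake Event Handler properties shown below with placeholder values; the same properties appear in the sample configurations later in this section:
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
gg.eventhandler.onelake.clientId=<azure-client-id>
gg.eventhandler.onelake.clientSecret=<azure-client-secret>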
9.2.27.3.7.1 Azure Tenant ID, Client ID, and Client Secret
To obtain your Azure tenant ID:
- Go to the Microsoft Azure portal.
- Select Azure Active Directory from the list on the left to view the Azure Active Directory panel.
- Select Properties in the Azure Active Directory
panel to view the Azure Active Directory
properties.
The Azure tenant ID is the field marked as Directory ID.
To obtain your Azure client ID and client secret:
- Go to the Microsoft Azure portal.
- Select All Services from the list on the left to view the Azure Services Listing.
- Type App into the filter command box and select App Registrations from the listed services.
- Select the App Registration that
you have created to access Microsoft Fabric workspace.
The Application id displayed for the App Registration is the client ID. The client secret is the generated key string when a new key is added.
This generated key string is available only once when the key is created. If you do not know the generated key string, then create another key making sure you capture the generated key string.
Parent topic: OneLake Event Handler Authentication
9.2.27.3.8 OneLake Event Handler Proxy Configuration
jvm.bootoptions can be
used to set proxy server configuration using
well-known Java proxy properties. For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
Parent topic: OneLake Event Handler Configuration
9.2.27.3.9 Sample Configuration for Lakehouse Target
gg.target=fabric_lakehouse
#TODO: format can be 'parquet' or 'orc' or one of the pluggable formatter types. Default is 'parquet'.
#gg.format=parquet
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric lakehouse name.
gg.eventhandler.onelake.lakehouse=<lakehouse-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/onelake/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
Parent topic: OneLake Event Handler Configuration
9.2.27.3.10 Sample Configuration for Mirrored Database Target
gg.target=fabric_mirrored_database
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric mirrored database name.
gg.eventhandler.onelake.mirror=<mirror-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/onelake/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
Parent topic: OneLake Event Handler Configuration
9.2.27.3.11 Performance Considerations
You can set GROUPTRANSOPS to a higher value (recommended up to 20000).
By default, the value is set to 1000.
For initial load replicats, you can create multiple replicats with the classic mode and configure each replicat to handle one individual table or a discrete subset of tables.
For real-time replicats, you can use coordinated replicat mode to improve performance. While configuring coordinated replicat, tables should be mapped to individual threads.
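For illustration, a minimal sketch of a Replicat parameter file incorporating these recommendations; the Replicat name REP and the schema mapping are placeholders, and additional parameters may be required for your environment:
REPLICAT REP
-- Group more source operations into each target transaction (default is 1000).
GROUPTRANSOPS 20000
MAP HR.*, TARGET HR.*;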
Parent topic: OneLake Event Handler Configuration
9.2.27.4 OneLake Event Handler Primary Key Update
Primary key UPDATE behavior depends
on the file-format configuration.
Parent topic: Microsoft Fabric OneLake
9.2.27.4.1 Mirrored Database in Microsoft Fabric
When the file format is set to gg.format=fabric_mirroring, primary key UPDATE operations are split into a DELETE operation followed by an INSERT operation. This behavior cannot be modified.
Parent topic: OneLake Event Handler Primary Key Update
9.2.27.4.2 Lakehouse in Microsoft Fabric
If gg.target=fabric_lakehouse is set, then by default primary key
UPDATE operations will result in a Replicat ABEND.
This behavior can be modified by configuring the formatter property gg.handler.onelake.format.pkUpdateHandling.
The property gg.handler.onelake.format.pkUpdateHandling accepts one of the following values:
- abend: ABEND Replicat when a primary key UPDATE is processed.
- update: Replicat processes the primary key UPDATE as a regular UPDATE.
- delete-insert: Replicat splits the primary key UPDATE into a DELETE operation followed by an INSERT operation.
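For example, to split primary key UPDATE operations for the Lakehouse target, a one-line sketch using the property name documented above:
gg.handler.onelake.format.pkUpdateHandling=delete-insert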
Parent topic: OneLake Event Handler Primary Key Update
9.2.27.5 Ingesting into Fabric Mirrored Tables using Partitioning
The existing solution for ingesting data into Fabric mirrored tables uses a single landing folder for a specific table and a metadata file to specify key columns. However, this approach does not scale well for large tables because only a single Replicat thread is used. To improve scalability, you can leverage Fabric's partitioning feature and Oracle GoldenGate's coordinated apply feature to distribute the workload for specific tables.
9.2.27.5.1 Replicat MAP Statement
The following example MAP statement distributes the workload for a table across two coordinated apply threads:
MAP scott.orders, TARGET scott.orders, THREADRANGE (1-2, OID);
With this configuration, the data files for the two threads land in the following partitioned folders:
${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename}/ogg_group_name=REP001/
${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename}/ogg_group_name=REP002/
- The partition column ogg_group_name is populated with the respective Replicat group name (the group name of the coordinated worker thread).
- The additional column ogg_group_name is visible for every user table in the Fabric mirrored database.
- Customers can configure THREADRANGE as required.
Parent topic: Ingesting into Fabric Mirrored Tables using Partitioning
9.2.27.5.2 Metadata File Updates
For tables configured with THREADRANGE, the Fabric metadata file is updated as follows:
{ "keyColumns": ["ogg_group_name", "KEY_1", "KEY_2"], "IsPartitioned": "true" }
The metadata file is located in the table landing folder ${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename}.
For tables that do not require partitioning, coordinated apply or THREADRANGE configuration is not required, and the metadata file is as follows:
{ "keyColumns": ["KEY_1", "KEY_2"], "IsPartitioned": "false" }
Parent topic: Ingesting into Fabric Mirrored Tables using Partitioning
9.2.27.5.3 Limitation
When partitioning is enabled for a table, that is, a table configured with the THREADRANGE clause, in-place primary key update operations may be applied out of order.
To manage this behavior, use the
gg.keyupdate.threadrange.behavior parameter, which allows you to
control how Replicat handles such operations.
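For example, a one-line sketch that logs a warning instead of abending when such key updates are encountered (the default for the mirrored database target is ABEND):
gg.keyupdate.threadrange.behavior=WARN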
Parent topic: Ingesting into Fabric Mirrored Tables using Partitioning
9.2.27.6 OneLake Event Handler Troubleshooting and Diagnostics
- Unsupported Operations:
  - DDL operations that DROP/RENAME a table will not be replicated by the Replicat process.
  - Renaming columns of the table is not supported by the Microsoft application consuming the Fabric Mirroring format file.
  - TRUNCATE operations cannot be replicated.
- Error: com.azure.identity.CredentialUnavailableException: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
  This indicates that the Azure authentication parameters tenantId, clientId, and clientSecret are not configured. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error: java.lang.IllegalArgumentException: Invalid tenant id provided. You can locate your tenant id by following the instructions listed here: https://learn.microsoft.com/partner-center/find-ids-and-domain-names
  This indicates that the authentication parameter tenantId is invalid. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error: com.microsoft.aad.msal4j.MsalServiceException: AADSTS700016: Application with identifier '<invalid_clientId>' was not found in the directory '<tenant name>'.
  This indicates that the authentication parameter clientId with value <invalid_clientId> is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error: com.microsoft.aad.msal4j.MsalServiceException: AADSTS7000215: Invalid client secret provided.
  This indicates that the authentication parameter clientSecret is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error: com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 404, "{"error":{"code":"ArtifactNotFound","message":"Request Failed with Artifact 'gglakehouse1_invalid.lakehouse' is not found in workspace 'ggworkspace1'."}}"
  This indicates that the Fabric workspace name or lakehouse name is invalid. If the Fabric workspace or lakehouse does not exist, then create these before starting the Replicat process. Ensure that the configuration parameters gg.eventhandler.onelake.workspace and gg.eventhandler.onelake.lakehouse are set to the Fabric workspace and lakehouse names respectively.
- Error: ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Microsoft Fabric OneLake Generic Mirror. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.
  There are one or more TRUNCATE operations that were processed by the Replicat process. To proceed, remove the GETTRUNCATES parameter from the parameter file and restart the Replicat process.
- Error: ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.
  Clear the backlog in OneLake or purge the last file with the highest sequence number and restart the Replicat process.
- Error: The operation record in the trail sequence '<seqno>' at offset '<offset>' for the table '<table>' has missing column values.
  OneLake replication requires full images. Regenerate the trail files with full images for UPDATE operations, and restart the replication process.
- Mirrored Database Target:
  - Error: ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Mirrored Database in Microsoft Fabric. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.
    There are one or more TRUNCATE operations that were processed by the Replicat process. To proceed, remove the GETTRUNCATES parameter from the parameter file and restart the Replicat process.
  - Error: ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.
    Clear the backlog in OneLake or purge the last file with the highest sequence number and restart the Replicat process.
  - Error: The operation record in the trail sequence '<seqno>' at offset '<offset>' for the table '<table>' has missing column values.
    Replication to Mirrored Database in Microsoft Fabric requires full images. Regenerate the trail files with full images for UPDATE operations, and restart the replication process.
  - Error: Azure Identity => ERROR in getToken() call for scopes [https://storage.azure.com/.default]: java.io.UncheckedIOException: io.netty.channel.StacklessClosedChannelException {"az.sdk.message":"Failed to acquire a new access token.","exception":"java.io.UncheckedIOException: io.netty.channel.StacklessClosedChannelException"}
    The root cause of this issue is the order of the downloaded dependencies in gg.classpath in the Replicat properties file. The correct order is as follows: gg.classpath=$THIRD_PARTY_DIR/onelake/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*
Parent topic: Microsoft Fabric OneLake