9.2.27 Microsoft Fabric OneLake
- Lakehouse in Microsoft Fabric
- Mirrored database in Microsoft Fabric
- OneLake Event Handler Prerequisites
- OneLake Mappings to Azure Data Lake Gen2
- OneLake Event Handler Configuration
- OneLake Event Handler Primary Key Update
- OneLake Event Handler Troubleshooting and Diagnostics
9.2.27.1 OneLake Event Handler Prerequisites
- An Azure cloud account is set up.
- Microsoft Fabric is set up.
- A Microsoft Fabric capacity and workspace exist.
- A Microsoft Fabric Lakehouse or Mirrored database exists for the lakehouse or mirrored database target, respectively.
- Create a Microsoft Entra ID app to access the Microsoft Fabric workspace.
  - The app must be granted at least the Contributor role on the workspace.
- Enable the app registration (service principal) to access Fabric APIs.
  - Admin Portal -> Tenant Settings -> Service principals can use Fabric APIs -> Enabled for the entire organization
- Enable remote access to data stored in OneLake.
  - Admin Portal -> Users can access data stored in OneLake with apps external to Fabric.
- Java Software Development Kit (SDK) for Azure Storage File Data Lake.
9.2.27.2 OneLake Mappings to Azure Data Lake Gen2
- Storage Account: An Azure storage account contains all of your Azure Storage data objects: blobs, file shares, queues, tables, and disks.
  - The OneLake storage account name is always onelake.
- Container: A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.
  - The OneLake container name maps to the OneLake workspace name.
- Endpoint: The Azure Storage service endpoint.
  - The OneLake default endpoint is https://onelake.dfs.fabric.microsoft.com; this can be overridden.
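As a rough illustration of the mapping above, the fragment below sets the two handler properties that correspond to the endpoint and the container. The workspace name my_workspace is a placeholder, not a value from this guide.

```properties
# Endpoint: the default OneLake endpoint, set explicitly here only for illustration.
gg.eventhandler.onelake.endpoint=https://onelake.dfs.fabric.microsoft.com
# Container: the Fabric workspace name takes the place of the ADLS Gen2 container name.
# my_workspace is a placeholder.
gg.eventhandler.onelake.workspace=my_workspace
```

The storage account name needs no property because it is always onelake.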
9.2.27.3 OneLake Event Handler Configuration
- OneLake Event Handler Automatic Configuration
- File Writer Handler Configuration
- Autoconfiguration of Parquet/ORC Event Handler
- OneLake Event Handler Configuration
- File Format for the Lakehouse target
- OneLake Event Handler Classpath Configuration
- OneLake Event Handler Authentication
- OneLake Event Handler Proxy Configuration
- Sample Configuration for Lakehouse Target
- Sample Configuration for Mirrored Database Target
9.2.27.3.1 OneLake Event Handler Automatic Configuration
OneLake replication involves configuring multiple components, such as the File Writer Handler, Avro formatter, Parquet Event Handler, ORC Event Handler, and the OneLake Event Handler. The automatic configuration functionality configures these components so that user configuration is minimal. The properties modified by autoconfiguration are logged in the handler log file.
To enable autoconfiguration to replicate data to the Lakehouse target, set the parameter gg.target=fabric_lakehouse.
To enable autoconfiguration to replicate data to the mirrored database target, set the parameter gg.target=fabric_mirrored_database.
9.2.27.3.2 File Writer Handler Configuration
The File Writer Handler name is preset based on the gg.target configuration. For example, if gg.target=fabric_lakehouse, then the File Writer Handler name is set to fabric_lakehouse and its properties are automatically set to the values required for Lakehouse. In this example, you can add or edit a property of the File Writer Handler as follows:
gg.handler.fabric_lakehouse.inactivityRollInterval=1m
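A minimal sketch of overriding File Writer Handler properties on top of autoconfiguration. The inactivityRollInterval line comes from the example above. The maxFileSize line is an assumption that the standard File Writer Handler property of that name applies here, and its value is purely illustrative.

```properties
gg.target=fabric_lakehouse
# The handler name fabric_lakehouse is preset from gg.target.
gg.handler.fabric_lakehouse.inactivityRollInterval=1m
# Assumption: maxFileSize is a File Writer Handler property; 512m is an illustrative value.
gg.handler.fabric_lakehouse.maxFileSize=512m
```

Because autoconfiguration logs the properties it modifies, the handler log file is the place to confirm which values took effect.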
9.2.27.3.3 Autoconfiguration of Parquet/ORC Event Handler
The Event Handler name is preset to the value parquet or orc, based on the file format configuration.
9.2.27.3.3.1 OneLake Event Handler File Format Configuration for Parquet/ORC
- For use cases that require Parquet files, such as Open Mirroring and vanilla Parquet format, autoconfiguration configures the Avro formatter and chains it with the Parquet event handler and the OneLake event handler. This is configured as follows:
  gg.format=parquet
  Note: For the Open Mirroring target (gg.target=fabric_mirrored_database), the file format configuration is internal and cannot be modified.
- For use cases that require ORC files, autoconfiguration configures the Avro formatter and chains it with the ORC event handler and the OneLake event handler. This is configured as follows:
  gg.format=orc
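Putting the two settings together, a Lakehouse target that writes ORC files might be configured as sketched below. This is an illustration assembled from the properties described in this section, not a complete configuration.

```properties
# Lakehouse target; autoconfiguration sets up the File Writer Handler and event handlers.
gg.target=fabric_lakehouse
# Switch the file format from the default (parquet) to ORC.
# Omit this line to keep Parquet. It does not apply to gg.target=fabric_mirrored_database.
gg.format=orc
```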
9.2.27.3.4 OneLake Event Handler Configuration
The OneLake Event Handler name is preset to the value onelake.
gg.target must be set to one of the following values:
- fabric_lakehouse: To replicate to Lakehouse in Microsoft Fabric.
- fabric_mirrored_database: To replicate to Mirrored Database in Microsoft Fabric.
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.onelake.workspace | Required | String | None | Sets the Microsoft Fabric workspace name. |
gg.eventhandler.onelake.lakehouse | Required | String | None | Applicable only to the Lakehouse target. Sets the Microsoft Fabric lakehouse name. |
gg.eventhandler.onelake.mirror | Required | String | None | Applicable only to the mirrored database target. Sets the mirrored database name in Fabric. |
gg.eventhandler.onelake.tenantId | Optional | String | None | Sets the Azure tenant ID of the application. |
gg.eventhandler.onelake.clientId | Optional | String | None | Sets the Azure client ID of the application. |
gg.eventhandler.onelake.clientSecret | Optional | String | None | Sets the Azure client secret for the authentication. |
gg.eventhandler.onelake.pathMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the landing path for data files into OneLake. | If gg.target is set to fabric_mirrored_database, then the default value is ${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename} and cannot be modified. If gg.target=fabric_lakehouse, then the default value is ${catalogname}.lakehouse/Files/ogg/${groupName}/${schemaname}.schema/${tablename} and can be modified. | Use keywords interlaced with constants to dynamically generate path names at runtime. An example path name is ogg/data/${fullyQualifiedTableName}. For more information about the supported keywords, see Template Keywords. |
gg.eventhandler.onelake.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the data file names at runtime. | If gg.target is set to fabric_mirrored_database, then this value is set to ${custom[]} and cannot be edited. If gg.target=fabric_lakehouse, then the default value is based on the upstream handler and can be modified. | Use keywords interlaced with constants to dynamically generate a unique file name at runtime. Typically, file names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt. |
gg.eventhandler.onelake.endpoint | Optional | String | https://onelake.dfs.fabric.microsoft.com | Sets the Fabric OneLake endpoint. |
gg.format | Optional | parquet, orc, or one of the GG for DAA pluggable formatter names. | parquet | Applicable only to the Lakehouse target. Sets the Fabric OneLake file format. For more information, see File Format for the Lakehouse target. |
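To make the template properties in the table more concrete, here is a hedged sketch of a Lakehouse configuration that overrides the landing path. The template is the documented Lakehouse default with only the constant folder ogg replaced by ogg_landing, a made-up name; this variation is an untested assumption, so verify the resolved path in your environment.

```properties
gg.target=fabric_lakehouse
gg.eventhandler.onelake.workspace=<workspace-name>
gg.eventhandler.onelake.lakehouse=<lakehouse-name>
# Lakehouse only: the mirrored database target does not allow changing this template.
# ogg_landing is a hypothetical constant; the keywords are those used in the default template.
gg.eventhandler.onelake.pathMappingTemplate=${catalogname}.lakehouse/Files/ogg_landing/${groupName}/${schemaname}.schema/${tablename}
```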
9.2.27.3.5 File Format for the Lakehouse target
The parameter gg.format can be configured to set the file format.
It can be set to one of the following values:
- parquet: Generate Parquet format files.
- orc: Generate ORC format files.
- Any other pluggable format supported by Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).
9.2.27.3.6 OneLake Event Handler Classpath Configuration
Ensure that the classpath includes the path to the following dependencies:
- Parquet Event handler dependencies including Hadoop dependencies.
- Azure Storage File DataLake Java SDK.
9.2.27.3.6.1 OneLake Event Handler Dependencies
The dependency downloader script onelake.sh can be used to download the OneLake dependencies. Alternatively, you can manually download the OneLake dependencies using the following Maven coordinates:
<dependencies>
  <dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-file-datalake</artifactId>
    <version>12.20.0</version>
  </dependency>
  <dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.13.1</version>
  </dependency>
</dependencies>
Edit the gg.classpath configuration parameter to include the path to the Azure Storage File Data Lake SDK.
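As a sketch of the classpath setting, the fragment below lists one wildcard entry per dependency group. The /path/to/... directories are placeholders for wherever the dependencies were downloaded on your system.

```properties
# Placeholders: replace each /path/to/... with the actual dependency directory.
# Hadoop and Parquet entries cover the Parquet Event Handler dependencies;
# the onelake entry covers the Azure Storage File Data Lake SDK and azure-identity.
gg.classpath=/path/to/hadoop/*:/path/to/parquet/*:/path/to/onelake/*
```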
9.2.27.3.7 OneLake Event Handler Authentication
You can authenticate to Azure Storage by configuring the following:
- tenantId
- clientId
- clientSecret
9.2.27.3.7.1 Azure Tenant ID, Client ID, and Client Secret
To obtain your Azure tenant ID:
- Go to the Microsoft Azure portal.
- Select Azure Active Directory from the list on the left to view the Azure Active Directory panel.
- Select Properties in the Azure Active Directory panel to view the Azure Active Directory properties.
The Azure tenant ID is the field marked as Directory ID.
To obtain your Azure client ID and client secret:
- Go to the Microsoft Azure portal.
- Select All Services from the list on the left to view the Azure Services Listing.
- Type App into the filter command box and select App Registrations from the listed services.
- Select the App Registration that you created to access the Microsoft Fabric workspace.
The Application ID displayed for the App Registration is the client ID. The client secret is the key string generated when a new key is added.
This generated key string is available only once, when the key is created. If you do not know the generated key string, then create another key and make sure you capture the generated key string.
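The values gathered in the steps above map onto the handler's authentication properties roughly as follows. The angle-bracket values are placeholders.

```properties
# Directory ID from the Azure Active Directory properties.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
# Application ID of the App Registration.
gg.eventhandler.onelake.clientId=<azure-client-id>
# Key string generated when the key was added to the App Registration.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
```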
9.2.27.3.8 OneLake Event Handler Proxy Configuration
You can use jvm.bootoptions to set the proxy server configuration using well-known Java proxy properties. For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
9.2.27.3.9 Sample Configuration for Lakehouse Target
gg.target=fabric_lakehouse
#TODO: format can be 'parquet' or 'orc' or one of the pluggable formatter types. Default is 'parquet'.
#gg.format=parquet
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric lakehouse name.
gg.eventhandler.onelake.lakehouse=<lakehouse-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*:$THIRD_PARTY_DIR/onelake/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
9.2.27.3.10 Sample Configuration for Mirrored Database Target
gg.target=fabric_mirrored_database
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric mirror Database name.
gg.eventhandler.onelake.mirror=<mirror-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*:$THIRD_PARTY_DIR/onelake/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
9.2.27.4 OneLake Event Handler Primary Key Update
Primary key UPDATE behavior depends on the file format configuration.
9.2.27.4.1 Mirrored Database in Microsoft Fabric
When the file format is set to gg.format=fabric_mirroring, primary key UPDATE operations are split into a DELETE operation followed by an INSERT operation.
This behavior cannot be modified.
9.2.27.4.2 Lakehouse in Microsoft Fabric
If gg.target=fabric_lakehouse is set, then by default primary key UPDATE operations result in a Replicat ABEND.
This behavior can be modified by configuring the formatter property gg.handler.onelake.format.pkUpdateHandling.
The property gg.handler.onelake.format.pkUpdateHandling accepts one of the following values:
- abend: Replicat abends when a primary key UPDATE is processed.
- update: Replicat processes a primary key UPDATE as a regular UPDATE.
- delete-insert: Replicat splits a primary key UPDATE into a DELETE operation followed by an INSERT operation.
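For example, a Lakehouse configuration that should keep running when a primary key changes, rather than abend, might set the property as sketched below. The property name is taken verbatim from this section. Which of the two non-default values is appropriate depends on how downstream consumers read the files.

```properties
gg.target=fabric_lakehouse
# Split each primary key UPDATE into a DELETE followed by an INSERT instead of abending.
# Use update instead to process it as a regular UPDATE.
gg.handler.onelake.format.pkUpdateHandling=delete-insert
```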
9.2.27.5 OneLake Event Handler Troubleshooting and Diagnostics
- Unsupported Operations:
  - DDL operations that DROP/RENAME a table are not replicated by the Replicat process.
  - Renaming columns of the table is not supported by the Microsoft application consuming the Fabric Mirroring format file.
  - TRUNCATE operations cannot be replicated.
- Error:
  com.azure.identity.CredentialUnavailableException: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
  This indicates that the Azure authentication parameters tenantId, clientId, and clientSecret are not configured. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error:
  java.lang.IllegalArgumentException: Invalid tenant id provided. You can locate your tenant id by following the instructions listed here: https://learn.microsoft.com/partner-center/find-ids-and-domain-names
  This indicates that the authentication parameter tenantId is invalid. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error:
  com.microsoft.aad.msal4j.MsalServiceException: AADSTS700016: Application with identifier '<invalid_clientId>' was not found in the directory '<tenant name>'.
  This indicates that the authentication parameter clientId with value <invalid_client_id> is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error:
  com.microsoft.aad.msal4j.MsalServiceException: AADSTS7000215: Invalid client secret provided.
  This indicates that the authentication parameter clientSecret is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
- Error:
  com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 404, "{"error":{"code":"ArtifactNotFound","message":"Request Failed with Artifact 'gglakehouse1_invalid.lakehouse' is not found in workspace 'ggworkspace1'."}}"
  This indicates that the Fabric workspace name or lakehouse name is invalid. If the Fabric workspace or lakehouse does not exist, then create it before starting the replicat process. Ensure that the configuration parameters gg.eventhandler.onelake.workspace and gg.eventhandler.onelake.lakehouse are set to the Fabric workspace and lakehouse names, respectively.
- Error:
  ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Microsoft Fabric OneLake Generic Mirror. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.
  There are one or more TRUNCATE operations that were processed by the replicat process. To proceed, you need to remove the GETTRUNCATES parameter from the parameter file and restart the replicat process.
- Error:
  ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.
  You need to clear the backlog in OneLake or purge the last file with the highest sequence number and restart the replicat process.
- Error:
  The operation record in the trail sequence'<seqno>' at offset '<offset>' for the table '<table>' has missing column values.
  OneLake replication requires full images. You need to regenerate the trail files that contain full images for UPDATE operations, and restart the replication process.
- Mirrored Database Target:
  - Error:
    ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Mirrored Database in Microsoft Fabric. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.
    There are one or more TRUNCATE operations that were processed by the replicat process. To proceed, you need to remove the GETTRUNCATES parameter from the parameter file and restart the replicat process.
  - Error:
    ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.
    You need to clear the backlog in OneLake or purge the last file with the highest sequence number and restart the replicat process.
  - Error:
    The operation record in the trail sequence'<seqno>' at offset '<offset>' for the table '<table>' has missing column values
    Replication to Mirrored Database in Microsoft Fabric requires full images. You need to regenerate the trail files that contain full images for UPDATE operations, and restart the replication process.