9.2.27 Microsoft Fabric OneLake

Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. OneLake is built into the Fabric platform and provides a single, unified location to store all organizational data on which the Fabric workloads operate. See https://learn.microsoft.com/en-us/fabric/onelake/. You can use the OneLake Event Handler to load files that contain operation records into the following targets:
  • Lakehouse in Microsoft Fabric
  • Mirrored database in Microsoft Fabric

9.2.27.1 OneLake Event Handler Prerequisites

  • An Azure cloud account is set up.
  • Microsoft Fabric is set up:
    • A Microsoft Fabric capacity and a workspace must exist.
    • A Microsoft Fabric Lakehouse or Mirrored database must exist for the lakehouse or mirrored database target, respectively.
    • Create a Microsoft Entra ID app to access the Microsoft Fabric workspace.
    • Grant the app at least the Contributor role on the workspace.
    • Enable the app registration (service principal) to access Fabric APIs:
      • Admin Portal -> Tenant Settings -> Service principals can use Fabric APIs -> Enabled for the entire organization
    • Enable remote access to data stored in OneLake:
      • Admin Portal -> Users can access data stored in OneLake using apps external to Fabric.
  • Java Software Development Kit (SDK) for Azure Storage File Data Lake.

9.2.27.2 OneLake Mappings to Azure Data Lake Gen2

  • Storage Account: An Azure storage account contains all of your Azure Storage data objects: blobs, file shares, queues, tables, and disks.
    • OneLake Storage Account name is always onelake.
  • Container: A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.
    • OneLake container name is mapped to OneLake workspace name.
  • Endpoint: The Azure Storage service endpoint.
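
The mapping above can be illustrated with a small sketch that assembles an ADLS Gen2 style URL for a OneLake item. It assumes the default endpoint listed later in this topic and the `<item>.<itemtype>` naming that appears in the default path template (for example, `.lakehouse`). The function name `onelake_url` and the sample workspace and lakehouse names are hypothetical and are not part of the handler.

```python
from urllib.parse import quote

# Default OneLake endpoint; the storage account name is always "onelake".
ONELAKE_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def onelake_url(workspace, item, item_type, path="", endpoint=ONELAKE_ENDPOINT):
    """Build a URL in which the container position holds the workspace name.

    Hypothetical helper for illustration only.
    """
    segments = [workspace, f"{item}.{item_type}"]
    segments += [part for part in path.split("/") if part]
    # Percent-encode each segment so names with spaces stay valid in a URL.
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return endpoint.rstrip("/") + "/" + encoded


print(onelake_url("ggworkspace1", "gglakehouse1", "lakehouse", "Files/ogg"))
```

The workspace name takes the place that a container name would occupy in an Azure Data Lake Gen2 URL, and the item (lakehouse or mirrored database) is the first directory below it.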

9.2.27.3 OneLake Event Handler Configuration

9.2.27.3.1 OneLake Event Handler Automatic Configuration

OneLake replication involves configuring multiple components, such as the File Writer Handler, Avro formatter, Parquet Event Handler, ORC Event Handler, and the OneLake Event Handler. Automatic configuration sets up these components so that the required user configuration is minimal. The properties modified by automatic configuration are logged in the handler log file.

To enable autoconfiguration to replicate data to the Lakehouse target, set the parameter gg.target=fabric_lakehouse.

To enable autoconfiguration to replicate data to the mirrored database target, set the parameter gg.target=fabric_mirrored_database.

9.2.27.3.2 File Writer Handler Configuration

The File Writer Handler name is preset based on the gg.target configuration. For example, if gg.target=fabric_lakehouse, then the File Writer Handler name is set to fabric_lakehouse and its properties are automatically set to the values required for Lakehouse. In this example, you can add or edit a File Writer Handler property as follows: gg.handler.fabric_lakehouse.inactivityRollInterval=1m.

9.2.27.3.3 Autoconfiguration of Parquet/ORC Event Handler

The event handler name is preset to the value parquet or orc, based on the file format configuration.

9.2.27.3.3.1 OneLake Event Handler File Format Configuration for Parquet/ORC

  • For use cases that require Parquet files, such as Open Mirroring and vanilla Parquet format, autoconfiguration configures the Avro formatter and chains it with the Parquet Event Handler and the OneLake Event Handler.
    This is configured as follows: gg.format=parquet

    Note:

    For the Open Mirroring target (gg.target=fabric_mirrored_database), the file format configuration is internal and cannot be modified.
  • For use cases that require ORC files, autoconfiguration configures the Avro formatter and chains it with the ORC Event Handler and the OneLake Event Handler. This is configured as follows: gg.format=orc

9.2.27.3.4 OneLake Event Handler Configuration

The OneLake Event Handler name is preset to the value onelake.

gg.target must be set to one of the following values:

  • fabric_lakehouse: To replicate to Lakehouse in Microsoft Fabric.
  • fabric_mirrored_database: To replicate to Mirrored Database in Microsoft Fabric.

| Properties | Required/Optional | Legal Values | Default | Explanation |
| --- | --- | --- | --- | --- |
| gg.eventhandler.onelake.workspace | Required | String | None | Sets the Microsoft Fabric workspace name. |
| gg.eventhandler.onelake.lakehouse | Required | String | None | Applicable only to the Lakehouse target. Sets the Microsoft Fabric lakehouse name. |
| gg.eventhandler.onelake.mirror | Required | String | None | Applicable only to the mirrored database target. Sets the mirrored database name in Fabric. |
| gg.eventhandler.onelake.tenantId | Optional | String | None | Sets the Azure tenant ID of the application. |
| gg.eventhandler.onelake.clientId | Optional | String | None | Sets the Azure client ID of the application. |
| gg.eventhandler.onelake.clientSecret | Optional | String | None | Sets the Azure client secret for the authentication. |
| gg.eventhandler.onelake.pathMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the landing path for data files in OneLake. | If gg.target is set to fabric_mirrored_database, the default value is ${catalogname}.MountedRelationalDatabase/Files/LandingZone/${schemaname}.schema/${tablename} and cannot be modified. If gg.target=fabric_lakehouse, the default value is ${catalogname}.lakehouse/Files/ogg/${groupName}/${schemaname}.schema/${tablename} and can be modified. | Use keywords interlaced with constants to dynamically generate path names at runtime. An example path name is ogg/data/${fullyQualifiedTableName}. For more information about the supported keywords, see Template Keywords. |
| gg.eventhandler.onelake.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the data file names at runtime. | If gg.target is set to fabric_mirrored_database, the value is set to ${custom[]} and cannot be edited. If gg.target=fabric_lakehouse, the default value is based on the upstream handler and can be modified. | Use keywords interlaced with constants to dynamically generate a unique file name at runtime. Typically, file names follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt. |
| gg.eventhandler.onelake.endpoint | Optional | String | https://onelake.dfs.fabric.microsoft.com | Sets the Fabric OneLake endpoint. |
| gg.format | Optional | parquet, orc, or one of the GG for DAA pluggable formatter names | parquet | Applicable only to the Lakehouse target. Sets the Fabric OneLake file format. For more information, see File Format for the Lakehouse target. |
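
To make the keyword substitution in pathMappingTemplate concrete, the following sketch resolves the default Lakehouse template with Python's `string.Template`, which happens to use the same `${name}` syntax. This is only an illustration: the handler performs its own resolution internally, the helper name `resolve_path` is hypothetical, and the keyword values shown are made-up samples rather than what the handler would substitute in a real deployment.

```python
from string import Template

# Default pathMappingTemplate for gg.target=fabric_lakehouse, as documented above.
LAKEHOUSE_DEFAULT = (
    "${catalogname}.lakehouse/Files/ogg/${groupName}/"
    "${schemaname}.schema/${tablename}"
)


def resolve_path(template, keywords):
    """Substitute ${keyword} placeholders; raises KeyError if one is unresolved."""
    return Template(template).substitute(keywords)


# Hypothetical sample values, chosen only for illustration.
values = {
    "catalogname": "gglakehouse1",
    "groupName": "RONELAKE",
    "schemaname": "QASOURCE",
    "tablename": "TCUSTMER",
}

print(resolve_path(LAKEHOUSE_DEFAULT, values))
```

Constants such as `Files/ogg` pass through unchanged, while each keyword is replaced per table, which is why one template yields a distinct landing directory for every replicated table.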

9.2.27.3.5 File Format for the Lakehouse target

The parameter gg.format can be configured to set the file format.

It can be set to one of the following values:

  • parquet: Generate Parquet format files.
  • orc: Generate ORC format files.
  • Any other pluggable format supported by Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).

9.2.27.3.6 OneLake Event Handler Classpath Configuration

Ensure that the classpath includes the path to the following dependencies:

  • Parquet Event Handler dependencies, including Hadoop dependencies.
  • Azure Storage File Data Lake Java SDK.

9.2.27.3.6.1 OneLake Event Handler Dependencies

The dependency downloader script onelake.sh can be used to download the OneLake dependencies. Alternatively, you can manually download the OneLake dependencies using the following Maven coordinates:

<dependencies>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-storage-file-datalake</artifactId>
        <version>12.20.0</version>
    </dependency>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-identity</artifactId>
        <version>1.13.1</version>
    </dependency>
</dependencies>

Edit the gg.classpath configuration parameter to include the path to the Azure Storage File Data Lake SDK.

9.2.27.3.7 OneLake Event Handler Authentication

You can authenticate to Azure Storage by configuring the following properties:

  • tenantId
  • clientId
  • clientSecret

9.2.27.3.7.1 Azure Tenant ID, Client ID, and Client Secret

To obtain your Azure tenant ID:

  • Go to the Microsoft Azure portal.
  • Select Azure Active Directory from the list on the left to view the Azure Active Directory panel.
  • Select Properties in the Azure Active Directory panel to view the Azure Active Directory properties.

    The Azure tenant ID is the field marked as Directory ID.

To obtain your Azure client ID and client secret:

  • Go to the Microsoft Azure portal.
  • Select All Services from the list on the left to view the Azure Services Listing.
  • Type App into the filter command box and select App Registrations from the listed services.
  • Select the App Registration that you created to access the Microsoft Fabric workspace.

    The Application ID displayed for the App Registration is the client ID. The client secret is the key string that is generated when a new key is added.

    This generated key string is available only once, when the key is created. If you do not know the generated key string, then create another key and make sure that you capture the generated key string.

9.2.27.3.8 OneLake Event Handler Proxy Configuration

When the process runs behind a proxy server, use the jvm.bootoptions property to set the proxy server configuration using the well-known Java proxy properties. For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true

9.2.27.3.9 Sample Configuration for Lakehouse Target

gg.target=fabric_lakehouse
#TODO: format can be 'parquet' or 'orc' or one of the pluggable formatter types. Default is 'parquet'.
#gg.format=parquet
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric lakehouse name.
gg.eventhandler.onelake.lakehouse=<lakehouse-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*:$THIRD_PARTY_DIR/onelake/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true

9.2.27.3.10 Sample Configuration for Mirrored Database Target

gg.target=fabric_mirrored_database
#TODO: Edit the Fabric workspace name.
gg.eventhandler.onelake.workspace=<workspace-name>
#TODO: Edit the Fabric mirrored database name.
gg.eventhandler.onelake.mirror=<mirror-name>
#TODO: Edit the tenant ID of the application.
gg.eventhandler.onelake.tenantId=<azure-tenant-id>
#TODO: Edit the client ID of the application.
gg.eventhandler.onelake.clientId=<azure-client-id>
#TODO: Edit the client secret for the authentication.
gg.eventhandler.onelake.clientSecret=<azure-client-secret>
#TODO: Edit the classpath to include Hadoop, Parquet, and Azure DataLake SDK dependencies.
gg.classpath=$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*:$THIRD_PARTY_DIR/onelake/*
#TODO: Edit the proxy configuration.
#jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80 -Djava.net.useSystemProxies=true
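
Before starting Replicat, it can help to confirm that the sample placeholders have actually been replaced. The sketch below is a hypothetical pre-flight check, not a GoldenGate utility: it parses properties text, then verifies gg.target and the target-specific names from the properties table. It also flags missing tenantId, clientId, and clientSecret values; treating these as mandatory is an assumption of this sketch, based on the authentication errors described in the troubleshooting section.

```python
# Required properties per target, taken from the properties table in this topic.
REQUIRED_BY_TARGET = {
    "fabric_lakehouse": [
        "gg.eventhandler.onelake.workspace",
        "gg.eventhandler.onelake.lakehouse",
    ],
    "fabric_mirrored_database": [
        "gg.eventhandler.onelake.workspace",
        "gg.eventhandler.onelake.mirror",
    ],
}

# Assumption of this sketch: client-secret authentication is in use.
AUTH_KEYS = [
    "gg.eventhandler.onelake.tenantId",
    "gg.eventhandler.onelake.clientId",
    "gg.eventhandler.onelake.clientSecret",
]


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split at the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_config(props):
    """Return a list of human-readable problems; an empty list means none found."""
    target = props.get("gg.target")
    if target not in REQUIRED_BY_TARGET:
        return [f"gg.target must be one of {sorted(REQUIRED_BY_TARGET)}"]
    problems = []
    for key in REQUIRED_BY_TARGET[target] + AUTH_KEYS:
        value = props.get(key, "")
        # Values such as <workspace-name> are unedited sample placeholders.
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{key} is missing or still a placeholder")
    return problems


sample = """gg.target=fabric_mirrored_database
gg.eventhandler.onelake.workspace=ggworkspace1
gg.eventhandler.onelake.mirror=<mirror-name>
"""
for problem in check_config(parse_properties(sample)):
    print(problem)
```

The check is deliberately conservative: it only looks at property names that appear in this topic and does not contact Azure or Fabric.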

9.2.27.4 OneLake Event Handler Primary Key Update

Primary key UPDATE behavior depends on the file-format configuration.

9.2.27.4.1 Mirrored Database in Microsoft Fabric

When gg.target=fabric_mirrored_database is set, primary key UPDATE operations are split into a DELETE operation followed by an INSERT operation. This behavior cannot be modified.

9.2.27.4.2 Lakehouse in Microsoft Fabric

If gg.target=fabric_lakehouse is set, then by default primary key UPDATE operations result in a Replicat ABEND.

This behavior can be modified by configuring the formatter property gg.handler.onelake.format.pkUpdateHandling.

The property gg.handler.onelake.format.pkUpdateHandling accepts one of the following values:

  • abend: Replicat abends when a primary key UPDATE is processed.
  • update: Replicat processes a primary key UPDATE as a regular UPDATE.
  • delete-insert: Replicat splits a primary key UPDATE into a DELETE operation followed by an INSERT operation.
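
The three modes can be sketched as a small function. This is a conceptual model only, not the formatter's implementation: the operation record shape (a dict with `table`, `before`, and `after` keys) and the function name `handle_pk_update` are hypothetical.

```python
def handle_pk_update(op, mode="abend"):
    """Return the operations emitted for a primary key UPDATE under each mode.

    op is a hypothetical record: {"table": ..., "before": {...}, "after": {...}}.
    """
    if mode == "abend":
        # Default for the Lakehouse target: stop processing.
        raise RuntimeError(f"Primary key UPDATE on {op['table']}: abending")
    if mode == "update":
        # Treat it like any other UPDATE and emit the after image.
        return [{"type": "UPDATE", "table": op["table"], "row": op["after"]}]
    if mode == "delete-insert":
        # Remove the row under its old key, then add it under the new key.
        return [
            {"type": "DELETE", "table": op["table"], "row": op["before"]},
            {"type": "INSERT", "table": op["table"], "row": op["after"]},
        ]
    raise ValueError(f"Unsupported pkUpdateHandling value: {mode}")


pk_update = {
    "table": "QASOURCE.TCUSTMER",
    "before": {"CUST_CODE": "A1", "NAME": "Ann"},
    "after": {"CUST_CODE": "B2", "NAME": "Ann"},
}
for emitted in handle_pk_update(pk_update, mode="delete-insert"):
    print(emitted["type"], emitted["row"]["CUST_CODE"])
```

The delete-insert split needs both the old key (to delete) and the complete new row (to insert), so it depends on full before and after images being available.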

9.2.27.5 OneLake Event Handler Troubleshooting and Diagnostics

  • Unsupported Operations:
    • DDL operations that DROP or RENAME a table are not replicated by the Replicat process.
    • Renaming columns of the table is not supported by the Microsoft application consuming the Fabric Mirroring format file.
    • TRUNCATE operations cannot be replicated.
  • Error:
    com.azure.identity.CredentialUnavailableException: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
    This indicates that the Azure authentication parameters tenantId, clientId, and clientSecret are not configured. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
  • Error:
    java.lang.IllegalArgumentException: Invalid tenant id provided. You can locate your tenant id by following the instructions listed here:
    https://learn.microsoft.com/partner-center/find-ids-and-domain-names
    This indicates that the authentication parameter tenantId is invalid. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
  • Error:
    com.microsoft.aad.msal4j.MsalServiceException: AADSTS700016: Application with identifier '<invalid_clientId>' was not found in the directory '<tenant name>'.
    This indicates that the authentication parameter clientId with value <invalid_clientId> is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
  • Error:
    com.microsoft.aad.msal4j.MsalServiceException: AADSTS7000215: Invalid client secret provided.
    This indicates that the authentication parameter clientSecret is incorrect. See Azure Tenant ID, Client ID, and Client Secret to configure the authentication parameters.
  • Error:
    com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 404,
    "{"error":{"code":"ArtifactNotFound","message":"Request Failed with Artifact 'gglakehouse1_invalid.lakehouse' is not found in workspace 'ggworkspace1'."}}"
    This indicates that the Fabric workspace name or lakehouse name is invalid. If the Fabric workspace or lakehouse does not exist, then create it before starting the Replicat process. Ensure that the configuration parameters gg.eventhandler.onelake.workspace and gg.eventhandler.onelake.lakehouse are set to the Fabric workspace and lakehouse names, respectively.
  • Error:
    ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Microsoft Fabric OneLake Generic Mirror. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.
    One or more TRUNCATE operations were processed by the Replicat process. To proceed, remove the GETTRUNCATES parameter from the parameter file and restart the Replicat process.
  • Error:
    ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.
    You need to clear the backlog in OneLake or purge the last file with the highest sequence number and restart the replicat process.
  • Error:
    The operation record in the trail sequence '<seqno>' at offset '<offset>' for the table '<table>' has missing column values.
    OneLake replication requires full images. You need to regenerate the trail files so that they contain full images for UPDATE operations, and then restart the replication process.
  • Mirrored Database Target:
    • Error:
      ONELAKE-00073 The event handler cannot proceed. The stage file '<file_name>' in the directory '<directory_name>' contains one or more truncate operations. Truncate operations cannot be replicated into Mirrored Database in Microsoft Fabric. Modify the GoldenGate replicat parameter file and remove the line that contains GETTRUNCATES and restart the replicat process.

      One or more TRUNCATE operations were processed by the Replicat process.

      To proceed, remove the GETTRUNCATES parameter from the parameter file and restart the Replicat process.

    • Error:
      ONELAKE-00082 File name sequence number for table QASOURCE.TCUSTMER has reached the maximum limit of 99,999,999,999,999,999,999.

      You need to clear the backlog in OneLake or purge the last file with the highest sequence number and restart the Replicat process.

    • Error:
      The operation record in the trail sequence '<seqno>' at offset '<offset>' for the table '<table>' has missing column values

      Replication to Mirrored Database in Microsoft Fabric requires full images.

      You need to regenerate the trail files so that they contain full images for UPDATE operations, and then restart the replication process.
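
The error signatures listed in this section lend themselves to a simple lookup when scanning a handler log. The sketch below is a hypothetical triage helper, not part of the product: it matches substrings taken from the messages above and returns the corresponding remediation hint summarized from this section.

```python
# (signature substring, hint) pairs derived from the errors documented above.
HINTS = [
    ("CredentialUnavailableException",
     "Configure tenantId, clientId, and clientSecret."),
    ("Invalid tenant id provided",
     "Check gg.eventhandler.onelake.tenantId."),
    ("AADSTS700016",
     "Check gg.eventhandler.onelake.clientId."),
    ("AADSTS7000215",
     "Check gg.eventhandler.onelake.clientSecret."),
    ("ArtifactNotFound",
     "Check the workspace and lakehouse names, and that both exist."),
    ("ONELAKE-00073",
     "Remove GETTRUNCATES from the parameter file and restart Replicat."),
    ("ONELAKE-00082",
     "Clear the backlog or purge the file with the highest sequence number."),
    ("has missing column values",
     "Regenerate the trail files with full images for UPDATE operations."),
]


def diagnose(log_line):
    """Return the first matching hint for a log line, or None if nothing matches."""
    for signature, hint in HINTS:
        if signature in log_line:
            return hint
    return None


line = ("com.microsoft.aad.msal4j.MsalServiceException: "
        "AADSTS7000215: Invalid client secret provided.")
print(diagnose(line))
```

Substring matching keeps the helper independent of exact message wording, which may differ between SDK versions.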