9.2.35 Additional Details
- HDFS Event Handler
  The HDFS Event Handler is used to load files generated by the File Writer Handler into HDFS.
- Metacolumn Keywords
- Metadata Providers
  The Metadata Providers can replicate from a source to a target using a Replicat parameter file.
- Pluggable Formatters
  The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) targets using one of the GG for DAA handlers.
- Stage and Merge Data Warehouse Replication
  Data warehouse targets typically support Massively Parallel Processing (MPP). The cost of a single Data Manipulation Language (DML) operation is comparable to the cost of execution of batch DMLs.
- Template Keywords
- Velocity Dependencies
  Starting with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 21.1.0.0.0, the Velocity jar files have been removed from the packaging.
Parent topic: Target
9.2.35.1 HDFS Event Handler
The HDFS Event Handler is used to load files generated by the File Writer Handler into HDFS.
This topic describes how to use the HDFS Event Handler. See Flat Files.
Parent topic: Additional Details
9.2.35.1.1 Detailing the Functionality
9.2.35.1.1.1 Configuring the Handler
The HDFS Event Handler can upload data files to HDFS. These additional configuration steps are required:
- The HDFS Event Handler dependencies and considerations are the same as the HDFS Handler; see HDFS Additional Considerations.
- Ensure that gg.classpath includes the HDFS client libraries.
- Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:
  gg.classpath=/{HDFSinstallDirectory}/etc/hadoop
- If Kerberos authentication is enabled on the HDFS cluster, you have to configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime (a consolidated sketch follows this list):
  gg.eventHandler.name.kerberosPrincipal=principal
  gg.eventHandler.name.kerberosKeytabFile=pathToTheKeytabFile
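Putting these steps together, a minimal sketch of the classpath and Kerberos settings might look like the following. The event handler name hdfs, the Hadoop install path, and the principal and keytab values are placeholders; adjust them to your environment.
gg.classpath=/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/hdfs/*
gg.eventhandler.hdfs.type=hdfs
gg.eventhandler.hdfs.kerberosPrincipal=gguser@EXAMPLE.COM
gg.eventhandler.hdfs.kerberosKeytabFile=/etc/security/keytabs/gguser.keytab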
Parent topic: Detailing the Functionality
9.2.35.1.1.2 Configuring the HDFS Event Handler
You configure the HDFS Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the HDFS Event Handler, you must first configure the
handler type by specifying gg.eventhandler.name.type=hdfs
and the other HDFS Event properties as follows:
Table 9-42 HDFS Event Handler Configuration Properties
Properties | Required/Optional | Legal Values | Default | Explanation
---|---|---|---|---
gg.eventhandler.name.type | Required | hdfs | None | Selects the HDFS Event Handler for use.
gg.eventhandler.name.pathMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the path in HDFS to write data files. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. See Template Keywords.
gg.eventhandler.name.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the HDFS file name at runtime. | None | Use keywords interlaced with constants to dynamically generate unique file names at runtime. If not set, the upstream file name is used. See Template Keywords.
gg.eventhandler.name.finalizeAction | Optional | | | Indicates what the File Writer Handler should do at the finalize action.
gg.eventhandler.name.kerberosPrincipal | Optional | The Kerberos principal name. | None | Set to the Kerberos principal when HDFS Kerberos authentication is enabled.
gg.eventhandler.name.kerberosKeytabFile | Optional | The path to the Kerberos keytab file. | None | Set to the path to the Kerberos keytab file when HDFS Kerberos authentication is enabled.
gg.eventhandler.name.eventHandler | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | A unique string identifier cross referencing an event handler. The event handler is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.
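As a sketch of how the HDFS Event Handler is typically selected downstream of the File Writer Handler, the following assumes a handler named filewriter and an event handler named hdfs; both names are placeholders chosen for this example.
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.eventHandler=hdfs
gg.eventhandler.hdfs.type=hdfs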
Parent topic: Detailing the Functionality
9.2.35.2 Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
${keyword[fieldName].argument}
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
Keywords are separated by a comma. Following is an example configuration of metacolumns:
gg.handler.filewriter.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
An argument may be required for a few metacolumn keywords. For example, it is required where specific token values are resolved or specific environmental variable values are resolved.
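For example, the keywords that take arguments resolve a specific token, environment variable, or static value. In the following sketch, the token name MYTOKEN, the variable name MYVAR, and the field name source_system are placeholders:
gg.handler.filewriter.format.metaColumnsTemplate=${token.MYTOKEN},${sys.MYVAR},${static[source_system].abc}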
- ${alltokens}
  All of the tokens for an operation delivered as a map where the token keys are the keys in the map and the token values are the map values.
- ${token}
  The value of a specific Oracle GoldenGate token. The token key should follow token using the period (.) operator. For example: ${token.MYTOKEN}
- ${sys}
  A system environment variable. The variable name should follow sys using the period (.) operator. For example: ${sys.MYVAR}
- ${env}
  An Oracle GoldenGate environment variable. The variable name should follow env using the period (.) operator. For example: ${env.someVariable}
- ${javaprop}
  A Java JVM variable. The variable name should follow javaprop using the period (.) operator. For example: ${javaprop.MYVAR}
- ${optype}
  The operation type. This is generally I for inserts, U for updates, D for deletes, and T for truncates.
- ${position}
  The record position. This is the location of the record in the source trail file. It is a 20 character string. The first 10 characters are the trail file sequence number. The last 10 characters are the offset or RBA of the record in the trail file.
- ${timestamp}
  Record timestamp.
- ${catalog}
  Catalog name.
- ${schema}
  Schema name.
- ${table}
  Table name.
- ${objectname}
  The fully qualified table name.
- ${csn}
  Source Commit Sequence Number.
- ${xid}
  Source transaction ID.
- ${currenttimestamp}
  Current timestamp.
- ${currenttimestampiso8601}
  Current timestamp in ISO 8601 format.
- ${opseqno}
  Record sequence number within the transaction.
- ${timestampmicro}
  Record timestamp in microseconds after epoch.
- ${currenttimestampmicro}
  Current timestamp in microseconds after epoch.
- ${txind}
  The transactional indicator from the source trail file. The values of a transaction are B for the first operation, M for the middle operations, E for the last operation, or W for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field.
- ${primarykeycolumns}
  Use to inject a field with a list of the primary key column names.
- ${primarykeys}
  Use to inject a field with a list of the primary key column values with an underscore (_) delimiter between primary key values. Usage: ${primarykeys[fieldName]} Example: ${primarykeys[JMSXGroupID]}
- ${static}
  Use to inject a field with a static value into the output. The desired value should be the argument. If the desired value is abc, then the syntax is ${static.abc} or ${static[FieldName].abc}.
- ${seqno}
  Used to inject a field containing the sequence number of the source trail file for the given operation.
- ${rba}
  Used to inject a field containing the RBA (offset) of the operation in the source trail file for the given operation.
- ${metadatachanged}
  A boolean field which gets set to true on the first operation following a metadata change for the source table definition.
Parent topic: Additional Details
9.2.35.3 Metadata Providers
The Metadata Providers can replicate from a source to a target using a Replicat parameter file.
This chapter describes how to use the Metadata Providers.
- About the Metadata Providers
- Avro Metadata Provider
  The Avro Metadata Provider is used to retrieve the table metadata from Avro Schema files. For every table mapped in Replicat using COLMAP, the metadata is retrieved from the Avro schema. The retrieved metadata is then used by Replicat for column mapping.
- Cassandra Metadata Provider
  The Cassandra metadata provider is used to retrieve the table metadata from the Cassandra instance. The metadata is retrieved from Cassandra for every target table that is mapped in the Replicat properties file using the COLMAP parameter. The keyspace and tables should already be created on the target for the Cassandra MDP to fetch the metadata.
- Java Database Connectivity Metadata Provider
  The Java Database Connectivity (JDBC) Metadata Provider is used to retrieve the table metadata from any target database that supports a JDBC connection and has a database schema. It is the preferred metadata provider for any target RDBMS database, although various other non-RDBMS targets also provide a JDBC driver.
- Hive Metadata Provider
  The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata is retrieved from Hive for every target table that is mapped in the Replicat properties file using the COLMAP parameter. The retrieved target metadata is used by Replicat for the column mapping functionality.
- Google BigQuery Metadata Provider
  The Google BigQuery Metadata Provider uses the Google Query Job to retrieve the metadata schema information from the Google BigQuery table. The table should already be created on the target for the metadata provider to fetch the metadata.
Parent topic: Additional Details
9.2.35.3.1 About the Metadata Providers
Metadata Providers work only if handlers are configured to run with a Replicat process.
The Replicat process maps source tables to target tables and source columns to target columns using syntax in the Replicat configuration file. The source metadata definitions are included in the Oracle GoldenGate trail file (or by source definitions files in Oracle GoldenGate releases 12.2 and later). When the replication target is a database, the Replicat process obtains the target metadata definitions from the target database. However, this is a shortcoming when pushing data to Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications or during Java delivery in general. Typically, GG for DAA applications provide no target metadata, so Replicat mapping is not possible. The metadata providers exist to address this deficiency. You can use a metadata provider to define target metadata using either Avro or Hive, which enables Replicat mapping of source table to target table and source column to target column.
The use of the metadata provider is optional and is enabled if the gg.mdp.type
property is specified in the Java Adapter Properties file. If the metadata included in the source Oracle GoldenGate trail file is acceptable for output, then do not use the metadata provider. Use a metadata provider in the following cases:
- You need to map source table names into target table names that do not match.
- You need to map source column names into target column names that do not match.
- You need to include certain columns from the source trail file and omit other columns.
A limitation of Replicat mapping is that the mapping defined in the Replicat configuration file is static. Oracle GoldenGate provides functionality for DDL propagation when using an Oracle database as the source. The proper handling of schema evolution can be problematic when the Metadata Provider and Replicat mapping are used. Consider your use cases for schema evolution and plan for how you want to update the Metadata Provider and the Replicat mapping syntax for required changes.
For every table mapped in Replicat using COLMAP
, the metadata is retrieved from a configured metadata provider, and the retrieved metadata is then used by Replicat for column mapping.
Only one metadata provider can be configured for a given Replicat; choose the metadata provider that matches your target for your metadata provider implementation.
Scenarios - When to use a metadata provider
The following scenarios do not require a metadata provider to be configured:

- A mapping in which the source schema named GG is mapped to the target schema named GGADP.
- A mapping of schema and table name in which the schema GG.TCUSTMER is mapped to the table name GGADP.TCUSTMER_NEW.

MAP GG.*, TARGET GGADP.*;
(OR)
MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW;
The following scenario requires a metadata provider to be configured:

- A mapping in which the source column name does not match the target column name. For example, a source column of CUST_CODE mapped to a target column of CUST_CODE_NEW.

MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW, COLMAP(USEDEFAULTS, CUST_CODE_NEW=CUST_CODE, CITY2=CITY);
Parent topic: Metadata Providers
9.2.35.3.2 Avro Metadata Provider
The Avro Metadata Provider is used to retrieve the table metadata from Avro Schema
files. For every table mapped in Replicat using COLMAP
, the metadata is
retrieved from Avro Schema. Retrieved metadata is then used by Replicat for column
mapping.
- Detailed Functionality
- Runtime Prerequisites
- Classpath Configuration
- Avro Metadata Provider Configuration
- Review a Sample Configuration
- Metadata Change Events
- Limitations
- Troubleshooting
Parent topic: Metadata Providers
9.2.35.3.2.1 Detailed Functionality
The Avro Metadata Provider uses Avro schema definition files to retrieve metadata.
Avro schemas are defined using JSON. For each table mapped in the
process_name.prm
file, you must create a corresponding Avro
schema definition file.
Avro Metadata Provider Schema Definition Syntax
{"namespace": "[$catalogname.]$schemaname", "type": "record", "name": "$tablename", "fields": [ {"name": "$col1", "type": "$datatype"}, {"name": "$col2 ", "type": "$datatype ", "primary_key":true}, {"name": "$col3", "type": "$datatype ", "primary_key":true}, {"name": "$col4", "type": ["$datatype","null"]} ] } namespace - name of catalog/schema being mapped name - name of the table being mapped fields.name - array of column names fields.type - datatype of the column fields.primary_key - indicates the column is part of primary key. Representing nullable and not nullable columns: "type":"$datatype" - indicates the column is not nullable, where "$datatype" is the actual datatype. "type": ["$datatype","null"] - indicates the column is nullable, where "$datatype" is the actual datatype
The names of schema files that are accessed by the Avro Metadata Provider must be in the following format:
[$catalogname.]$schemaname.$tablename.mdp.avsc

$catalogname - name of the catalog, if it exists
$schemaname - name of the schema
$tablename - name of the table
.mdp.avsc - constant suffix, which should always be appended
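For example, assuming there is no catalog, a target table mapped as GG_AVRO.TCUSTMER_AVRO would be resolved against a schema file named:
GG_AVRO.TCUSTMER_AVRO.mdp.avsc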
Supported Avro Primitive Data Types
- boolean
- bytes
- double
- float
- int
- long
- string
See https://avro.apache.org/docs/1.7.5/spec.html#schema_primitive.
Supported Avro Logical Data Types
- decimal
- timestamp
{"name":"DECIMALFIELD","type": {"type":"bytes","logicalType":"decimal","precision":15,"scale":5}}
{"name":"TIMESTAMPFIELD","type": {"type":"long","logicalType":"timestamp-micros"}}
Parent topic: Avro Metadata Provider
9.2.35.3.2.2 Runtime Prerequisites
Before you start the Replicat process, create Avro schema definitions for all tables mapped in Replicat's parameter file.
Parent topic: Avro Metadata Provider
9.2.35.3.2.3 Classpath Configuration
The Avro Metadata Provider requires no additional classpath setting.
Parent topic: Avro Metadata Provider
9.2.35.3.2.4 Avro Metadata Provider Configuration
Property | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Required |
|
|
Selects the Avro Metadata Provider |
|
Required |
Example: |
|
The path to the Avro schema files directory |
|
Optional |
Valid character set |
|
Specifies the character set of the column with character data type. Used to convert the source data from the trail file to the correct target character set. |
|
Optional |
Valid character set |
|
Specifies the character set of the column with character data type. Used to convert the source data from the trail file to the correct target character set. Example: Used to indicate character set of columns, such as |
Parent topic: Avro Metadata Provider
9.2.35.3.2.5 Review a Sample Configuration
This is an example for configuring the Avro Metadata Provider. Consider a source that includes the following table:
TABLE GG.TCUSTMER { CUST_CODE VARCHAR(4) PRIMARY KEY, NAME VARCHAR(100), CITY VARCHAR(200), STATE VARCHAR(200) }
This example maps the column CUST_CODE (GG.TCUSTMER) in the source to CUST_CODE2 (GG_AVRO.TCUSTMER_AVRO) on the target, and the column CITY (GG.TCUSTMER) in the source to CITY2 (GG_AVRO.TCUSTMER_AVRO) on the target. Therefore, the mapping in the process_name.prm file is:
MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY);
In this example the mapping definition is as follows:
- Source schema GG is mapped to target schema GG_AVRO.
- Source column CUST_CODE is mapped to target column CUST_CODE2.
- Source column CITY is mapped to target column CITY2.
- USEDEFAULTS specifies that the rest of the column names are the same on both source and target (NAME and STATE columns).
This example uses the following Avro schema definition file:
File path: /home/ggadp/avromdp/GG_AVRO.TCUSTMER_AVRO.mdp.avsc
{"namespace": "GG_AVRO", "type": "record", "name": "TCUSTMER_AVRO", "fields": [ {"name": "NAME", "type": "string"}, {"name": "CUST_CODE2", "type": "string", "primary_key":true}, {"name": "CITY2", "type": "string"}, {"name": "STATE", "type": ["string","null"]} ] }
The configuration in the Java Adapter properties file includes the following:
gg.mdp.type = avro gg.mdp.schemaFilesPath = /home/ggadp/avromdp
The following sample output uses a delimited text formatter with a semi-colon as the delimiter:
I;GG_AVRO.TCUSTMER_AVRO;2013-06-02 22:14:36.000000;NAME;BG SOFTWARE CO;CUST_CODE2;WILL;CITY2;SEATTLE;STATE;WA
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) includes a sample Replicat configuration file, a sample Java Adapter properties file, and sample Avro schemas at the following location:
GoldenGate_install_directory
/AdapterExamples/big-data/metadata_provider/avro
Parent topic: Avro Metadata Provider
9.2.35.3.2.6 Metadata Change Events
If the DDL changes in the source database tables, you may need to modify the Avro schema definitions and the mappings in the Replicat configuration file. You may also want to stop or suspend the Replicat process in the case of a metadata change event. You can stop the Replicat process by adding the following line to the Replicat configuration file (process_name.prm):
DDL INCLUDE ALL, EVENTACTIONS (ABORT)
Alternatively, you can suspend the Replicat process by adding the following line to the Replicat configuration file:
DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)
Parent topic: Avro Metadata Provider
9.2.35.3.2.7 Limitations
The Avro bytes data type cannot be used as a primary key.
The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later support DDL propagation and source schema evolution for Oracle databases as a replication source. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.
Parent topic: Avro Metadata Provider
9.2.35.3.2.8 Troubleshooting
This topic contains the information about how to troubleshoot the following issues:
- Invalid Schema Files Location
- Invalid Schema File Name
- Invalid Namespace in Schema File
- Invalid Table Name in Schema File
Parent topic: Avro Metadata Provider
9.2.35.3.2.8.1 Invalid Schema Files Location
The Avro schema files directory specified in the gg.mdp.schemaFilesPath
configuration property must be a valid directory. If the path is not valid, you encounter the following exception:
oracle.goldengate.util.ConfigException: Error initializing Avro metadata provider Specified schema location does not exist. {/path/to/schema/files/dir}
Parent topic: Troubleshooting
9.2.35.3.2.8.2 Invalid Schema File Name
For every table that is mapped in the process_name.prm
file, you must create a corresponding Avro schema file in the directory that is specified in gg.mdp.schemaFilesPath
.
For example, consider the following scenario:
Mapping:
MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2=cust_code, CITY2 = CITY);
Property:
gg.mdp.schemaFilesPath=/home/usr/avro/
In this scenario, you must create a file called GG_AVRO.TCUSTMER_AVRO.mdp.avsc
in the /home/usr/avro/
directory.
If you do not create the /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc
file, you encounter the following exception:
java.io.FileNotFoundException: /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc
Parent topic: Troubleshooting
9.2.35.3.2.8.3 Invalid Namespace in Schema File
The target schema name specified in the Replicat mapping must be the same as the namespace in the Avro schema definition file.
For example, consider the following scenario:
Mapping:
MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);
Avro Schema Definition:
{ "namespace": "GG_AVRO", .. }
In this scenario, Replicat abends with the following exception:
Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO Mapped [catalogname.]schemaname (GG_AVRO) does not match with the schema namespace {schema namespace}
Parent topic: Troubleshooting
9.2.35.3.2.8.4 Invalid Table Name in Schema File
The target table name that is specified in the Replicat mapping must be the same as the name in the Avro schema definition file.
For example, consider the following scenario:
Mapping:
MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);
Avro Schema Definition:
{ "namespace": "GG_AVRO", "name": "TCUSTMER_AVRO", .. }
In this scenario, if the target table name specified in the Replicat mapping does not match the Avro schema name, then Replicat abends with the following exception:
Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO Mapped table name (TCUSTMER_AVRO) does not match with the schema table name {table name}
Parent topic: Troubleshooting
9.2.35.3.3 Cassandra Metadata Provider
The Cassandra metadata provider is used to retrieve the table metadata
from the Cassandra instance. The metadata is retrieved from Cassandra for every target table
that is mapped in the replicat properties file using the COLMAP
parameter.
The keyspace and tables should already be created on the target for Cassandra MDP to fetch
the metadata.
The metadata retrieved from the Cassandra target includes primary and partition key definitions as well. All columns retrieved from Cassandra target are by default marked as nullable by the metadata provider.
- Supported Cassandra Data Types
- Unsupported Cassandra Data Types
- Configuration
- Sample Configuration
- Limitations
Parent topic: Metadata Providers
9.2.35.3.3.1 Supported Cassandra Data Types
- BLOB
- BOOLEAN
- DECIMAL
- DOUBLE
- FLOAT
- INET
- ASCII
- TEXT
- VARCHAR
- TINYINT
- SMALLINT
- INT
- BIGINT
- VARINT
- DURATION
- TIME
- TIMESTAMP
- DATE
- TIMEUUID
Parent topic: Cassandra Metadata Provider
9.2.35.3.3.2 Unsupported Cassandra Data Types
- COUNTER
- MAP
- SET
- LIST
- UDT (user defined type)
- TUPLE
- CUSTOM_TYPE
- TIMESTAMP
- DATE
- TIMEUUID
Parent topic: Cassandra Metadata Provider
9.2.35.3.3.3 Configuration
The Cassandra Driver location must be included in the class path of the handler using the
gg.classpath
property.
Table 9-43 Cassandra Metadata Provider Configuration
Properties | Required/Optional | Legal values | Default | Explanation
---|---|---|---|---
gg.mdp.type | Required | cassandra | None | Selects the Cassandra Metadata Provider.
gg.mdp.contactPoints | Required | A comma separated list of host names that the Cassandra Handler will connect to. | None | A comma-separated list of the Cassandra host machines for the driver to establish an initial connection to the Cassandra cluster. This configuration property does not need to include all the machines enlisted in the Cassandra cluster. By connecting to a single machine, the driver can learn about other machines in the Cassandra cluster and establish connections to those machines as required.
gg.mdp.port | Optional | Integer | 9042 | Set to configure the port number that the Cassandra MDP attempts to connect to Cassandra server instances. You can override the default in the Cassandra YAML files.
gg.mdp.datacenter | Optional | The datacenter name | None | Set the datacenter name. If the datacenter name does not match the configured name on the server, then it will not connect to the database.
gg.mdp.username | Optional | A legal username string. | None | A user name for the connection to Cassandra. Required if Cassandra is configured to require credentials.
gg.mdp.password | Optional | A legal password string. | None | A password for the connection to Cassandra. Required if Cassandra is configured to require credentials.
gg.handler.name.DatastaxJSSEConfigPath | Optional | String | None | Set the path and file name of a properties file containing the Cassandra driver configuration. Use when the Cassandra driver configuration needs to be configured for non-default values and potentially SSL connectivity.
Parent topic: Cassandra Metadata Provider
9.2.35.3.3.4 Sample Configuration
Sample Properties File Content
gg.mdp.type=cassandra
gg.mdp.datacenter=datacenter1
gg.mdp.contactPoints=localhost

REPLICAT cassandra
MAP schema.tableName, TARGET keyspace.tableName;
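If the Cassandra cluster requires credentials or listens on a non-default port, a slightly fuller sketch might look like the following; the host names, user name, and password are placeholders, and the property names are those listed in the configuration table above.
gg.mdp.type=cassandra
gg.mdp.contactPoints=cassandra-node1,cassandra-node2
gg.mdp.port=9042
gg.mdp.datacenter=datacenter1
gg.mdp.username=ggadmin
gg.mdp.password=changeit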
Parent topic: Cassandra Metadata Provider
9.2.35.3.3.5 Limitations
The Cassandra handler table auto-creation module does not work with the
Cassandra Metadata provider as it expects the schema to be already created in the
Cassandra instance. Complex Cassandra data types, such as LIST
,
MAP
, and STRUCT
are not yet supported.
Parent topic: Cassandra Metadata Provider
9.2.35.3.4 Java Database Connectivity Metadata Provider
The Java Database Connectivity (JDBC) Metadata Provider is used to retrieve the table metadata from any target database that supports a JDBC connection and has a database schema. It is the preferred metadata provider for any target RDBMS database, although various other non-RDBMS targets also provide a JDBC driver.
- JDBC Detailed Functionality
- Java Classpath
- JDBC Metadata Provider Configuration
- Review a Sample Configuration
Parent topic: Metadata Providers
9.2.35.3.4.1 JDBC Detailed Functionality
The JDBC Metadata Provider uses the JDBC driver that is provided with your target database. The JDBC driver retrieves the metadata for every target table that is mapped in the Replicat properties file. Replicat processes use the retrieved target metadata to map columns.
You can enable this feature for JDBC Handler by configuring the REPERROR
property in your Replicat parameter file. In addition, you need to define the error codes specific to your RDBMS JDBC target in the JDBC Handler properties file as follows:
Table 9-44 JDBC REPERROR Codes

Property | Value | Required
---|---|---
gg.error.duplicateErrorCodes | Comma-separated integer values of error codes that indicate duplicate errors | No
gg.error.notFoundErrorCodes | Comma-separated integer values of error codes that indicate Not Found errors | No
gg.error.deadlockErrorCodes | Comma-separated integer values of error codes that indicate deadlock errors | No
For example:
#ErrorCode
gg.error.duplicateErrorCodes=1062,1088,1092,1291,1330,1331,1332,1333
gg.error.notFoundErrorCodes=0
gg.error.deadlockErrorCodes=1213
To understand how the various JDBC types are mapped to database-specific SQL types, see https://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html#table1.
Parent topic: Java Database Connectivity Metadata Provider
9.2.35.3.4.2 Java Classpath
The JDBC Java Driver location must be included in the class path of the handler using the gg.classpath
property.
For example, the configuration for a MySQL database might be:
gg.classpath= /path/to/jdbc/driver/jar/mysql-connector-java-5.1.39-bin.jar
Parent topic: Java Database Connectivity Metadata Provider
9.2.35.3.4.3 JDBC Metadata Provider Configuration
The following are the configurable values for the JDBC Metadata Provider. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
Table 9-45 JDBC Metadata Provider Properties
Properties | Required/Optional | Legal Values | Default | Explanation
---|---|---|---|---
gg.mdp.type | Required | jdbc | None | Selects the JDBC Metadata Provider.
gg.mdp.ConnectionUrl | Required | A valid JDBC connection URL | None | The target database JDBC URL.
gg.mdp.DriverClassName | Required | Java class name of the JDBC driver | None | The fully qualified Java class name of the JDBC driver.
gg.mdp.UserName | Optional | A legal username string. | None | The user name for the JDBC connection. Alternatively, you can provide the user name in the connection URL.
gg.mdp.Password | Optional | A legal password string | None | The password for the JDBC connection. Alternatively, you can provide the password in the connection URL.
Parent topic: Java Database Connectivity Metadata Provider
9.2.35.3.4.4 Review a Sample Configuration
Oracle Thin Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:thin:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password

Netezza Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:netezza://hostname:port/databaseName
gg.mdp.DriverClassName=org.netezza.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Oracle OCI Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:oci:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password

Teradata Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:teradata://10.111.11.111/USER=username,PASSWORD=password
gg.mdp.DriverClassName=com.teradata.jdbc.TeraDriver
gg.mdp.UserName=username
gg.mdp.Password=password

MySQL Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:mysql://localhost/databaseName?user=username&password=password
gg.mdp.DriverClassName=com.mysql.jdbc.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Redshift Driver Configuration
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:redshift://hostname:port/databaseName
gg.mdp.DriverClassName=com.amazon.redshift.jdbc42.Driver
gg.mdp.UserName=username
gg.mdp.Password=password
Parent topic: Java Database Connectivity Metadata Provider
9.2.35.3.5 Hive Metadata Provider
The Hive Metadata Provider is used to retrieve the table metadata from a Hive
metastore. The metadata is retrieved from Hive for every target table that is mapped in the
Replicat properties file using the COLMAP
parameter. The retrieved target
metadata is used by Replicat for the column mapping functionality.
- Detailed Functionality
- Configuring Hive with a Remote Metastore Database
- Classpath Configuration
- Hive Metadata Provider Configuration Properties
- Review a Sample Configuration
- Security
- Metadata Change Event
- Limitations
- Additional Considerations
- Troubleshooting
Parent topic: Metadata Providers
9.2.35.3.5.1 Detailed Functionality
The Hive Metadata Provider uses both Hive JDBC and HCatalog interfaces to retrieve metadata from the Hive metastore. For each table mapped in the process_name.prm file, a corresponding table must be created in Hive.
The default Hive configuration starts an embedded, local metastore Derby database. Because Apache Derby is designed to be an embedded database, it allows only a single connection. This limitation of the Derby database means that it cannot function when working with the Hive Metadata Provider. To work around this limitation, you must configure Hive with a remote metastore database. For more information about how to configure Hive with a remote metastore database, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration.
Hive does not support Primary Key semantics, so the metadata retrieved from Hive metastore does not include a primary key definition. When you use the Hive Metadata Provider, use the Replicat KEYCOLS
parameter to define primary keys.
KEYCOLS
The KEYCOLS parameter must be used to define primary keys in the target schema. The Oracle GoldenGate HBase Handler requires primary keys. Therefore, you must set primary keys in the target schema when you use Replicat mapping with HBase as the target.
The output of the Avro formatters includes an Array field to hold the primary column names. If you use Replicat mapping with the Avro formatters, consider using KEYCOLS
to identify the primary key columns.
For example configurations of KEYCOLS
, see Review a Sample Configuration.
Supported Hive Data types
- BIGINT
- BINARY
- BOOLEAN
- CHAR
- DATE
- DECIMAL
- DOUBLE
- FLOAT
- INT
- SMALLINT
- STRING
- TIMESTAMP
- TINYINT
- VARCHAR
See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
Parent topic: Hive Metadata Provider
9.2.35.3.5.2 Configuring Hive with a Remote Metastore Database
A list of supported databases that you can use to configure a remote Hive metastore can be found at https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-SupportedBackendDatabasesforMetastore.
The following example shows a MySQL database configured as the Hive metastore using properties in the ${HIVE_HOME}/conf/hive-site.xml Hive configuration file.
Note:
The ConnectionURL
and driver class used in this example are specific to MySQL database. If you use a database other than MySQL, then change the values to fit your configuration.
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://MYSQL_DB_IP:MYSQL_DB_PORT/DB_NAME?createDatabaseIfNotExist=false</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>MYSQL_CONNECTION_USERNAME</value> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>MYSQL_CONNECTION_PASSWORD</value> </property>
To see a list of parameters to configure in the hive-site.xml
file for a remote metastore, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase.
Note:
Follow these steps to add the MySQL JDBC connector JAR in the Hive classpath:
- Copy the MySQL JDBC connector JAR into the HIVE_HOME/lib/ directory. DB_NAME should be replaced by a valid database name created in MySQL.
- Start the Hive Server:
  HIVE_HOME/bin/hiveserver2
- Start the Hive Remote Metastore Server:
  HIVE_HOME/bin/hive --service metastore
Parent topic: Hive Metadata Provider
9.2.35.3.5.3 Classpath Configuration
For the Hive Metadata Provider to connect to Hive, you must configure the hive-site.xml file and the Hive and HDFS client jars in the gg.classpath variable. The client JARs must match the version of Hive to which the Hive Metadata Provider is connecting.
For example, if the hive-site.xml file is created in the /home/user/oggadp/dirprm directory, then the gg.classpath entry is gg.classpath=/home/user/oggadp/dirprm/
-
Create a
hive-site.xml
file that has the following properties:<configuration> <!-- Mandatory Property --> <property> <name>hive.metastore.uris</name> <value>thrift://HIVE_SERVER_HOST_IP:9083</value> <property> <!-- Optional Property. Default value is 5 --> <property> <name>hive.metastore.connect.retries</name> <value>3</value> </property> <!-- Optional Property. Default value is 1 --> <property> <name>hive.metastore.client.connect.retry.delay</name> <value>10</value> </property> <!-- Optional Property. Default value is 600 seconds --> <property> <name>hive.metastore.client.socket.timeout</name> <value>50</value> </property> </configuration>
- By default, the following directories contain the Hive and HDFS client jars:
  HIVE_HOME/hcatalog/share/hcatalog/*
  HIVE_HOME/lib/*
  HIVE_HOME/hcatalog/share/webhcat/java-client/*
  HADOOP_HOME/share/hadoop/common/*
  HADOOP_HOME/share/hadoop/common/lib/*
  HADOOP_HOME/share/hadoop/mapreduce/*
  Configure the gg.classpath exactly as shown in step 1. The path to the hive-site.xml file must be the path with no wildcard appended. If you include the * wildcard in the path to the hive-site.xml file, it will not be located. The path to the dependency JARs must include the * wildcard character to include all of the JAR files in that directory in the associated classpath. Do not use *.jar. A consolidated gg.classpath sketch follows this list.
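As a consolidated sketch, assuming hive-site.xml resides in /home/user/oggadp/dirprm and that Hive and Hadoop are installed under /opt/hive and /opt/hadoop (all paths are placeholders), the resulting entry might look like this:
gg.classpath=/home/user/oggadp/dirprm:/opt/hive/hcatalog/share/hcatalog/*:/opt/hive/lib/*:/opt/hive/hcatalog/share/webhcat/java-client/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/mapreduce/*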
Parent topic: Hive Metadata Provider
9.2.35.3.5.4 Hive Metadata Provider Configuration Properties
Property | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Required |
|
|
Selects the Hive Metadata Provider |
|
Required |
Format without Kerberos Authentication: j Format with Kerberos Authentication:
|
|
The JDBC connection URL of the Hive server |
|
Required |
|
|
The fully qualified Hive JDBC driver class name |
|
Optional |
Valid username |
|
The user name for connecting to the Hive database. The |
|
Optional |
Valid Password |
|
The password for connecting to the Hive database |
|
Optional |
Valid character set |
|
The character set of the column with the character data type. Used to convert the source data from the trail file to the correct target character set. |
|
Optional |
Valid character set |
|
The character set of the column with the national character data type. Used to convert the source data from the trail file to the correct target character set. For example, this property may indicate the character set of columns, such as |
|
Optional |
Kerberos |
none |
Allows you to designate Kerberos authentication to Hive. |
|
Optional (Required if |
Relative or absolute path to a Kerberos keytab file. |
|
The |
|
Optional (Required if |
A legal Kerberos principal name( |
|
The Kerberos principal name for Kerberos authentication. |
Parent topic: Hive Metadata Provider
9.2.35.3.5.5 Review a Sample Configuration
This is an example of configuring the Hive Metadata Provider. Consider a source with the following table:
TABLE GG.TCUSTMER { CUST_CODE VARCHAR(4) PRIMARY KEY, NAME VARCHAR(100), CITY VARCHAR(200), STATE VARCHAR(200)}
The example maps the column CUST_CODE
(GG.TCUSTMER)
in the source to CUST_CODE2
(GG_HIVE.TCUSTMER_HIVE)
on the target and column CITY
(GG.TCUSTMER
) in the source to CITY2
(GG_HIVE.TCUSTMER_HIVE)
on the target.
Mapping configuration in the process_name.prm file includes the following configuration:
MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2);
In this example:
- The source schema GG is mapped to the target schema GG_HIVE.
- The source column CUST_CODE is mapped to the target column CUST_CODE2.
- The source column CITY is mapped to the target column CITY2.
- USEDEFAULTS specifies that the rest of the column names are the same on both source and target (NAME and STATE columns).
- KEYCOLS is used to specify that CUST_CODE2 should be treated as the primary key.
Because primary keys cannot be specified in the Hive DDL, the KEYCOLS
parameter is used to specify the primary keys.
Note:
You can choose any schema name and are not restricted to the gg_hive schema name. The Hive schema can be pre-existing or newly created. To use a different schema name, update the connection URL (gg.mdp.connectionUrl) in the Java Adapter properties file and the mapping configuration in the Replicat .prm file.
You can create the schema and tables for this example in Hive by using the following commands. To start the Hive CLI, use the following command:
HIVE_HOME
/bin/hive
To create the GG_HIVE
schema, in Hive, use the following command:
hive> create schema gg_hive; OK Time taken: 0.02 seconds
To create the TCUSTMER_HIVE
table in the GG_HIVE
database, use the following command:
hive> CREATE EXTERNAL TABLE `TCUSTMER_HIVE`( > "CUST_CODE2" VARCHAR(4), > "NAME" VARCHAR(30), > "CITY2" VARCHAR(20), > "STATE" STRING); OK Time taken: 0.056 seconds
Configure the .properties
file in a way that resembles the following:
gg.mdp.type=hive
gg.mdp.connectionUrl=jdbc:hive2://HIVE_SERVER_IP:10000/gg_hive
gg.mdp.driverClassName=org.apache.hive.jdbc.HiveDriver
The following sample output uses the delimited text formatter, with a comma as the delimiter:
I;GG_HIVE.TCUSTMER_HIVE;2015-10-07T04:50:47.519000;cust_code2;WILL;name;BG SOFTWARE CO;city2;SEATTLE;state;WA
A sample Replicat configuration file, Java Adapter properties file, and Hive create table SQL script are included with the installation at the following location:
GoldenGate_install_directory
/AdapterExamples/big-data/metadata_provider/hive
Parent topic: Hive Metadata Provider
9.2.35.3.5.6 Security
You can secure the Hive server using Kerberos authentication. For information about how to secure the Hive server, see the Hive documentation for the specific Hive release. The Hive Metadata Provider can connect to a Kerberos secured Hive server.
Make sure that the paths to the HDFS core-site.xml
file and the hive-site.xml
file are in the handler's classpath.
Enable the following properties in the core-site.xml
file:
<property> <name>hadoop.security.authentication</name> <value>kerberos</value> </property> <property> <name>hadoop.security.authorization</name> <value>true</value> </property>
Enable the following properties in the hive-site.xml
file:
<property> <name>hive.metastore.sasl.enabled</name> <value>true</value> </property> <property> <name>hive.metastore.kerberos.keytab.file</name> <value>/path/to/keytab</value> <!-- Change this value --> </property> <property> <name>hive.metastore.kerberos.principal</name> <value>Kerberos Principal</value> <!-- Change this value --> </property> <property> <name>hive.server2.authentication</name> <value>KERBEROS</value> </property> <property> <name>hive.server2.authentication.kerberos.principal</name> <value>Kerberos Principal</value> <!-- Change this value --> </property> <property> <name>hive.server2.authentication.kerberos.keytab</name> <value>/path/to/keytab</value> <!-- Change this value --> </property>
Parent topic: Hive Metadata Provider
9.2.35.3.5.7 Metadata Change Event
Tables in the Hive metastore should be updated, altered, or created manually if the source database tables change. In the case of a metadata change event, you may wish to terminate or suspend the Replicat process. You can terminate the Replicat process by adding the following to the Replicat configuration file (process_name.prm):
DDL INCLUDE ALL, EVENTACTIONS (ABORT)
You can suspend the Replicat process by adding the following to the Replicat configuration file:
DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)
Parent topic: Hive Metadata Provider
9.2.35.3.5.8 Limitations
Columns with binary data type cannot be used as primary keys.
The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later versions support DDL propagation and source schema evolution for Oracle databases as replication sources. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.
Parent topic: Hive Metadata Provider
9.2.35.3.5.9 Additional Considerations
The most common problems encountered are Java classpath issues. The Hive Metadata Provider requires certain Hive and HDFS client libraries to be resolved in its classpath.
The required client JAR directories are listed in Classpath Configuration. Hive and HDFS client JARs do not ship with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA). The client JARs should be of the same version as the Hive version to which the Hive Metadata Provider is connecting.
To establish a connection to the Hive server, the hive-site.xml
file must be in the classpath.
Parent topic: Hive Metadata Provider
9.2.35.3.5.10 Troubleshooting
If the mapped target table is not present in Hive, the Replicat process will terminate with a "Table metadata resolution exception".
For example, consider the following mapping:
MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2);
This mapping requires a table called TCUSTMER_HIVE
to be created in the schema GG_HIVE
in the Hive metastore. If this table is not present in Hive, then the following exception occurs:
ERROR [main) - Table Metadata Resolution Exception Unable to retrieve table matadata. Table : GG_HIVE.TCUSTMER_HIVE NoSuchObjectException(message:GG_HIVE.TCUSTMER_HIVE table not found)
Parent topic: Hive Metadata Provider
9.2.35.3.6 Google BigQuery Metadata Provider
The Google BigQuery Metadata Provider uses the Google Query Job to retrieve the metadata schema information from the Google BigQuery table. The table should already be created on the target for the metadata provider to fetch the metadata.
Google BigQuery does not support primary key semantics, so the metadata
retrieved from BigQuery Table does not include any primary key definition. You can
identify the primary keys using the KEYCOLS
syntax in the replicat
mapping statement. If KEYCOLS
is not present, then the key information
from the source table is used.
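For example, a Replicat mapping that designates a key column for a BigQuery target might look like the following; the schema, dataset, table, and column names are hypothetical:
MAP schema.ACCOUNTS, TARGET dataset.ACCOUNTS, KEYCOLS(ACCOUNT_ID);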
- Authentication
- Supported BigQuery Datatypes
- Parameterized BigQuery Datatypes
The BigQuery datatypes that can be parameterized to add constraints are STRING, BYTES, NUMERIC, and BIGNUMERIC. The STRING and BYTES datatypes can have length constraints. NUMERIC and BIGNUMERIC can have scale and precision constraints.
- Unsupported BigQuery Datatypes
- Configuring BigQuery Metadata Provider
- Sample Configuration
- Proxy Settings
- Classpath Settings
- Limitations
Parent topic: Metadata Providers
9.2.35.3.6.1 Authentication
You can connect to the Google BigQuery cloud service account either by setting the path to the credentials JSON file in a metadata provider property or by setting the individual keys of the credentials JSON in the BigQuery MDP properties. The individual BigQuery metadata provider properties for configuring the service account credential keys can be encrypted using Oracle wallet.
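As a sketch, the two configuration styles look like the following; the file path and key values are placeholders, and the property names are those listed in Configuring BigQuery Metadata Provider:
# Option 1: point to the service account credentials JSON file
gg.mdp.type=bq
gg.mdp.credentialsFile=/path/to/credFile.json

# Option 2: supply the individual credential keys
gg.mdp.type=bq
gg.mdp.projectId=my-project-id
gg.mdp.clientId=client-id-from-credentials-json
gg.mdp.clientEmail=service-account@my-project-id.iam.gserviceaccount.com
gg.mdp.privateKeyId=private-key-id-from-credentials-json
gg.mdp.privateKey=private-key-from-credentials-json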
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.2 Supported BigQuery Datatypes
The following table lists the Google BigQuery datatypes that are supported and their default scale and precision values:
Data Type | Range | Max Scale | Max Precision | Max Bytes
---|---|---|---|---
BOOL | | NA | NA | 1
INT64 | | NA | NA | 8
FLOAT64 | NA | NA | None | 8
NUMERIC | | 9 | 38 | 64
BIGNUMERIC | | 38 | 77 | 255
STRING | Unlimited | NA | NA | 2147483647L
BYTES | Unlimited | NA | NA | 2147483647L
DATE | 0001-01-01 to 9999-12-31 | NA | NA | NA
TIME | 00:00:00 to 23:59:59.999999 | NA | NA | NA
TIMESTAMP | 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC | NA | NA | NA
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.3 Parameterized BigQuery Datatypes
The BigQuery datatypes that can be parameterized to add constraints are STRING, BYTES, NUMERIC, and BIGNUMERIC. The STRING and BYTES datatypes can have length constraints. NUMERIC and BIGNUMERIC can have scale and precision constraints.
- STRING(L): L is the maximum number of Unicode characters allowed.
- BYTES(L): L is the maximum number of bytes allowed.
- NUMERIC(P[, S]) or BIGNUMERIC(P[, S]): P is maximum precision (total number of digits) and S is maximum scale (number of digits after decimal) that is allowed.
The parameterized datatypes are supported by the BigQuery Metadata Provider. If a datatype has a user-defined precision, scale, or maximum length, then the metadata provider calculates the data based on those values.
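For illustration, a BigQuery table that uses parameterized types might be declared as follows; the dataset, table, and column names are hypothetical:
CREATE TABLE mydataset.accounts (
  account_code STRING(10),        -- at most 10 Unicode characters
  payload BYTES(256),             -- at most 256 bytes
  balance NUMERIC(10, 2),         -- up to 10 digits in total, 2 after the decimal point
  ledger_total BIGNUMERIC(40, 5)  -- up to 40 digits in total, 5 after the decimal point
);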
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.4 Unsupported BigQuery Datatypes
The BigQuery datatypes that are not supported by the metadata provider are complex datatypes, such as GEOGRAPHY, JSON, ARRAY, INTERVAL, and STRUCT. The metadata provider abends with an invalid datatype exception if it encounters them.
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.5 Configuring BigQuery Metadata Provider
The following table lists the configuration properties for BigQuery metadata provider:
Property | Required/Optional | Legal Values | Default | Explanation
---|---|---|---|---
gg.mdp.type | Required | bq | NA | Selects the BigQuery Metadata Provider.
gg.mdp.credentialsFile | Optional | File path to credentials JSON file. | NA | Provides the path to the credentials JSON file for connecting to the Google BigQuery service account.
gg.mdp.clientId | Optional | Valid BigQuery Credentials Client Id | NA | Provides the client Id key from the credentials file for connecting to the Google BigQuery service account.
gg.mdp.clientEmail | Optional | Valid BigQuery Credentials Client Email | NA | Provides the client Email key from the credentials file for connecting to the Google BigQuery service account.
gg.mdp.privateKeyId | Optional | Valid BigQuery Credentials Private Key ID | NA | Provides the Private Key ID from the credentials file for connecting to the Google BigQuery service account.
gg.mdp.privateKey | Optional | Valid BigQuery Credentials Private Key | NA | Provides the Private Key from the credentials file for connecting to the Google BigQuery service account.
gg.mdp.projectId | Optional | Unique BigQuery project Id | NA | Unique project Id of BigQuery.
gg.mdp.connectionTimeout | Optional | Time in sec | 5 | Connect timeout for the BigQuery connection.
gg.mdp.readTimeout | Optional | Time in sec | 6 | Timeout to read from the BigQuery connection.
gg.mdp.totalTimeout | Optional | Time in sec | 9 | Total timeout for the BigQuery connection.
gg.mdp.retryCount | Optional | Maximum number of retries. | 3 | Maximum number of retries for connecting to BigQuery.
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.6 Sample Configuration
Sample properties file content:
gg.mdp.type=bq
gg.mdp.credentialsFile=/path/to/credFile.json
Sample parameter file:
REPLICAT bqeh
TARGETDB LIBFILE libggjava.so SET property=dirprm/bqeh.props
MAP schema.tableName, TARGET schema.tableName;
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.7 Proxy Settings
To route the BigQuery connection through a proxy, set the proxy host and port in the jvm.bootoptions property. For example:
jvm.bootoptions= -Dhttps.proxyHost=www-proxy.us.oracle.com -Dhttps.proxyPort=80
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.8 Classpath Settings
The dependencies of the BigQuery metadata provider are the same as the Google BigQuery stage-and-merge Event Handler dependencies. The dependencies added to the Oracle GoldenGate classpath for the BigQuery Event Handler are sufficient for running the BigQuery metadata provider; no extra dependencies need to be configured.
Parent topic: Google BigQuery Metadata Provider
9.2.35.3.6.9 Limitations
The complex BigQuery datatypes are not yet supported by the metadata provider. It abends if any unsupported datatype is encountered.
If the BigQuery handler or event handler is configured to auto-create the table and dataspace, note that the metadata provider still expects the table to exist in order to fetch the metadata; the auto-create table and dataspace feature of the BigQuery handler and event handler does not work with the BigQuery metadata provider. Metadata change events are not supported by the BigQuery metadata provider. It can be configured to abend or suspend in case there is a metadata change.
Parent topic: Google BigQuery Metadata Provider
9.2.35.4 Pluggable Formatters
The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) targets using one of the GG for DAA handlers.
This chapter describes how to use the pluggable formatters.
- Using Operation-Based versus Row-Based Formatting
  The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) formatters include operation-based and row-based formatters.
- Using the Avro Formatter
  Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications.
- Cloud Event Formatter
- Existing Avro Formatter
- Using the Delimited Text Formatter
- Using the JSON Formatter
- Using the Length Delimited Value Formatter
  The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message.
- Using the XML Formatter
  The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.
Parent topic: Additional Details
9.2.35.4.1 Using Operation-Based versus Row-Based Formatting
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) formatters include operation-based and row-based formatters.
The operation-based formatters represent the individual insert, update, and delete events that occur on table data in the source database. Insert operations only provide after-change data (or images), because a new row is being added to the source database. Update operations provide both before-change and after-change data that shows how existing row data is modified. Delete operations only provide before-change data to identify the row being deleted. The operation-based formatters model the operation as it exists in the source trail file. Operation-based formats include fields for the before-change and after-change images.
The row-based formatters model the row data as it exists after the operation data is applied. Row-based formatters contain only a single image of the data. The following sections describe what data is displayed for both the operation-based and the row-based formatters.
Parent topic: Pluggable Formatters
9.2.35.4.1.1 Operation Formatters
The formatters that support operation-based formatting are JSON, Avro Operation, and XML. The output of operation-based formatters are as follows:
- Insert operation: Before-image data is null. After-image data is output.
- Update operation: Both before-image and after-image data is output.
- Delete operation: Before-image data is output. After-image data is null.
- Truncate operation: Both before-image and after-image data is null.
Parent topic: Using Operation-Based versus Row-Based Formatting
9.2.35.4.1.2 Row Formatters
The formatters that support row-based formatting are Delimited Text and Avro Row. Row-based formatters output the following information for the following operations:
- Insert operation: After-image data only.
- Update operation: After-image data only. Primary key updates are a special case that is discussed in the individual sections for the specific formatters.
- Delete operation: Before-image data only.
- Truncate operation: The table name is provided, but both before-image and after-image data are null. Truncate table is a DDL operation, and it may not be supported by all database implementations. Refer to the Oracle GoldenGate documentation for your database implementation.
Parent topic: Using Operation-Based versus Row-Based Formatting
9.2.35.4.1.3 Table Row or Column Value States
In an RDBMS, table data for a specific row and column can only have one of two states: either the data has a value, or it is null. However; when data is transferred to the Oracle GoldenGate trail file by the Oracle GoldenGate capture process, the data can have three possible states: it can have a value, it can be null, or it can be missing.
For an insert operation, the after-image contains data for all column values regardless of whether the data is null.. However, the data included for update and delete operations may not always contain complete data for all columns. When replicating data to an RDBMS for an update operation only the primary key values and the values of the columns that changed are required to modify the data in the target database. In addition, only the primary key values are required to delete the row from the target database. Therefore, even though values are present in the source database, the values may be missing in the source trail file. Because data in the source trail file may have three states, the Plugable Formatters must also be able to represent data in all three states.
Because the row and column data in the Oracle GoldenGate trail file has an important effect on Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) integration, it is important to understand the data that is required. Typically, you can control the data that is included for operations in the Oracle GoldenGate trail file. In an Oracle database, this data is controlled by the supplemental logging level. To understand how to control the row and column values that are included in the Oracle GoldenGate trail file, see the Oracle GoldenGate documentation for your source database implementation.
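For example, on an Oracle source you typically enable supplemental logging so that full column images are written to the trail. The following GGSCI sketch is illustrative only: the credential alias ggadmin and the schema hr are hypothetical names, and the appropriate command (for example, ADD TRANDATA for individual tables) depends on your source database and release.
-- ggadmin (credential alias) and hr (schema) are hypothetical names
DBLOGIN USERIDALIAS ggadmin
ADD SCHEMATRANDATA hr ALLCOLS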
Parent topic: Using Operation-Based versus Row-Based Formatting
9.2.35.4.2 Using the Avro Formatter
Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications.
Parent topic: Pluggable Formatters
9.2.35.4.2.1 Avro Row Formatter
The Avro Row Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Row Formatter takes the before-image and after-image data and formats it into an Avro binary representation of the operation data.
The Avro Row Formatter formats operations from the source trail file into a format that represents the row data. This format is more compact than the output from the Avro Operation Formatter, for which the Avro messages model the change data operation.
The Avro Row Formatter may be a good choice when streaming Avro data to HDFS. Hive supports data files in HDFS in an Avro format.
This section contains the following topics:
- Operation Metadata Formatting Details
The automated output of meta-column fields in generated Avro messages has been removed as of Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 21.1. Meta-column fields can still be output; however, they must be explicitly configured using the following property: gg.handler.name.format.metaColumnsTemplate
. - Operation Data Formatting Details
- Sample Avro Row Messages
- Avro Schemas
Avro uses JSONs to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. - Avro Row Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Special Considerations
Parent topic: Using the Avro Formatter
9.2.35.4.2.1.1 Operation Metadata Formatting Details
The automated output of meta-column fields in generated Avro messages
has been removed as of Oracle GoldenGate for Distributed Applications and Analytics
(GG for DAA) release 21.1. Meta-column fields can still be output; however, they
need to explicitly configured as the following property:
gg.handler.name.format.metaColumnsTemplate
.
To output the metacolumns, configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens, configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate.
Table 9-46 Avro Formatter Metadata
Value | Description |
---|---|
table | The fully qualified table name. |
op_type | The type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate. |
op_ts | The timestamp of the operation from the source trail file. Since this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation. |
current_ts | The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation. |
pos | The concatenated sequence number and the RBA number from the source trail file. This trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. |
primary_keys | An array variable that holds the column names of the primary keys of the source table. |
tokens | A map variable that holds the token key value pairs from the source trail file. |
Parent topic: Avro Row Formatter
9.2.35.4.2.1.2 Operation Data Formatting Details
The operation data follows the operation metadata. This data is represented as individual fields identified by the column names.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states, the column has a value or the column value is null. Missing column values are handled the same as null values. Oracle recommends that when you use the Avro Row Formatter, you configure the Oracle GoldenGate capture process to provide full image data for all columns in the source trail file.
By default, the setting of the Avro Row Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro provides limited support for data types, source columns map into Avro long, double, float, binary, or string data types. You can also configure data type mapping to handle all data as strings.
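For example, to force all column data to be written as Avro strings, you can set the treatAllColumnsAsStrings property that is documented in the configuration table later in this section; the handler name hdfs below matches the sample configuration used elsewhere in this chapter:
gg.handler.hdfs.format.treatAllColumnsAsStrings=true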
Parent topic: Avro Row Formatter
9.2.35.4.2.1.3 Sample Avro Row Messages
Because Avro messages are binary, they are not human readable. The following sample messages show the JSON representation of the messages.
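If you need to inspect a binary Avro Row message yourself, a minimal consumer-side sketch using the Apache Avro Java API might look like the following. The schema file location and the way the raw message bytes are obtained are assumptions for illustration; adjust them to your environment.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class AvroRowMessageDump {
    public static void main(String[] args) throws Exception {
        // Parse the Avro schema generated by the formatter (for example, GG.TCUSTORD.avsc in dirdef).
        Schema schema = new Schema.Parser().parse(new File("dirdef/GG.TCUSTORD.avsc"));
        // args[0] is assumed to be a file holding one raw Avro Row message captured from the target.
        byte[] messageBytes = Files.readAllBytes(Paths.get(args[0]));
        // Decode the binary message into a GenericRecord using the schema.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(messageBytes, null);
        GenericRecord record = reader.read(null, decoder);
        // GenericRecord.toString() renders the record as JSON, similar to the samples below.
        System.out.println(record);
    }
}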
Parent topic: Avro Row Formatter
9.2.35.4.2.1.3.1 Sample Insert Message
{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:13:11.172000", "pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "CUST_CODE": "WILL", "ORDER_DATE": "1994-09-30:15:33:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "144", "PRODUCT_PRICE": 17520.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}
Parent topic: Sample Avro Row Messages
9.2.35.4.2.1.3.2 Sample Update Message
{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.492000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "CUST_CODE": "BILL", "ORDER_DATE": "1995-12-31:15:00:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "765", "PRODUCT_PRICE": 14000.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}
Parent topic: Sample Avro Row Messages
9.2.35.4.2.1.3.3 Sample Delete Message
{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.512000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "CUST_CODE": "DAVE", "ORDER_DATE": "1993-11-03:07:51:35", "PRODUCT_CODE": "PLANE", "ORDER_ID": "600", "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}
Parent topic: Sample Avro Row Messages
9.2.35.4.2.1.3.4 Sample Truncate Message
{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.514000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "CUST_CODE": null, "ORDER_DATE": null, "PRODUCT_CODE": null, "ORDER_ID": null, "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}
Parent topic: Sample Avro Row Messages
9.2.35.4.2.1.4 Avro Schemas
Avro uses JSONs to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages.
Schemas are generated on a just-in-time basis when the first operation for a table is encountered. Newer schemas are generated when there is a change in the metadata. The generated Avro schemas are specific to a table definition, and therefore a separate Avro schema is generated for every table encountered for processed operations. By default, Avro schemas are written to the GoldenGate_Home/dirdef directory, although the write location is configurable. Avro schema file names adhere to the following naming convention: Fully_Qualified_Table_Name.avsc.
The following is a sample Avro schema for the Avro Row Format for the examples referenced in the previous section:
{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null } ] }
Parent topic: Avro Row Formatter
9.2.35.4.2.1.5 Avro Row Configuration Properties
Table 9-47 Avro Row Configuration Properties
Properties | Optional/ Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.insertOpKey |
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
UTF-8 (the JSON default) |
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
gg.handler.name.format.treatAllColumnsAsStrings |
Optional |
|
|
Controls the output typing of generated Avro messages. If set to false then the formatter will attempt to map Oracle GoldenGate types to the corresponding AVRO type. If set to true then all data will be treated as Strings in the generated Avro messages and schemas. |
gg.handler.name.format.pkUpdateHandling |
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations for the Avro Row formatter require special consideration.
|
|
Optional |
Any string |
no value |
Inserts a delimiter after each Avro message. This is not
a best practice, but in certain cases you may want to parse a stream
of data and extract individual Avro messages from the stream. Select
a unique delimiter that cannot occur in any Avro message. This
property supports |
|
Optional |
|
|
Avro schemas always follow
the |
|
Optional |
|
|
Wraps the Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality. |
|
Optional |
Any legal, existing file system path. |
|
The output location of generated Avro schemas. |
|
Optional |
Any legal encoding name or alias supported by Java. |
|
The directory in the HDFS where schemas are output. A
metadata change overwrites the schema during the next operation for
the associated table. Schemas follow the same naming convention as
schemas written to the local file
system: |
|
Optional |
|
|
The format of the current timestamp. The default is the
ISO 8601 format. A setting of false removes the
T between the date and time in the current
timestamp, which outputs a space instead.
|
|
Optional |
|
|
Set to true to include a
{column_name}_isMissing boolean field
for each source field. This field allows downstream applications to
differentiate if a null value is null in the source trail file (value is
false ) or is missing in the source trail file
(value is true ).
|
|
Optional |
|
|
Enables the use of Avro |
|
Optional |
Any integer value from 0 to 38. |
None |
Allows you to set the scale on the Avro
|
gg.handler.name.format.mapOracleNumbersAsStrings |
Optional |
|
false |
This property is only applicable if decimal logical types are enabled via the property gg.handler.name.format.enableDecimalLogicalType=true. Oracle numbers are especially problematic because they have a large precision (168) and a floating scale of up to 38. Some analytical tools, such as Spark, cannot read numbers that large. This property allows you to map those Oracle numbers as strings while still mapping the smaller numbers as decimal logical types.
|
|
Optional |
|
|
Set to |
gg.handler.name.format.mapLargeNumbersAsStrings |
Optional | true | false |
false |
Oracle GoldenGate supports the floating point and
integer source datatypes. Some of these datatypes may not fit into the
Avro primitive double or long datatypes. Set this property to
true to map the fields that do not fit into the
Avro primitive double or long datatypes to Avro string.
|
gg.handler.name.format.metaColumnsTemplate |
Optional | See Metacolumn Keywords. | None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. |
gg.handler.name.format.maxPrecision |
Optional | Positive integer | None | Allows you to set the maximum precision for Avro decimal logical types. Consuming applications may have limitations on Avro precision (for example, Apache Spark supports a maximum precision of 38). WARNING: Configuration of this property is not without risk. |
Parent topic: Avro Row Formatter
9.2.35.4.2.1.6 Review a Sample Configuration
The following is a sample configuration for the Avro Row Formatter in the Java Adapter properties file:
gg.handler.hdfs.format=avro_row gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.encoding=UTF-8 gg.handler.hdfs.format.pkUpdateHandling=abend gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false
Parent topic: Avro Row Formatter
9.2.35.4.2.1.7 Metadata Change Events
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Row Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting.
An updated Avro schema is generated as soon as a table operation occurs after a metadata change event. You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema.
Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message. For more information, consult the Apache Avro documentation.
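If you retain the schema versions, the Avro Java API can bridge such changes through schema resolution by supplying both the writer schema (the version that produced the message) and the reader schema (the current version). A minimal sketch, assuming both .avsc files have been kept; the v1 file name is hypothetical:
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class SchemaResolutionExample {
    // Reads a message written with an older schema version using the current schema
    // by way of standard Avro schema resolution.
    static GenericRecord readWithResolution(byte[] messageBytes) throws IOException {
        Schema writerSchema = new Schema.Parser().parse(new File("dirdef/GG.TCUSTORD.v1.avsc")); // hypothetical retained copy of the older schema
        Schema readerSchema = new Schema.Parser().parse(new File("dirdef/GG.TCUSTORD.avsc"));    // current schema
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
        return reader.read(null, DecoderFactory.get().binaryDecoder(messageBytes, null));
    }
}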
Parent topic: Avro Row Formatter
9.2.35.4.2.1.8 Special Considerations
This section describes these special considerations:
9.2.35.4.2.1.8.1 Troubleshooting
Because Avro is a binary format, it is not human readable, which makes issues difficult to debug. To help, the Avro Row Formatter provides a special feature: when the log4j Java logging level is set to TRACE, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE in a production environment, as it has a substantial negative impact on performance. To troubleshoot content, you may want to consider switching to a formatter that produces human-readable content. The XML and JSON formatters both produce content in human-readable format.
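As an illustration, the logging level is normally raised in the Java Adapter properties file. The exact property names can vary by release, so verify the following sketch against your installation before relying on it:
gg.log=log4j
gg.log.level=TRACE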
Parent topic: Special Considerations
9.2.35.4.2.1.8.2 Primary Key Updates
In Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys of a given row in the source database. Because data is appended in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications, a primary key update operation looks more like a new insert than like an update without special handling. You can use the following properties to configure the Avro Row Formatter to handle primary keys:
Table 9-48 Configurable behavior
Value | Description |
---|---|
|
The formatter terminates. This behavior is the default behavior. |
|
With this configuration, the primary key update is treated like any other update operation. Use this configuration only if you can guarantee that the primary key is not used as selection criteria for row data from a GG for DAA system. |
|
The primary key update is treated as a special case of a delete, using the before-image data, and an insert, using the after-image data. This configuration may more accurately model the effect of a primary key update in a GG for DAA application. However, if this configuration is selected, it is important to have full supplemental logging enabled at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns, so it will not be a full representation of the row data in the GG for DAA application. |
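For example, to select the delete and insert behavior described in the last row of the table, the configuration might look like the following. The value name delete-insert is an assumption for illustration; verify the legal values of gg.handler.name.format.pkUpdateHandling for your release.
gg.handler.hdfs.format.pkUpdateHandling=delete-insert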
Parent topic: Special Considerations
9.2.35.4.2.1.8.3 Generic Wrapper Functionality
Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka.
The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property.
gg.handler.name.format.wrapMessageInGenericAvroMessage=true
The generic message is an Avro message wrapping the Avro payload message that is common to all Avro messages that are output. The schema for the generic message is named generic_wrapper.avsc
and is written to the output schema directory. This message has the following three fields:
- table_name: The fully qualified source table name.
- schema_fingerprint: The fingerprint of the Avro schema of the wrapped message. The fingerprint is generated using the Avro SchemaNormalization.parsingFingerprint64(schema) call.
- payload: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema.
{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint", "type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }
Parent topic: Special Considerations
9.2.35.4.2.2 The Avro Operation Formatter
The Avro Operation Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Operation Formatter formats this data into an Avro binary representation of the operation data.
This format is more verbose than the output of the Avro Row Formatter for which the Avro messages model the row data.
- Operation Metadata Formatting Details
- Operation Data Formatting Details
- Sample Avro Operation Messages
- Avro Schema
- Avro Operation Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Special Considerations
Parent topic: Using the Avro Formatter
9.2.35.4.2.2.1 Operation Metadata Formatting Details
To output the metacolumns, configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens, configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate.
Table 9-49 Avro Messages and its Metadata
Fields | Description |
---|---|
table | The fully qualified table name. |
op_type | The type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate. |
op_ts | The timestamp of the operation from the source trail file. Since this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation. |
current_ts | The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation. |
pos | The concatenated sequence number and RBA number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. |
primary_keys | An array variable that holds the column names of the primary keys of the source table. |
tokens | A map variable that holds the token key value pairs from the source trail file. |
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.2 Operation Data Formatting Details
The operation data is represented as individual fields identified by the column names.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is null. The Avro Operation Formatter contains an additional Boolean field COLUMN_NAME_isMissing for each column to indicate whether the column value is missing or not. Using the COLUMN_NAME field together with the COLUMN_NAME_isMissing field, all three states can be determined (see the consumer-side sketch after the following list).
- State 1: The column has a value. The COLUMN_NAME field has a value and the COLUMN_NAME_isMissing field is false.
- State 2: The column value is null. The COLUMN_NAME field value is null and the COLUMN_NAME_isMissing field is false.
- State 3: The column value is missing. The COLUMN_NAME field value is null and the COLUMN_NAME_isMissing field is true.
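To make the three states concrete, a consumer reading the before or after record of an Avro Operation message can combine the column field and its _isMissing flag, as in the following sketch (the column name is passed in by the caller; the class name is illustrative only):
import org.apache.avro.generic.GenericRecord;

public class ColumnStateCheck {
    // Distinguishes the three possible states of a column in an Avro Operation message.
    static String stateOf(GenericRecord image, String columnName) {
        Object value = image.get(columnName);
        boolean missing = Boolean.TRUE.equals(image.get(columnName + "_isMissing"));
        if (missing) {
            return columnName + " was not captured in the source trail file";   // State 3
        } else if (value == null) {
            return columnName + " is null in the source database";              // State 2
        } else {
            return columnName + " = " + value;                                   // State 1
        }
    }
}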
By default, the Avro Operation Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro supports few data types, this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. You can also configure this data type mapping to handle all data as strings.
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.3 Sample Avro Operation Messages
Because Avro messages are binary, they are not human readable. The following topics show example Avro messages in JSON format:
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.3.1 Sample Insert Message
{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:17:49.570000", "pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "before": null, "after": { "CUST_CODE": "WILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1994-09-30:15:33:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "144", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 17520.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}
Parent topic: Sample Avro Operation Messages
9.2.35.4.2.2.3.2 Sample Update Message
{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.880000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "before": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 15000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}, "after": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 14000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}
Parent topic: Sample Avro Operation Messages
9.2.35.4.2.2.3.3 Sample Delete Message
{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.899000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "before": { "CUST_CODE": "DAVE", "CUST_CODE_isMissing": false, "ORDER_DATE": "1993-11-03:07:51:35", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "PLANE", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "600", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": null, "PRODUCT_PRICE_isMissing": true, "PRODUCT_AMOUNT": null, "PRODUCT_AMOUNT_isMissing": true, "TRANSACTION_ID": null, "TRANSACTION_ID_isMissing": true}, "after": null}
Parent topic: Sample Avro Operation Messages
9.2.35.4.2.2.3.4 Sample Truncate Message
{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.900000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "before": null, "after": null}
Parent topic: Sample Avro Operation Messages
9.2.35.4.2.2.4 Avro Schema
Avro schemas are represented as JSONs. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. Avro schemas are generated on a just-in-time basis when the first operation for a table is encountered. Because Avro schemas are specific to a table definition, a separate Avro schema is generated for every table encountered for processed operations. By default, Avro schemas are written to the GoldenGate_Home/dirdef directory, although the write location is configurable. Avro schema file names adhere to the following naming convention: Fully_Qualified_Table_Name.avsc.
The following is a sample Avro schema for the Avro Operation Format for the samples in the preceding sections:
{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "before", "type" : [ "null", { "type" : "record", "name" : "columns", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "CUST_CODE_isMissing", "type" : "boolean" }, { "name" : "ORDER_DATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_CODE_isMissing", "type" : "boolean" }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_PRICE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT_isMissing", "type" : "boolean" }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "TRANSACTION_ID_isMissing", "type" : "boolean" } ] } ], "default" : null }, { "name" : "after", "type" : [ "null", "columns" ], "default" : null } ] }
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.5 Avro Operation Formatter Configuration Properties
Table 9-50 Configuration Properties
Properties | Optional Y/N | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java |
UTF-8 (the JSON default) |
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
|
Optional |
|
|
Controls the output typing of generated Avro messages. If set to |
|
Optional |
Any string |
no value |
Inserts a delimiter after each Avro message. This is not a best practice, but in certain cases you may want to parse a stream of data and extract individual Avro messages from the stream; this property can help. Select a unique delimiter that cannot occur in any Avro message. This property supports |
|
Optional |
Any legal, existing file system path. |
|
The output location of generated Avro schemas. |
|
Optional |
|
|
Wraps Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality. |
|
Optional |
|
|
The format of the current timestamp. The default is the ISO 8601 format. A setting of false removes the T between the date and time in the current timestamp, which outputs a space instead.
|
|
Optional |
|
|
Set to |
|
Optional |
Any integer value from 0 to 38. |
None |
Allows you to set the scale on the Avro |
gg.handler.name.format.mapOracleNumbersAsStrings |
Optional |
|
|
This property is only applicable if decimal logical types are enabled via the property gg.handler.name.format.enableDecimalLogicalType=true. Oracle numbers are especially problematic because they have a large precision (168) and a floating scale of up to 38. Some analytical tools, such as Spark, cannot read numbers that large. This property allows you to map those Oracle numbers as strings while still mapping the smaller numbers as decimal logical types.
|
|
Optional |
|
|
Set to |
|
Optional |
|
|
Enables the use of Avro |
gg.handler.name.format.mapLargeNumbersAsStrings |
Optional |
|
false
|
Oracle GoldenGate supports the floating point and
integer source datatypes. Some of these datatypes may not fit into the
Avro primitive double or long datatypes. Set this property to
true to map the fields that do not fit into the
Avro primitive double or long datatypes to Avro string.
|
gg.handler.name.format.metaColumnsTemplate |
Optional | See Metacolumn Keywords | None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, and useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. |
gg.handler.name.format.maxPrecision |
Optional | Positive integer | None | Allows you to set the maximum precision for Avro decimal logical types. Consuming applications may have limitations on Avro precision (for example, Apache Spark supports a maximum precision of 38). WARNING: Configuration of this property is not without risk. |
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.6 Review a Sample Configuration
The following is a sample configuration for the Avro Operation Formatter in the Java Adapter properties file:
gg.handler.hdfs.format=avro_op gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.encoding=UTF-8 gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.7 Metadata Change Events
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Operation Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting.
An updated Avro schema is generated as soon as a table operation occurs after a metadata change event.
You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema. Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message
For more information, consult the Apache Avro documentation.
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.8 Special Considerations
This section describes these special considerations:
Parent topic: The Avro Operation Formatter
9.2.35.4.2.2.8.1 Troubleshooting
Because Avro is a binary format, it is not human readable. However, when the log4j
Java logging level is set to TRACE
, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE
in a production environment, as it has a substantial impact on performance.
Parent topic: Special Considerations
9.2.35.4.2.2.8.2 Primary Key Updates
The Avro Operation Formatter creates messages with complete data of before-image and after-images for update operations. Therefore, the Avro Operation Formatter requires no special treatment for primary key updates.
Parent topic: Special Considerations
9.2.35.4.2.2.8.3 Generic Wrapper Message
Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka.
The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property:
gg.handler.name.format.wrapMessageInGenericAvroMessage=true
The generic message is an Avro message wrapping the Avro payload message that is common to all Avro messages that are output. The schema for the generic message is named generic_wrapper.avsc
and is written to the output schema directory. This message has the following three fields:
- table_name: The fully qualified source table name.
- schema_fingerprint: The fingerprint of the Avro schema generating the messages. The fingerprint is generated using the parsingFingerprint64(Schema s) method on the org.apache.avro.SchemaNormalization class.
- payload: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema:
{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint", "type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }
Parent topic: Special Considerations
9.2.35.4.2.3 Avro Object Container File Formatter
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) can write to HDFS in Avro Object Container File (OCF) format. Avro OCF handles schema evolution more efficiently than other formats. The Avro OCF Formatter also supports compression and decompression to allow more efficient use of disk space.
The HDFS Handler integrates with the Avro formatters to write files to HDFS in Avro OCF format. The Avro OCF format is required for Hive to read Avro data in HDFS. The Avro OCF format is detailed in the Avro specification, see http://avro.apache.org/docs/current/spec.html#Object+Container+Files.
You can configure the HDFS Handler to stream data in Avro OCF format, generate table definitions in Hive, and update table definitions in Hive in the case of a metadata change event.
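For example, assuming the OCF variants of the Avro formatters are available as format values for the HDFS Handler in your release (verify against the HDFS Handler documentation for your version), a configuration sketch might look like:
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.format=avro_row_ocf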
9.2.35.4.2.3.1 Avro OCF Formatter Configuration Properties
Properties | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
|
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
|
Optional |
|
|
Controls the output typing of generated Avro messages. When the setting is |
|
Optional |
|
|
Controls how the formatter should handle update operations that change a primary key. Primary key operations can be problematic for the Avro Row formatter and require special consideration.
|
|
Optional |
|
|
Because schemas must be generated for Avro serialization to |
|
Optional |
Any legal, existing file system path |
|
The directory where generated Avro schemas are saved to the local file system. This property does not control where the Avro schema is written to in HDFS; that is controlled by an HDFS Handler property. |
|
Optional |
|
|
By default, the value of this property is true, and the format for the current timestamp is ISO8601. Set to |
|
Optional |
|
|
If set to
|
Parent topic: Avro Object Container File Formatter
9.2.35.4.3 Cloud Event Formatter
- Using the Cloud Event Formatter
- Operation Data Formatting Details
- Row Data Formatting Details
- Updates to Schema Attributes for the Cloud Event Formatter
- Sample Configuration
- Cloud Event Formatter Configuration Properties
Parent topic: Pluggable Formatters
9.2.35.4.3.1 Using the Cloud Event Formatter
CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems. CloudEvents format defines the list of attributes to describe the event, essentially an envelope with a set of mandatory and optional attributes.
Currently, CloudEvents format is limited to the JSON event format.
When CloudEvents format is enabled in the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) targets, the final JSON records look as follows, where the data field contains the original data record.
For Insert:
{"specversion":"1.0","id":"OGG-00000000000000004120","source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST","type":"QASOURCE.TCUSTORD","datacontenttype":"application/json","data":{"table":"QASOURCE.TCUSTORD","op_type":"I","op_ts":"2015-11-05
18:45:39.000000","current_ts":"2024-07-30
05:52:26.030000","pos":"00000000000000004120","after":{"CUST_CODE":"DAVE","ORDER_DATE":"1993-11-03
07:51:35","PRODUCT_CODE":"PLANE","ORDER_ID":600,"PRODUCT_PRICE":135000.00,"PRODUCT_AMOUNT":2,"TRANSACTION_ID":200}}}
For Updates:
{"specversion":"1.0","id":"OGG-00000000000000005100","source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST","type":"com.oracle.goldengate","datacontenttype":"application/json","data":{"before":{"CUST_CODE":"ANN","NAME":"ANN'S
BOATS","CITY":"SEATTLE","STATE":"WA"},"after":{"CUST_CODE":"ANN","CITY":"NEW
YORK","STATE":"NY"}}}
For Deletes:
{"specversion":"1.0","id":"OGG-00000000000000005272","source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST","type":"QASOURCE.TCUSTORD","datacontenttype":"application/json","data":{"table":"QASOURCE.TCUSTORD","op_type":"D","op_ts":"2015-11-05
18:45:39.000000","current_ts":"2024-07-30
05:52:26.052000","pos":"00000000000000005272","before":{"CUST_CODE":"DAVE","ORDER_DATE":"1993-11-03
07:51:35","PRODUCT_CODE":"PLANE","ORDER_ID":600,"PRODUCT_PRICE":135000.00,"PRODUCT_AMOUNT":2,"TRANSACTION_ID":200}}}
The CloudEvents formatter mandates the following four attributes for each event record: "id", "source", "specversion", and "type".
specversion
This is the non-empty Cloud Events specification version string, and it is “1.0” in the current release.
For example: "specversion":"1.0"
id
This is the unique identifier for the event records. It is defined as a non-empty string. By default, the position of the record prefixed with “OGG-” is used for event record ID.
For example: "id":"OGG-00000000000000005100"
This can be overridden by using the property idmappingtemplate.
For example:
gg.handler.kafkahandler.format.idmappingtemplate=OGG-${position}
type
This is the type of the event related to the originating occurrence. It defaults to com.oracle.goldengate for GG for DAA and can be overridden by using the typemappingtemplate property.
For example:
gg.handler.kafkahandler.format.typemappingtemplate=${tablename}
source
The source field is the context in which the occurrence happened, and it is in the format of a URI-Reference. The source trail producer obtained from the source trail header file is used as this field value.
For example: "source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST"
This can be overridden by using the sourcemappingtemplate property. For example:
gg.handler.kafkahandler.format.sourcemappingtemplate=${tablename}
data
The data field contains the original event data. It can only be of JSON object type. It defaults to representing the data in Operation Data format.
Additional attributes:
Additional attributes can be sent by configuring the corresponding explicit properties. The following are the additional attributes.
Datacontenttype:
The only valid value currently is “application/json”.
Subject:
Subject of the event in the context of the event producer. It can be configured using the subjectMappingTemplate property.
For example:
gg.handler.kafkahandler.format.subjectMappingTemplate=OGGDAA-KafkaStreams
Time
Timestamp of when the occurrence happened. It can be controlled by the timeMappingTemplate property using template keywords.
For example:
gg.handler.kafkahandler.format.timeMappingTemplate=${opTimestamp}
Parent topic: Cloud Event Formatter
9.2.35.4.3.2 Operation Data Formatting Details
The Cloud Event formatter represents the event data in Operation Data Formatting where the data is represented by before and after members that are objects. These objects contain members whose keys are the column names and whose values are the column values.
Operation data is modeled as follows:
- Inserts: Includes the after-image data.
- Updates: Includes both the before-image and the after-image data.
- Deletes: Includes the before-image data.
For example:
{"specversion":"1.0","id":"OGG-00000000000000005100","source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST","type":"com.oracle.goldengate","datacontenttype":"application/json","data":{"before":{"CUST_CODE":"ANN","NAME":"ANN'S
BOATS","CITY":"SEATTLE","STATE":"WA"},"after":{"CUST_CODE":"ANN","CITY":"NEW
YORK","STATE":"NY"}}}
Parent topic: Cloud Event Formatter
9.2.35.4.3.3 Row Data Formatting Details
Row data is modeled as follows:
- Inserts: Includes the after-image data.
- Updates: Includes the after-image data.
- Deletes: Includes the before-image data.
{"specversion":"1.0","id":"OGG-00000000000000005100","source":"uri:slcaa318::oldhome:qastaf:ggs_home:yuga:ora:v122_rc3:ETEST","type":"com.oracle.goldengate","datacontenttype":"application/json","data:{"CUST_CODE":"ANN","CITY":"NEW
YORK","STATE":"NY"}}
Parent topic: Cloud Event Formatter
9.2.35.4.3.4 Updates to Schema Attributes for the Cloud Event Formatter
A JSON schema for a designated table is published the first time an operation pertaining to that table is detected. Within the generated schema, the title and id attributes play a crucial role in identifying its connection to the Cloud Event formatter. The title attribute denotes the specific table, whereas the id attribute features the fully qualified table name, prefixed with ogg:dataStream:DMLRecord:.
{ "$schema": "http://json-schema.org/draft-04/schema#", "title": "Oracle GoldenGate JSON Schema for DML Record from QASOURCE.TCUSTMER", "description": "JSON schema for table QASOURCE.TCUSTMER", "id": "ogg:dataStream:DMLRecord:QASOURCE.TCUSTMER", "definitions": { "row": { "type": "object", "properties": { "CUST_CODE": { "type": [ "string", "null" ] }, "NAME": { "type": [ "string", "null" ] }, "CITY": { "type": [ "string", "null" ] }, "STATE": { "type": [ "string", "null" ] } }, "additionalProperties": false }, "tokens": { "type": "object", "description": "Token keys and values are free form key value pairs.", "properties": { }, "additionalProperties": true } }, "type": "object", "properties": { "table": { "description": "The MetaColumn table", "type": "string" }, "op_type": { "description": "The MetaColumn op_type", "type": "string" }, "op_ts": { "description": "The MetaColumn op_ts", "type": "string" }, "current_ts": { "description": "The MetaColumn current_ts", "type": "string" }, "pos": { "description": "The MetaColumn pos", "type": "string" }, "before": { "$ref": "#/definitions/row" }, "after": { "$ref": "#/definitions/row" } }, "required": [ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties": false }
Parent topic: Cloud Event Formatter
9.2.35.4.3.5 Sample Configuration
gg.handler.kafkahandler.format=json_cloudevent gg.handler.kafkahandler.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]} gg.handler.kafkahandler.format.idmappingtemplate=OGG-${position} gg.handler.kafkahandler.format.typemappingtemplate=${fullyQualifiedTableName} gg.handler.kafkahandler.format.sourcemappingtemplate=${fullyQualifiedTableName} gg.handler.kafkahandler.format.subjectmappingtemplate=From OGGDAA gg.handler.kafkahandler.format.timemappingtemplate=${opTimestamp} gg.handler.kafkahandler.format.dataContentType=application/json gg.handler.kafkahandler.format.specversion=V1
Parent topic: Cloud Event Formatter
9.2.35.4.3.6 Cloud Event Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format |
Optional | json_cloudevent
|
None | Controls whether the generated Cloud Event’s JSON output messages are operation modeled or row modeled. Set to json for operation modeled or json_row for row modeled. |
gg.handler.name.format.idmappingtemplate |
Optional | See Template Keywords | OGG-${position} |
This is the unique identifier for the event records.
It is defined as a non-empty string. By default, the position of
the record prefixed with This can be overridden by using template keywords. |
gg.handler.name.format.typemappingtemplate |
Optional | See Template Keywords | com.oracle.goldengate |
This is the type of the event related to the
originating occurrence. Defaulting to
com.oracle.goldengate for Oracle GoldenGate for
Distributed Applications and Analytics (GG for DAA) and can be
overridden by using Template Keywords.
|
gg.handler.name.format.sourcemappingtemplate |
Optional | See Template Keywords | Trail header file value. | The "source" field is the context in which the occurrence happened, and it is in the format of a URI-Reference. The source trail producer obtained from the source trail header file is used as this field value. It can be overridden by using Template Keywords. |
gg.handler.name.format.subjectmappingtemplate |
Optional | See Template Keywords | None |
Subject of the event in the context of the event producer. Can be configured using Template Keywords. |
gg.handler.name.format.timemappingtemplate |
Optional | See Template Keywords | None |
Timestamp of when the occurrence happened. Can be controlled by Template Keywords. |
gg.handler.name.format.dataContentType |
Optional | application/json |
application/json |
ContentType of the data. |
gg.handler.name.format.specversion |
Optional | Valid SpecVersion supported by CloudEvents. | V1 | Cloud Events specification version. |
gg.handler.name.format.metaColumnsTemplate |
Optional | See Metacolumn Keywords. | None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format It is a comma-delimited string consisting of one or more
templated values that represent the template. For more
information about the Metacolumn keywords, see Metacolumn Keywords.
This is an example that would produce a list of metacolumns:
|
Parent topic: Cloud Event Formatter
9.2.35.4.4 Existing Avro Formatter
The existing Avro formatter allows users to map trail records into their existing Avro schemas.
- Prerequisites
- Supported Output Formats
- Support for Mapping to Nested Avro Fields
- Support for Mapping to Nested Avro Fields
- Configuration
- Unsupported Features
Parent topic: Pluggable Formatters
9.2.35.4.4.1 Prerequisites
gg.mdp.type=avro
gg.mdp.schemaFilesPath=
Parent topic: Existing Avro Formatter
9.2.35.4.4.2 Supported Output Formats
The Existing Avro Schema formatter can generate messages in the following formats (a sample configuration sketch follows the list):
- Raw avro – Generally used with Kafka or other integrations when sending individual messages.
- Avro OCF (object container file) – Generally used when generating files such as with the File Writer Handler. This is the format which has the Avro schema in the file header and then contains one or more records which conform to the schema.
- JSON – It is possible to convert the Avro Generic Record to JSON and publish the JSON. This can be used for debugging since JSON provides human readability whereas Avro does not. Additionally, it could be used if you wish to send data in JSON formats which support nested hierarchical structures.
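A minimal configuration sketch that combines the metadata provider properties from the Prerequisites section with the formatter selection; the handler name kafkahandler and the schema directory path are assumptions for illustration:
gg.mdp.type=avro
gg.mdp.schemaFilesPath=/path/to/existing/avro/schemas
gg.handler.kafkahandler.format=existing_avro_schema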
Parent topic: Existing Avro Formatter
9.2.35.4.4.3 Support for Mapping to Nested Avro Fields
The Existing Avro Schema Formatter supports mapping into Avro hierarchical structures. This is one of the powerful features of the Existing Avro Schema formatter. As previously stated, the GoldenGate trail structure is a flat structure of key-value pairs which model the flat column name and column value structure of an RDBMS. Avro can be a hierarchical structure where records can be nested inside of fields and records. The Avro Metadata provider has been modified to traverse the Avro schema and to flatten the Avro structure to return the target metadata to the GoldenGate mapping.
{ "type" : "record", "name" : "TCUSTMERNEST", "namespace" : "QASOURCE", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null, "primary_key": true }, { "name" : "SUB", "type" : { "type" : "record", "name" : "subrecord", "fields" : [ { "name" : "NAME", "type" : [ "null", "string" ], "default" : null }, { "name" : "STATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "CITY", "type" : [ "null", "string" ], "default" : null } ] } } ] }Field
CUST_CODE
is at the root record level. However, fields
NAME
, STATE
, and CITY
are
fields nesting inside of field SUB which is at the root record level. The meta
provider must flatten the Avro structure and return it to GoldenGate so that values
can be mapped into the Avro hierarchical structure. The “-“
(dash)
character is used to designate or delimit levels of the hierarchical structure. Avro
does not support the “-“ character in field names. Avro field names only support
upper case A to Z, lower case a to z, numbers, and the “_”
(underbar) character. Therefore the “-“
does not conflict with Avro
name. The Avro metadata provider returns the following fields for the above
schema.CUST_CODE
SUB-NAME
SUB-STATE
SUB-CITY
MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMERNEST COLMAP (CUST_CODE=CUST_CODE, SUB-NAME=NAME, SUB-CITY=CITY, SUB-STATE=STATE);
Avro has no restriction on how deep fields can be nested. It can be confusing for customers who are looking at an Avro schema to understand what the flattened structure looks like. To help, the flattened structure is logged to the <replicat name>.log file. Therefore, if you run the Replicat process, you can examine the log file to see the flattened fields.
INFO 2023-08-09 14:54:42.000242 [main] - Retrieved Avro metadata for table [QASOURCE.TCUSTMERNEST] Field name [CUST_CODE] Type [STRING] Nullable [true]. Field name [SUB-NAME] Type [STRING] Nullable [true]. Field name [SUB-STATE] Type [STRING] Nullable [true]. Field name [SUB-CITY] Type [STRING] Nullable [true]. End retrieved metadata.
Parent topic: Existing Avro Formatter
9.2.35.4.4.4 Support for Mapping to Nested Avro Fields
Column values for an operation from the source trail file can have one of three states:
- Column has a value.
- Column value is null.
- Column value is missing.
When the column value is null in the source trail file, the mapping behaves as follows:
- If the mapped Avro field is nullable, then null will be mapped to the target field.
- If the mapped Avro field is not nullable, then the default value for the field will be mapped as defined in the Avro schema.
- If the target field is not nullable and provides no default value, then this is an exception. The replicat will log an exception and abend.
When the column value is missing from the source trail file, the mapping behaves as follows:
- If a default value is defined in the Avro schema, then the default value will be mapped into the target field.
- If no default value is defined in the Avro schema, and the field is nullable, then null will be mapped into the target field.
- If the target field is not nullable and provides no default value, then this is an exception. The replicat will log an exception and abend.
Additionally, it is important to understand that the default value for a field may be null.
Examples of Avro field definitions:
{"name": "STRING_REQUIRED", "type": "string"}
{"name": "STRING_REQUIRED", "type": "string", "default": "mydefault"}
{"name": "STRING_NULLABLE", "type": ["null","string"]}
{"name": "STRING_NULLABLE", "type": ["null","string"], "default": null}
String field nullable with non-null default.
{"name": "STRING_NULLABLE", "type": ["string","null"], "default": "mydefault"}
Note that it is an Avro requirement that the default value apply to the first type in the type union. None of the following are legal. They will fail when the Avro library parses the schema, and the replicat will abend.
{"name": "STRING_NULLABLE", "type": ["string","null"], "default": null}
{"name": "STRING_NULLABLE", "type": ["null","string"], "default": "mydefault"}
{"name": "STRING_REQUIRED", "type": "string", "default": null}
Parent topic: Existing Avro Formatter
9.2.35.4.4.5 Configuration
Table 9-51 Configuration Properties
Configuration Properties | Optional/Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format | Required | existing_avro_schema | existing_avro_schema_ocf | existing_avro_schema_json | None | Selects the Existing Avro Schema Formatter for use. |
gg.handler.name.format.fixedPaddingByte | Optional | A byte in hex format. For example, an ASCII space would be represented as 20. | 00 (hex value 00, often referred to as the null byte) | Only applicable if source schemas contain fields defined as fixed. The Avro fixed type is a fixed-length binary record. If the data is not sufficient to fill the space, then the field must be padded with bytes. This property allows you to configure that padding byte. |
gg.handler.name.format.encoding | Optional | A legal Java encoding, such as UTF-8. | The system default encoding. | Only applicable if gg.handler.name.format=existing_avro_schema_json. Used to set the encoding of generated JSON data. |
gg.handler.name.format.localZoneId | Optional | A legal Java time zone. | The system default time zone. | Supports the Avro logical types local-timestamp-millis and local-timestamp-micros. The object from Oracle GoldenGate is a Java Instant, which is in UTC. It must be converted to a LocalDateTime object, and that conversion requires a time zone. |
gg.handler.name.format.jsonDelimiter | Optional | String | None | Only applicable if gg.handler.name.format=existing_avro_schema_json. Allows a delimiter to be inserted between generated JSON documents. The most common use case is to insert a line feed character between the JSON documents. For example: gg.handler.name.format.jsonDelimiter=CDATA[\n]. CDATA[] is supported to preserve whitespace in the configuration. |
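For reference, the following is a minimal sketch of a formatter configuration that selects the JSON output variant of this formatter. The handler name filewriter and the time zone value are illustrative assumptions, not part of a shipped sample:
gg.handler.filewriter.format=existing_avro_schema_json
gg.handler.filewriter.format.encoding=UTF-8
gg.handler.filewriter.format.jsonDelimiter=CDATA[\n]
gg.handler.filewriter.format.localZoneId=America/New_York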
Parent topic: Existing Avro Formatter
9.2.35.4.4.6 Unsupported Features
Maps and Arrays
Neither the Avro Metadata Provider nor the Existing Avro Schema Formatter supports Avro arrays or maps. Avro arrays and maps are containers that can hold 0 to N elements. The Oracle GoldenGate trail structure is a flat structure of key and value pairs, which in turn models the flat column name and column value structure of an RDBMS. Target metadata must be a flat structure, and the number of elements must be deterministic. Arrays and maps are problematic because they can contain 0 to N elements, and different messages can contain different numbers of elements in the collections.
Recursive Record Definitions
Avro supports recursive record definitions. A recursive record definition is where an element in record A contains a field which references record B. The definition of record B contains a field which references record A. The following is an example of a recursive record definition.
{ "type" : "record", "name" : "TCUSTMERINVALID", "namespace" : "QASOURCE", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null, "primary_key": true }, { "name" : "SUB1", "type" : [ "null", { "type" : "record", "name" : "subrecord", "fields" : [ { "name" : "NAME", "type" : [ "null", "string" ], "default" : null }, { "name" : "SUB2", "type" : { "type" : "record", "name" : "subrecord2", "fields" : [ { "name" : "RECUSIVEELEMENT", "type" : [ "null", "subrecord" ], "default" : null } ] } } ] } ], "default" : null }, { "name" : "STATE", "type" : [ "null", "string" ], "default" : null } ] }
Field SUB1 is a record of type subrecord. This record contains field SUB2, which is of type subrecord2. Type subrecord2 contains field RECURSIVEELEMENT, which is of type subrecord.
While this is a legal Avro schema, it is problematic for the Avro Metadata Provider to process. The structure is recursive and therefore the schema definition is, by default, infinitely deep. The Avro Metadata Provider must flatten the Avro schema into a deterministic flat structure of element names. The recursive nature of the schema definition prevents the Avro Metadata Provider from generating a deterministic flat structure of element names. Parsing an Avro schema with a recursive definition results in the following run time exception:
ERROR 2023-08-09 15:16:19.000560 [main] - The Avro schema for table [QASOURCE.TCUSTMERINVALID] encountered a recursive reference to a record schema. The recursively referenced record schema is namespace [QASOURCE] name [subrecord2]. Schemas with recursively referenced record schemas are not supported. ERROR 2023-08-09 15:16:19.000563 [main] - AVRO-00005 Unable to retrieve table matadata table : [QASOURCE.TCUSTMERINVALID]. oracle.goldengate.datasource.metadata.provider.ResolutionException: The Avro schema for table [QASOURCE.TCUSTMERINVALID] encountered a recursive reference to a record schema. The recursively referenced record schema is namespace [QASOURCE] name [subrecord2]. Schemas with recursively referenced record schemas are not supported.
Non-discrete Unions
The Avro MDP only supports Unions of null and another type. For example:
"type" : [ "null", "string" ],
The following is an Avro union type of double and string. This is legal in Avro; however, it cannot be supported by the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) metadata provider. The problem is that the field needs to be assigned to a discrete type, and this field type, by definition, is not discrete. It can be either a double or a string:
"type" : [ "double", "string" ],
Metacolumns Configuration
Metacolumn configuration is not supported with the existing Avro schema formatter.
The following configuration will cause the replicat process to abend:
gg.handler.name.format.metaColumnsTemplate=
The reason is that the input data from the Replicat is being mapped into an existing Avro schema. Configuring metacolumns adds additional fields to the output data, which is contrary to the goal of outputting data that conforms to the provided Avro schema. It is still possible to output operation metadata into the output records. To accomplish this, modify the Avro schema to add one or more fields to hold the desired operation metadata, and then use the Replicat mapping statement to map operation metadata into those fields, as shown in the example below.
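For example (a sketch, not part of the shipped sample), if you add a nullable string field named OP_TYPE to the target Avro schema, the Replicat mapping statement could populate it from the operation header:
MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMERNEST COLMAP (CUST_CODE=CUST_CODE, SUB-NAME=NAME, SUB-CITY=CITY, SUB-STATE=STATE, OP_TYPE=@GETENV('GGHEADER','OPTYPE'));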
Sample Configuration
The following is an example of an end-to-end configuration using the Existing Avro Schema Formatter functionality.
Replicat .prm file:
mdp.prm
REPLICAT mdp
-- Trail file for this example is located in "AdapterExamples/trail" directory
-- Command to add REPLICAT
-- add replicat mdp, exttrail AdapterExamples/trail/tr
TARGETDB LIBFILE libggjava.so SET property=dirprm/mdp.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
MAP QASOURCE.TCUSTMER, TARGET QASOURCE.TCUSTMERNEST COLMAP (SUB-NAME=NAME, SUB-CITY=CITY, SUB-STATE=STATE, CUST_CODE=CUST_CODE);
Replicat Properties File:
mdp.props
gg.handlerlist=filewriter
#The File Writer Handler
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.txt
gg.handler.filewriter.fileRollInterval=7m
gg.handler.filewriter.finalizeAction=none
gg.handler.filewriter.inactivityRollInterval=7m
gg.handler.filewriter.format=existing_avro_schema_ocf
gg.handler.filewriter.includetokens=true
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.rollOnShutdown=true
gg.format.timestamp=yyyy-MM-dd HH:mm:ss.SSS
#Avro Metadata provider must be enabled.
gg.mdp.type=avro
gg.mdp.schemaFilesPath=/scratch/tbcampbe/ggwork/gg23.1.0.0.0WORKING5/avromdp
"type" : "record", "name" : "TCUSTMERNEST", "namespace" : "QASOURCE", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null, "primary_key": true }, { "name" : "SUB", "type" : { "type" : "record", "name" : "subrecord", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "NAME", "type" : [ "null", "string" ], "default" : null }, { "name" : "STATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "CITY", "type" : [ "null", "string" ], "default" : null } ] } } ] }
Parent topic: Existing Avro Formatter
9.2.35.4.5 Using the Delimited Text Formatter
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The Delimited Text Formatter maps these column value states into the delimited text output as follows:
- Column has a value: The column value is output.
- Column value is null: The default output value is NULL. The output for the case of a null column value is configurable.
- Column value is missing: The default output value is an empty string (""). The output for the case of a missing column value is configurable.
- Using the Delimited Text Row Formatter
The Delimited Text Row Formatter is the Delimited Text Formatter that was included in releases prior to the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 19.1.0.0.0 release. It writes the after-change data for inserts and updates, and the before-change data for deletes. - Delimited Text Operation Formatter
The Delimited Text Operation Formatter outputs both before and after change data for insert, update, and delete operations.
Parent topic: Pluggable Formatters
9.2.35.4.5.1 Using the Delimited Text Row Formatter
The Delimited Text Row Formatter is the Delimited Text Formatter that was included in releases prior to the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 19.1.0.0.0 release. It writes the after-change data for inserts and updates, and the before-change data for deletes.
- Message Formatting Details
- Sample Formatted Messages
- Output Format Summary Log
- Configuration
- Metadata Change Events
- Additional Considerations
Parent topic: Using the Delimited Text Formatter
9.2.35.4.5.1.1 Message Formatting Details
The automated output of meta-column fields in generated delimited text messages has been removed as of Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 21.1. Meta-column fields can still be output; however, they need to be explicitly configured using the gg.handler.name.format.metaColumnsTemplate property. For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate in the Delimited Text Formatter Configuration Properties table.
To output the metacolumns as in previous versions, configure the following:
gg.handler.name.format.metaColumnsTemplate=${optype[op_type]},${objectname[table]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens, configure as follows:
gg.handler.name.format.metaColumnsTemplate=${optype[op_type]},${objectname[table]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
Formatting details:
- Operation Type: Indicates the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate. Output of this field is suppressible.
- Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name and the schema name. The format of the fully qualified table name is catalog_name.schema_name.table_name. The output of this field is suppressible.
- Operation Timestamp: The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
- Current Timestamp: The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
- Trail Position: The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.
- Tokens: The token key-value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the includeTokens configuration property on the corresponding handler is explicitly set to true. An example of this handler-level setting is shown after this list.
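For example, token output could be enabled on the corresponding handler as follows (a sketch; substitute your handler name for name):
gg.handler.name.includeTokens=true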
Parent topic: Using the Delimited Text Row Formatter
9.2.35.4.5.1.2 Sample Formatted Messages
The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character (|) to more clearly display the message.
Parent topic: Using the Delimited Text Row Formatter
9.2.35.4.5.1.2.1 Sample Insert Message
I|GG.TCUSTORD|2013-06-02 22:14:36.000000|2015-09-18T13:23:01.612001|00000000000000001444|R=AADPkvAAEAAEqL2A AA|WILL|1994-09-30:15:33:00|CAR|144|17520.00|3|100
Parent topic: Sample Formatted Messages
9.2.35.4.5.1.2.2 Sample Update Message
U|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:01.987000|00000000000000002891|R=AADPkvAAEAAEqLzA AA|BILL|1995-12-31:15:00:00|CAR|765|14000.00|3|100
Parent topic: Sample Formatted Messages
9.2.35.4.5.1.2.3 Sample Delete Message
D,GG.TCUSTORD,2013-06-02 22:14:41.000000,2015-09-18T13:23:02.000000,00000000000000004338,L=206080450,6=9.0. 80330,R=AADPkvAAEAAEqLzAAC,DAVE,1993-11-03:07:51:35,PLANE,600,,,
Parent topic: Sample Formatted Messages
9.2.35.4.5.1.2.4 Sample Truncate Message
T|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:02.001000|00000000000000004515|R=AADPkvAAEAAEqL2A AB|||||||
Parent topic: Sample Formatted Messages
9.2.35.4.5.1.3 Output Format Summary Log
If INFO level logging is enabled, the Java log4j logging logs a summary of the delimited text output format. A summary of the delimited fields is logged for each source table encountered and occurs when the first operation for that table is received by the Delimited Text Formatter. This detailed explanation of the fields of the delimited text output may be useful when you perform an initial setup. When a metadata change event occurs, the summary of the delimited fields is regenerated and logged again at the first subsequent operation for that table.
Parent topic: Using the Delimited Text Row Formatter
9.2.35.4.5.1.4 Configuration
9.2.35.4.5.1.4.1 Review a Sample Configuration
The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:
gg.handler.name.format.includeColumnNames=false
gg.handler.name.format.insertOpKey=I
gg.handler.name.format.updateOpKey=U
gg.handler.name.format.deleteOpKey=D
gg.handler.name.format.truncateOpKey=T
gg.handler.name.format.encoding=UTF-8
gg.handler.name.format.fieldDelimiter=CDATA[\u0001]
gg.handler.name.format.lineDelimiter=CDATA[\n]
gg.handler.name.format.keyValueDelimiter=CDATA[=]
gg.handler.name.format.keyValuePairDelimiter=CDATA[,]
gg.handler.name.format.pkUpdateHandling=abend
gg.handler.name.format.nullValueRepresentation=NULL
gg.handler.name.format.missingValueRepresentation=CDATA[]
gg.handler.name.format.includeGroupCols=false
gg.handler.name.format=delimitedtext
Parent topic: Configuration
9.2.35.4.5.1.5 Metadata Change Events
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change, and the Delimited Text Formatter continues running.
Note:
A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.
Parent topic: Using the Delimited Text Row Formatter
9.2.35.4.5.1.6 Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that will not occur in the content of the data.
The Java Adapter configuration trims leading and trailing characters from configuration values when they are determined to be whitespace. However, you may want to choose field delimiters, line delimiters, null value representations, and missing value representations that include or are fully considered to be whitespace. In these cases, you must employ specialized syntax in the Java Adapter configuration file to preserve the whitespace. When your configuration values contain leading or trailing characters that are considered whitespace, wrap the configuration value in a CDATA[] wrapper. For example, a configuration value of \n should be configured as CDATA[\n].
You can use regular expressions to search column values then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications store data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in GG for DAA applications, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system. This leads to some special scenarios as described in the following sections.
9.2.35.4.5.1.6.1 Primary Key Updates
In Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys for the given row from the source database. Because data is appended in GG for DAA applications, a primary key update operation looks more like an insert than an update without any special handling. You can configure how the Delimited Text formatter handles primary key updates. These are the configurable behaviors:
Table 9-52 Configurable Behavior
Value | Description |
---|---|
abend | By default, the delimited text formatter terminates in the case of a primary key update. |
update | The primary key update is treated like any other update operation. Use this configuration alternative only if you can guarantee that the primary key is not used as selection criteria to select row data from a GG for DAA system. |
delete-insert | The primary key update is treated as a special case of a delete, using the before-image data, and an insert using the after-image data. This configuration may more accurately model the effect of a primary key update in a GG for DAA application. However, if this configuration is selected, it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the GG for DAA application. |
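For example, to select the delete and insert behavior (assuming full supplemental logging is enabled at the source), the property could be set as follows:
gg.handler.name.format.pkUpdateHandling=delete-insert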
Parent topic: Additional Considerations
9.2.35.4.5.1.6.2 Data Consolidation
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications append data to the underlying storage. Analytic tools generally spawn MapReduce programs that traverse the data files and consolidate all the operations for a given row into a single output. Therefore, it is important to specify the order of operations. The Delimited Text formatter provides a number of metadata fields to do this. The operation timestamp may be sufficient to fulfill this requirement. Alternatively, the current timestamp may be the best indicator of the order of operations. In this situation, the trail position can provide a tie-breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of order of operations in GG for DAA.
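As a sketch, the ordering fields described above can be included through the metacolumns template; the field aliases shown are illustrative:
gg.handler.name.format.metaColumnsTemplate=${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}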
Parent topic: Additional Considerations
9.2.35.4.5.2 Delimited Text Operation Formatter
The Delimited Text Operation Formatter outputs both before and after change data for insert, update, and delete operations.
- Message Formatting Details
- Sample Formatted Messages
- Output Format Summary Log
- Delimited Text Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change, and the Delimited Text Formatter continues running.
- Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data.
Parent topic: Using the Delimited Text Formatter
9.2.35.4.5.2.1 Message Formatting Details
The automated output of meta-column fields in generated delimited text messages has been removed as of Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 21.1. Meta-column fields can still be output; however, they need to be explicitly configured using the gg.handler.name.format.metaColumnsTemplate property. For more information, see the configuration property gg.handler.name.format.metaColumnsTemplate in the Delimited Text Formatter Configuration Properties table.
To output the metacolumns as in previous versions, configure the following:
gg.handler.name.format.metaColumnsTemplate=${optype[op_type]},${objectname[table]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens, configure as follows:
gg.handler.name.format.metaColumnsTemplate=${optype[op_type]},${objectname[table]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
Formatting details:
- Operation Type: Indicates the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, and T for truncate. Output of this field is suppressible.
- Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name and the schema name. The format of the fully qualified table name is catalog_name.schema_name.table_name. The output of this field is suppressible.
- Operation Timestamp: The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
- Current Timestamp: The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
- Trail Position: The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.
- Tokens: The token key-value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the includeTokens configuration property on the corresponding handler is explicitly set to true.
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.2 Sample Formatted Messages
The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character (|) to more clearly display the message.
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.2.1 Sample Insert Message
I|GG.TCUSTMER|2015-11-05 18:45:36.000000|2019-04-17T04:49:00.156000|00000000000000001956|R=AAKifQAAKAAAFDHAAA,t=,L=7824137832,6=2.3.228025||WILL||BG SOFTWARE CO.||SEATTLE||WA
Parent topic: Sample Formatted Messages
9.2.35.4.5.2.2.2 Sample Update Message
U|QASOURCE.TCUSTMER|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.008002|00000000000000005100|R=AAKifQAAKAAAFDHAAE|ANN|ANN|ANN'S BOATS||SEATTLE|NEW YORK|WA|NY
Parent topic: Sample Formatted Messages
9.2.35.4.5.2.2.3 Sample Delete Message
D|QASOURCE.TCUSTORD|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.009000|00000000000000005272|L=7824137921,R=AAKifSAAKAAAMZHAAE,6=9.9.479055|DAVE||1993-11-03 07:51:35||PLANE||600||135000.00||2||200|
Parent topic: Sample Formatted Messages
9.2.35.4.5.2.2.4 Sample Truncate Message
T|QASOURCE.TCUSTMER|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.004002|00000000000000003600|R=AAKifQAAKAAAFDHAAE||||||||
Parent topic: Sample Formatted Messages
9.2.35.4.5.2.3 Output Format Summary Log
If INFO level logging is enabled, the Java log4j logging logs a summary of the delimited text output format. A summary of the delimited fields is logged for each source table encountered and occurs when the first operation for that table is received by the Delimited Text Formatter. This detailed explanation of the fields of the delimited text output may be useful when you perform an initial setup. When a metadata change event occurs, the summary of the delimited fields is regenerated and logged again at the first subsequent operation for that table.
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.4 Delimited Text Formatter Configuration Properties
Table 9-53 Delimited Text Formatter Configuration Properties
Properties | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format | Required | delimitedtext_op | None | Selects the Delimited Text Operation Formatter as the formatter. |
gg.handler.name.format.includeColumnNames | Optional | true | false | false | Controls the output of writing the column names as a delimited field preceding the column value. When true, the column name is written as a delimited field immediately before the column value. When false, column names are not included in the output. |
gg.handler.name.format.disableEscaping | Optional | true | false | false | Set to true to disable the escaping of characters that conflict with the configured delimiters. Ensure that it is set to true if gg.handler.name.format.fieldDelimiter is set to a value of multiple characters. |
gg.handler.name.format.insertOpKey | Optional | Any string | I | Indicator to be inserted into the output record to indicate an insert operation. |
gg.handler.name.format.updateOpKey | Optional | Any string | U | Indicator to be inserted into the output record to indicate an update operation. |
gg.handler.name.format.deleteOpKey | Optional | Any string | D | Indicator to be inserted into the output record to indicate a delete operation. |
gg.handler.name.format.truncateOpKey | Optional | Any string | T | Indicator to be inserted into the output record to indicate a truncate operation. |
gg.handler.name.format.encoding | Optional | Any encoding name or alias supported by Java. | The native system encoding of the machine hosting the Oracle GoldenGate process. | Determines the encoding of the output delimited text. |
gg.handler.name.format.fieldDelimiter | Optional | Any string | ASCII 001 (the default Hive delimiter) | The delimiter used between delimited fields. This value supports CDATA[] wrapping. |
gg.handler.name.format.lineDelimiter | Optional | Any string | Newline (the default Hive delimiter) | The delimiter used between records. This value supports CDATA[] wrapping. |
gg.handler.name.format.keyValueDelimiter | Optional | Any string | = | Specifies a delimiter between keys and values in a map. Key1=value1. Tokens are mapped values. Configuration value supports CDATA[] wrapping. |
gg.handler.name.format.keyValuePairDelimiter | Optional | Any string | , | Specifies a delimiter between key value pairs in a map. Configuration value supports CDATA[] wrapping. |
gg.handler.name.format.nullValueRepresentation | Optional | Any string | NULL | Specifies what is included in the delimited output in the case of a NULL value. Configuration value supports CDATA[] wrapping. |
gg.handler.name.format.missingValueRepresentation | Optional | Any string | An empty string ("") | Specifies what is included in the delimited text output in the case of a missing value. Configuration value supports CDATA[] wrapping. |
| Optional | | | Set to |
| Optional | | | Set to |
gg.handler.name.format.includeGroupCols | Optional | true | false | false | If set to true, the columns are grouped into sets of all names, all before values, and all after values. For example: U,QASOURCE.TCUSTMER,2015-11-05 18:45:39.000000,2019-04-17T05:19:30.556000,00000000000000005100,R=AAKifQAAKAAAFDHAAE,CUST_CODE,NAME,CITY,STATE,ANN,ANN'S BOATS,SEATTLE,WA,ANN,,NEW YORK,NY |
gg.handler.name.format.enableFieldDescriptorHeaders | Optional | true | false | false | Set to true to add a descriptive header to each data file for delimited text output. The header will be the individual field names separated by the field delimiter. |
gg.handler.name.format.metaColumnsTemplate | Optional | See Metacolumn Keywords. | None | The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp} |
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.5 Review a Sample Configuration
The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:
gg.handler.name.format.includeColumnNames=false
gg.handler.name.format.insertOpKey=I
gg.handler.name.format.updateOpKey=U
gg.handler.name.format.deleteOpKey=D
gg.handler.name.format.truncateOpKey=T
gg.handler.name.format.encoding=UTF-8
gg.handler.name.format.fieldDelimiter=CDATA[\u0001]
gg.handler.name.format.lineDelimiter=CDATA[\n]
gg.handler.name.format.keyValueDelimiter=CDATA[=]
gg.handler.name.format.keyValuePairDelimiter=CDATA[,]
gg.handler.name.format.nullValueRepresentation=NULL
gg.handler.name.format.missingValueRepresentation=CDATA[]
gg.handler.name.format.includeGroupCols=false
gg.handler.name.format=delimitedtext_op
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.6 Metadata Change Events
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change, and the Delimited Text Formatter continues running.
Note:
A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.
Parent topic: Delimited Text Operation Formatter
9.2.35.4.5.2.7 Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data.
The Java Adapter configuration trims leading and trailing characters from configuration values when they are determined to be whitespace. However, you may want to choose field delimiters, line delimiters, null value representations, and missing value representations that include or are fully considered to be whitespace. In these cases, you must employ specialized syntax in the Java Adapter configuration file to preserve the whitespace. When your configuration values contain leading or trailing characters that are considered whitespace, wrap the configuration value in a CDATA[] wrapper. For example, a configuration value of \n should be configured as CDATA[\n].
You can use regular expressions to search column values then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) applications store data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in GG for DAA, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system. This leads to some special scenarios as described in the following sections.
Parent topic: Delimited Text Operation Formatter
9.2.35.4.6 Using the JSON Formatter
The JavaScript Object Notation (JSON) formatter can output operations from the source trail file in either row-based format or operation-based format. It formats operation data from the source trail file into JSON objects. Each insert, update, delete, and truncate operation is formatted into an individual JSON message.
- Operation Metadata Formatting Details
- Operation Data Formatting Details
- Row Data Formatting Details
- Sample JSON Messages
- JSON Schemas
- JSON Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- JSON Primary Key Updates
- Integrating Oracle Stream Analytics
- Mongo Document Formatting Details
Parent topic: Pluggable Formatters
9.2.35.4.6.1 Operation Metadata Formatting Details
To output the metacolumns configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information see the configuration property:
gg.handler.name.format.metaColumnsTemplate
.
Parent topic: Using the JSON Formatter
9.2.35.4.6.2 Operation Data Formatting Details
JSON messages begin with the operation metadata fields, which are followed by the operation data fields. This data is represented by before and after members that are objects. These objects contain members whose keys are the column names and whose values are the column values.
Operation data is modeled as follows:
-
Inserts: Includes the after-image data.
-
Updates: Includes both the before-image and the after-image data.
-
Deletes: Includes the before-image data.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
-
The column has a value: The column value is output. In the following example, the member
STATE
has a value."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
-
The column value is null: The default output value is a JSON NULL. In the following example, the member
STATE
is null."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
-
The column value is missing: The JSON contains no element for a missing column value. In the following example, the member
STATE
is missing."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", }
The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured to treat all data as strings.
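For example, to treat all column values as JSON strings rather than typed values, the following property (also shown in the sample configuration later in this topic) could be set:
gg.handler.name.format.treatAllColumnsAsStrings=true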
Parent topic: Using the JSON Formatter
9.2.35.4.6.3 Row Data Formatting Details
JSON messages begin with the operation metadata fields, which are followed by the row data fields. For row data formatting, these are the source column names and source column values rendered as JSON key-value pairs at the top level of the JSON object.
Row data is modeled as follows:
-
Inserts: Includes the after-image data.
-
Updates: Includes the after-image data.
-
Deletes: Includes the before-image data.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
-
The column has a value: The column value is output. In the following example, the member
STATE
has a value."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
-
The column value is null :The default output value is a JSON NULL. In the following example, the member
STATE
is null."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
-
The column value is missing: The JSON contains no element for a missing column value. In the following example, the member
STATE
is missing."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", }
The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured to treat all data as strings.
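As a sketch, row-modeled output with string typing could be selected as follows (assuming json_row is used as the row-modeled value of the format property; see the JSON Formatter Configuration Properties table):
gg.handler.name.format=json_row
gg.handler.name.format.treatAllColumnsAsStrings=true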
Parent topic: Using the JSON Formatter
9.2.35.4.6.4 Sample JSON Messages
The following topics are sample JSON messages created by the JSON Formatter for insert, update, delete, and truncate operations.
- Sample Operation Modeled JSON Messages
- Sample Flattened Operation Modeled JSON Messages
- Sample Row Modeled JSON Messages
- Sample Primary Key Output JSON Message
Parent topic: Using the JSON Formatter
9.2.35.4.6.4.1 Sample Operation Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T10:15:51.267000",
"pos":"00000000000000002928",
"after":{
"CUST_CODE":"WILL",
"ORDER_DATE":"1994-09-30:15:33:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":144,
"PRODUCT_PRICE":17520.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
}
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.310002",
"pos":"00000000000000004300",
"before":{
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":15000.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
},
"after":{
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":14000.00
}
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.312000",
"pos":"00000000000000005272",
"before":{
"CUST_CODE":"DAVE",
"ORDER_DATE":"1993-11-03:07:51:35",
"PRODUCT_CODE":"PLANE",
"ORDER_ID":600,
"PRODUCT_PRICE":135000.00,
"PRODUCT_AMOUNT":2,
"TRANSACTION_ID":200
}
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"T",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.312001",
"pos":"00000000000000005480",
}
Parent topic: Sample JSON Messages
9.2.35.4.6.4.2 Sample Flattened Operation Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T10:34:47.956000",
"pos":"00000000000000002928",
"after.CUST_CODE":"WILL",
"after.ORDER_DATE":"1994-09-30:15:33:00",
"after.PRODUCT_CODE":"CAR",
"after.ORDER_ID":144,
"after.PRODUCT_PRICE":17520.00,
"after.PRODUCT_AMOUNT":3,
"after.TRANSACTION_ID":100
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.192000",
"pos":"00000000000000004300",
"before.CUST_CODE":"BILL",
"before.ORDER_DATE":"1995-12-31:15:00:00",
"before.PRODUCT_CODE":"CAR",
"before.ORDER_ID":765,
"before.PRODUCT_PRICE":15000.00,
"before.PRODUCT_AMOUNT":3,
"before.TRANSACTION_ID":100,
"after.CUST_CODE":"BILL",
"after.ORDER_DATE":"1995-12-31:15:00:00",
"after.PRODUCT_CODE":"CAR",
"after.ORDER_ID":765,
"after.PRODUCT_PRICE":14000.00
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.193000",
"pos":"00000000000000005272",
"before.CUST_CODE":"DAVE",
"before.ORDER_DATE":"1993-11-03:07:51:35",
"before.PRODUCT_CODE":"PLANE",
"before.ORDER_ID":600,
"before.PRODUCT_PRICE":135000.00,
"before.PRODUCT_AMOUNT":2,
"before.TRANSACTION_ID":200
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.193001",
"pos":"00000000000000005480",
"before.CUST_CODE":"JANE",
"before.ORDER_DATE":"1995-11-11:13:52:00",
"before.PRODUCT_CODE":"PLANE",
"before.ORDER_ID":256,
"before.PRODUCT_PRICE":133300.00,
"before.PRODUCT_AMOUNT":1,
"before.TRANSACTION_ID":100
}
Parent topic: Sample JSON Messages
9.2.35.4.6.4.3 Sample Row Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T11:10:42.294000",
"pos":"00000000000000002928",
"CUST_CODE":"WILL",
"ORDER_DATE":"1994-09-30:15:33:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":144,
"PRODUCT_PRICE":17520.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.350005",
"pos":"00000000000000004300",
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":14000.00
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.351002",
"pos":"00000000000000005272",
"CUST_CODE":"DAVE",
"ORDER_DATE":"1993-11-03:07:51:35",
"PRODUCT_CODE":"PLANE",
"ORDER_ID":600,
"PRODUCT_PRICE":135000.00,
"PRODUCT_AMOUNT":2,
"TRANSACTION_ID":200
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"T",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.351003",
"pos":"00000000000000005480",
}
Parent topic: Sample JSON Messages
9.2.35.4.6.4.4 Sample Primary Key Output JSON Message
{ "table":"DDL_OGGSRC.TCUSTMER", "op_type":"I", "op_ts":"2015-10-26 03:00:06.000000", "current_ts":"2016-04-05T08:59:23.001000", "pos":"00000000000000006605", "primary_keys":[ "CUST_CODE" ], "after":{ "CUST_CODE":"WILL", "NAME":"BG SOFTWARE CO.", "CITY":"SEATTLE", "STATE":"WA" } }
Parent topic: Sample JSON Messages
9.2.35.4.6.5 JSON Schemas
By default, JSON schemas are generated for each source table encountered. JSON schemas are generated on a just-in-time basis when an operation for that table is first encountered. Newer schemas are generated when there is a change in the metadata. A JSON schema is not required to parse a JSON object. However, many JSON parsers can use a JSON schema to perform a validating parse of a JSON object. Alternatively, you can review the JSON schemas to understand the layout of output JSON objects. By default, the JSON schemas are created in the GoldenGate_Home/dirdef directory and are named by the following convention: FULLY_QUALIFIED_TABLE_NAME.schema.json. The generation of the JSON schemas is suppressible.
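For example, schema generation can be suppressed, or the output directory changed, with the following properties (a sketch using the property names from the JSON Formatter Configuration Properties table):
gg.handler.name.format.generateSchema=false
gg.handler.name.format.schemaDirectory=./dirdef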
- The following JSON schema example is for the JSON object listed in Sample Operation Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "row":{ "type":"object", "properties":{ "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null" ] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "additionalProperties":false }, "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "before":{ "$ref":"#/definitions/row" }, "after":{ "$ref":"#/definitions/row" } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
- The following JSON schema example is for the JSON object listed in Sample Flattened Operation Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "before.CUST_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_DATE":{ "type":[ "string", "null" ] }, "before.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_ID":{ "type":[ "number", "null" ] }, "before.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "before.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "before.TRANSACTION_ID":{ "type":[ "number", "null" ] }, "after.CUST_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_DATE":{ "type":[ "string", "null" ] }, "after.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_ID":{ "type":[ "number", "null" ] }, "after.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "after.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "after.TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
- The following JSON schema example is for the JSON object listed in Sample Row Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null" ] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
Parent topic: Using the JSON Formatter
9.2.35.4.6.6 JSON Formatter Configuration Properties
Table 9-54 JSON Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format | Optional | json | json_row | None | Controls whether the generated JSON output messages are operation modeled or row modeled. Set to json for operation-modeled output and to json_row for row-modeled output. |
gg.handler.name.format.insertOpKey | Optional | Any string | I | Indicator to be inserted into the output record to indicate an insert operation. |
gg.handler.name.format.updateOpKey | Optional | Any string | U | Indicator to be inserted into the output record to indicate an update operation. |
gg.handler.name.format.deleteOpKey | Optional | Any string | D | Indicator to be inserted into the output record to indicate a delete operation. |
gg.handler.name.format.truncateOpKey | Optional | Any string | T | Indicator to be inserted into the output record to indicate a truncate operation. |
gg.handler.name.format.prettyPrint | Optional | true | false | false | Controls the output format of the JSON data. True formats the data with white space for easy reading. False generates more compact output that is more difficult to read. |
gg.handler.name.format.jsonDelimiter | Optional | Any string | "" (no delimiter) | Inserts a delimiter between generated JSONs so that they can be more easily parsed in a continuous stream of data. Configuration value supports CDATA[] wrapping. |
gg.handler.name.format.generateSchema | Optional | true | false | true | Controls the generation of JSON schemas for the generated JSON documents. JSON schemas are generated on a table-by-table basis. A JSON schema is not required to parse a JSON document. However, a JSON schema can help indicate what the JSON documents look like and can be used for a validating JSON parse. |
gg.handler.name.format.schemaDirectory | Optional | Any legal, existing file system path | ./dirdef | Controls the output location of generated JSON schemas. |
gg.handler.name.format.treatAllColumnsAsStrings | Optional | true | false | false | Controls the output typing of generated JSON documents. When false, the formatter maps source fields to the corresponding JSON types. When true, all data is treated as strings. |
gg.handler.name.format.encoding | Optional | Any legal encoding name or alias supported by Java. | UTF-8 | Controls the output encoding of generated JSON schemas and documents. |
gg.handler.name.format.versionSchemas | Optional | true | false | false | Controls the version of created schemas. Schema versioning creates a schema with a timestamp in the schema directory on the local file system every time a new schema is created. |
gg.handler.name.format.iso8601Format | Optional | true | false | true | Controls the format of the current timestamp. The default is the ISO 8601 format. A setting of false removes the "T" between the date and time in the current timestamp and outputs a space instead. |
gg.handler.name.format.flatten | Optional | true | false | false | Controls sending flattened JSON formatted data to the target entity. Must be set to true to generate flattened JSON output. This property is applicable only to Operation Formatted JSON (gg.handler.name.format=json). |
gg.handler.name.format.flattenDelimiter | Optional | Any legal character or character string for a JSON field name. | . | Controls the delimiter for concatenated JSON element names. This property supports CDATA[] wrapping to preserve whitespace. |
gg.handler.name.format.beforeObjectName | Optional | Any legal character or character string for a JSON field name. | before | Allows you to rename the JSON element that contains the before-change column values. This property is only applicable to Operation Formatted JSON (gg.handler.name.format=json). |
gg.handler.name.format.afterObjectName | Optional | Any legal character or character string for a JSON field name. | after | Allows you to rename the JSON element that contains the after-change column values. This property is only applicable to Operation Formatted JSON (gg.handler.name.format=json). |
gg.handler.name.format.pkUpdateHandling | Optional | abend | update | delete-insert | abend | Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the JSON formatter and require special consideration. You can only use this property in conjunction with the row-modeled JSON output messages. This property is only applicable to Row Formatted JSON (gg.handler.name.format=json_row). |
gg.handler.name.format.omitNullValues | Optional | true | false | false | Set to true to omit fields that have null values from the generated JSON output. |
gg.handler.name.format.omitNullValuesSpecialUpdateHandling | Optional | true | false | false | Only applicable if gg.handler.name.format.omitNullValues=true. When set to true, it provides special handling to propagate the null value on the update after image if the before image data is missing or has a value. |
gg.handler.name.format.enableJsonArrayOutput | Optional | true | false | false | Set to true to nest JSON documents representing the operation data into a JSON array. This works for file output and Kafka messages in transaction mode. |
gg.handler.name.format.metaColumnsTemplate | Optional | See Metacolumn Keywords | None | The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp} |
Parent topic: Using the JSON Formatter
9.2.35.4.6.7 Review a Sample Configuration
The following is a sample configuration for the JSON Formatter in the Java Adapter configuration file:
gg.handler.hdfs.format=json
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.prettyPrint=false
gg.handler.hdfs.format.jsonDelimiter=CDATA[]
gg.handler.hdfs.format.generateSchema=true
gg.handler.hdfs.format.schemaDirectory=dirdef
gg.handler.hdfs.format.treatAllColumnsAsStrings=false
Parent topic: Using the JSON Formatter
9.2.35.4.6.8 Metadata Change Events
Metadata change events are handled at runtime. When metadata is changed in a table, the JSON schema is regenerated the next time an operation for the table is encountered. The content of created JSON messages changes to reflect the metadata change. For example, if an additional column is added, the new column is included in created JSON messages after the metadata change event.
Parent topic: Using the JSON Formatter
9.2.35.4.6.9 JSON Primary Key Updates
When the JSON formatter is configured to model operation data, primary key updates require no special treatment and are treated like any other update. The before and after values reflect the change in the primary key.
When the JSON formatter is configured to model row data, primary key updates must be specially handled. The default behavior is to abend. However, by using the gg.handler.name.format.pkUpdateHandling configuration property, you can configure the JSON formatter to model row data to treat primary key updates as either a regular update or as delete and then insert operations. When you configure the formatter to handle primary key updates as delete and insert operations, Oracle recommends that you configure your replication stream to contain the complete before-image and after-image data for updates. Otherwise, the generated insert operation for a primary key update will be missing data for fields that did not change.
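For example, for row-modeled output, primary key updates could be converted into a delete followed by an insert as follows:
gg.handler.name.format.pkUpdateHandling=delete-insert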
Parent topic: Using the JSON Formatter
9.2.35.4.6.10 Integrating Oracle Stream Analytics
You can integrate Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) with Oracle Stream Analytics (OSA) by sending operation-modeled JSON messages to the Kafka Handler. This works only when the JSON formatter is configured to output operation-modeled JSON messages.
Because OSA requires flattened JSON objects, a feature in the JSON formatter generates flattened JSON. To use this feature, set the gg.handler.name.format.flatten property to true (the default is false). The following is an example of flattened JSON:
{
"table":"QASOURCE.TCUSTMER",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-06-22T13:38:45.335001",
"pos":"00000000000000005100",
"before.CUST_CODE":"ANN",
"before.NAME":"ANN'S BOATS",
"before.CITY":"SEATTLE",
"before.STATE":"WA",
"after.CUST_CODE":"ANN",
"after.CITY":"NEW YORK",
"after.STATE":"NY"
}
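A minimal configuration sketch for this integration, assuming a Kafka Handler named kafkahandler and operation-modeled JSON output (the handler name is illustrative, not prescriptive):
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.format=json
# Flatten the nested before/after structures into dot-separated field names for OSA
gg.handler.kafkahandler.format.flatten=true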
Parent topic: Using the JSON Formatter
9.2.35.4.6.11 Mongo Document Formatting Details
The JSON Mongo Document Formatter formats the MongoDB Capture processed documents into a JSON format with only payload information. The trail record produced by MongoDB Capture contains:
- Column 0 as "_id", which identifies a document in a collection.
- Column 1 as "payload", which holds all the columns (fields of a collection).
Example
The document received from the trail is:
{"after":{"id":"{ \"_id\" : { \"$oid\" : \"65b9f02b80f1c27eb4b498e1\" } }", "payload":"{\"_id\": {\"$oid\": \"65b9f02b80f1c27eb4b498e1\"}, \"CUST_CODE\": \"test2\", \"name\": \"hello world\", \"cost\": {\"$numberDouble\": \"3000.0\"}}"}}
It is written as:
{"data":"{\"_id\": {\"$oid\": \"65b9f02b80f1c27eb4b498e1\"}, \"CUST_CODE\": \"test2\", \"name\": \"hello world\", \"cost\": {\"$numberDouble\": \"3000.0\"}}"}
where the id field is removed and the column name payload is removed.
The JSON MongoDocument Formatter can be configured to write the payload value in either JSON EXTENDED format or JSON RELAXED format.
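For example, to write the payload in JSON EXTENDED mode rather than the default RELAXED mode, the jsonMode property from the configuration table below can be set (the handler name kafka is illustrative):
gg.handler.kafka.format=mongodocument
gg.handler.kafka.format.jsonMode=EXTENDED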
Required Dependencies
Oracle GoldenGate requires that you use the 4.11.1 bson library with JSON Mongo Document Formatter. You can download this driver from: https://mvnrepository.com/artifact/org.mongodb/bson/4.11.1
The Maven coordinates for bson 4.11.1 are as follows:
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>bson</artifactId>
<version>4.11.1</version>
</dependency>
You must include the path of the bson library in the gg.classpath property.
Example:
gg.classpath=./bson-4.11.1.jar
JSON MongoDocument Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format | Optional | mongodocument | None | Formats the MongoDB Capture processed documents into a JSON format with only payload information |
gg.handler.name.format.jsonMode | Optional | RELAXED/EXTENDED | RELAXED | MongoDB Document will be represented either in Extended or Relaxed format. |
gg.handler.name.format.insertOpKey | Optional | Any string | I | Indicator to be inserted into the output record to indicate an insert operation. |
gg.handler.name.format.updateOpKey | Optional | Any string | U | Indicator to be inserted into the output record to indicate an update operation. |
gg.handler.name.format.deleteOpKey | Optional | Any string | D | Indicator to be inserted into the output record to indicate a delete operation. |
gg.handler.name.format.truncateOpKey | Optional | Any string | T | Indicator to be inserted into the output record to indicate a truncate operation. |
gg.handler.name.format.metaColumnsTemplate | Optional | See Metacolumn Keywords | None | The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. This is an example that would produce a list of metacolumns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp} |
gg.handler.name.format.encoding | Optional | Any legal encoding name or alias supported by Java. | UTF-8 (the JSON default) | Controls the output encoding of generated JSON schemas and documents. |
The following is a sample configuration for the JSON Mongo Document Formatter in the Java Adapter configuration file:
gg.handler.kafka.format=mongodocument
gg.handler.kafka.format.insertOpKey=I
gg.handler.kafka.format.updateOpKey=U
gg.handler.kafka.format.deleteOpKey=D
gg.handler.kafka.format.truncateOpKey=T
gg.handler.kafka.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro},${timestamp}
Parent topic: Using the JSON Formatter
9.2.35.4.7 Using the Length Delimited Value Formatter
The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message.
With length delimited values, there are no field delimiters. The fields are variable in size based on the data.
Column values for an operation from the source trail file can have one of three states. By default, the LDV Formatter maps these column value states into the length delimited value output as follows:
- Column has a value — The column value is output with the prefix indicator P.
- Column value is NULL — The default output value is N. The output for the case of a NULL column value is configurable.
- Column value is missing — The default output value is M. The output for the case of a missing column value is configurable.
- Formatting Message Details
- Sample Formatted Messages
- LDV Formatter Configuration Properties
- Additional Considerations
Parent topic: Pluggable Formatters
9.2.35.4.7.1 Formatting Message Details
The default format for output of data is the following:
- First is the row length followed by the metadata:
<ROW LENGTH><PRESENT INDICATOR><FIELD LENGTH><OPERATION TYPE><PRESENT INDICATOR><FIELD LENGTH><FULLY QUALIFIED TABLE NAME><PRESENT INDICATOR><FIELD LENGTH><OPERATION TIMESTAMP><PRESENT INDICATOR><FIELD LENGTH><CURRENT TIMESTAMP><PRESENT INDICATOR><FIELD LENGTH><TRAIL POSITION><PRESENT INDICATOR><FIELD LENGTH><TOKENS>
Or
<ROW LENGTH><FIELD LENGTH><FULLY QUALIFIED TABLE NAME><FIELD LENGTH><OPERATION TIMESTAMP><FIELD LENGTH><CURRENT TIMESTAMP><FIELD LENGTH><TRAIL POSITION><FIELD LENGTH><TOKENS>
- Next is the row data:
<PRESENT INDICATOR><FIELD LENGTH><COLUMN 1 VALUE><PRESENT INDICATOR><FIELD LENGTH><COLUMN N VALUE>
Parent topic: Using the Length Delimited Value Formatter
9.2.35.4.7.2 Sample Formatted Messages
- Insert Message:
0133P01IP161446749136000000P161529311765024000P262015-11-05 18:45:36.000000P04WILLP191994-09-30 15:33:00P03CARP03144P0817520.00P013P03100
- Update Message
0133P01UP161446749139000000P161529311765035000P262015-11-05 18:45:39.000000P04BILLP191995-12-31 15:00:00P03CARP03765P0814000.00P013P03100
- Delete Message
0136P01DP161446749139000000P161529311765038000P262015-11-05 18:45:39.000000P04DAVEP191993-11-03 07:51:35P05PLANEP03600P09135000.00P012P03200
Parent topic: Using the Length Delimited Value Formatter
9.2.35.4.7.3 LDV Formatter Configuration Properties
Table 9-55 LDV Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.binaryLengthMode |
Optional |
|
|
The output can be controlled to display the field or
record length in either binary or ASCII format. If set to
|
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Use to configure the |
|
Optional |
Any string |
|
Use to configure what is included in the output when a column value is present. This value supports CDATA[] wrapping. |
|
Optional |
Any string |
|
Use to configure what is included in the output when a missing value is present. This value supports CDATA[] wrapping. |
|
Optional |
Any string |
|
Use to configure what is included in the output when a NULL value is present. This value supports CDATA[] wrapping. |
|
Optional |
See Metacolumn Keywords. |
None |
Use to configure the current meta column information in a simple manner; it removes the explicit need to configure the individual metadata properties. A comma-delimited string consisting of one or more templated values represents the template. This example produces a list of meta columns: ${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}. See Metacolumn Keywords. |
|
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the LDV Formatter and require special consideration from you.
|
|
Optional |
Any encoding name or alias supported by Java. |
The native system encoding of the machine hosting the Oracle GoldenGate process. |
Use to set the output encoding for character data and columns. |
Review a Sample Configuration
#The LDV Handler
gg.handler.filewriter.format=binary
gg.handler.filewriter.format.binaryLengthMode=false
gg.handler.filewriter.format.recordLength=4
gg.handler.filewriter.format.fieldLength=2
gg.handler.filewriter.format.legacyFormat=false
gg.handler.filewriter.format.presentValue=CDATA[P]
gg.handler.filewriter.format.missingValue=CDATA[M]
gg.handler.filewriter.format.nullValue=CDATA[N]
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro},${timestamp}
gg.handler.filewriter.format.pkUpdateHandling=abend
Parent topic: Using the Length Delimited Value Formatter
9.2.35.4.7.4 Additional Considerations
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) differs from RDBMSs in how data is stored. Update and delete operations in an RDBMS result in a change to the existing data. Data is not changed in GG for DAA; it is simply appended to existing data. The current state of a given row becomes a consolidation of all of the existing operations for that row in the HDFS system.
Primary Key Updates
Primary key update operations require special consideration and planning for GG for DAA integrations. Primary key updates are update operations that modify one or more of the primary keys for the given row from the source database. Since data is simply appended in GG for DAA, a primary key update operation looks more like a new insert than an update without any special handling. The Length Delimited Value Formatter provides specialized handling for primary keys that you can configure. These are the configurable behaviors:
Table 9-56 Primary Key Update Behaviors
Value | Description |
---|---|
Abend |
The default behavior is that the length delimited value formatter will abend in the case of a primary key update. |
Update |
With this configuration the primary key update will be treated just like any other update operation. This configuration alternative should only be selected if you can guarantee that the primary key that is being changed is not being used as the selection criteria when selecting row data from a GG for DAA system. |
Delete-Insert |
Using this configuration, the primary key update is treated as a special case of a delete using the before image data and an insert using the after image data. This configuration may more accurately model the effect of a primary key update in GG for DAA. However, if this configuration is selected, it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in GG for DAA. |
Consolidating Data
GG for DAA targets simply append data to the underlying storage. Typically, analytic tools spawn MapReduce programs that traverse the data files and consolidate all of the operations for a given row into a single output. It is important to have an indicator of the order of operations. The Length Delimited Value Formatter provides a number of metadata fields to fulfill this need. The operation timestamp may be sufficient to fulfill this requirement. However, two update operations may have the same operation timestamp, especially if they share a common transaction. The trail position can provide a tie-breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of order of operations in GG for DAA.
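One way to surface these ordering fields is through the metacolumns template. The following is a hedged sketch that adds the operation type, trail position, operation timestamp, and current timestamp to each record; it assumes ${position} is available as a metacolumn keyword, and the handler name filewriter is only an example:
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${position},${timestamp},${currenttimestamp}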
Parent topic: Using the Length Delimited Value Formatter
9.2.35.4.8 Using the XML Formatter
The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.
- Message Formatting Details
- Sample XML Messages
- XML Schema
- XML Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Primary Key Updates
Parent topic: Pluggable Formatters
9.2.35.4.8.1 Message Formatting Details
The XML formatted messages contain the following information:
Table 9-57 XML formatting details
Value | Description |
---|---|
|
The fully qualified table name. |
|
The operation type. |
|
The current timestamp is the time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes micro second precision. Replaying the trail file does not result in the same timestamp for the same operation. |
|
The position from the source trail file. |
|
The total number of columns in the source table. |
|
The |
|
The |
Parent topic: Using the XML Formatter
9.2.35.4.8.2 Sample XML Messages
The following sections provide sample XML messages.
Parent topic: Using the XML Formatter
9.2.35.4.8.2.1 Sample Insert Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='I' ts='2013-06-02 22:14:36.000000' current_ts='2015-10-06T12:21:50.100001' pos='00000000000000001444' numCols='7'> <col name='CUST_CODE' index='0'> <before missing='true'/> <after><![CDATA[WILL]]></after> </col> <col name='ORDER_DATE' index='1'> <before missing='true'/> <after><![CDATA[1994-09-30:15:33:00]]></after> </col> <col name='PRODUCT_CODE' index='2'> <before missing='true'/> <after><![CDATA[CAR]]></after> </col> <col name='ORDER_ID' index='3'> <before missing='true'/> <after><![CDATA[144]]></after> </col> <col name='PRODUCT_PRICE' index='4'> <before missing='true'/> <after><![CDATA[17520.00]]></after> </col> <col name='PRODUCT_AMOUNT' index='5'> <before missing='true'/> <after><![CDATA[3]]></after> </col> <col name='TRANSACTION_ID' index='6'> <before missing='true'/> <after><![CDATA[100]]></after> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqL2AAA]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
9.2.35.4.8.2.2 Sample Update Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='U' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.413000' pos='00000000000000002891' numCols='7'> <col name='CUST_CODE' index='0'> <before><![CDATA[BILL]]></before> <after><![CDATA[BILL]]></after> </col> <col name='ORDER_DATE' index='1'> <before><![CDATA[1995-12-31:15:00:00]]></before> <after><![CDATA[1995-12-31:15:00:00]]></after> </col> <col name='PRODUCT_CODE' index='2'> <before><![CDATA[CAR]]></before> <after><![CDATA[CAR]]></after> </col> <col name='ORDER_ID' index='3'> <before><![CDATA[765]]></before> <after><![CDATA[765]]></after> </col> <col name='PRODUCT_PRICE' index='4'> <before><![CDATA[15000.00]]></before> <after><![CDATA[14000.00]]></after> </col> <col name='PRODUCT_AMOUNT' index='5'> <before><![CDATA[3]]></before> <after><![CDATA[3]]></after> </col> <col name='TRANSACTION_ID' index='6'> <before><![CDATA[100]]></before> <after><![CDATA[100]]></after> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqLzAAA]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
9.2.35.4.8.2.3 Sample Delete Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='D' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415000' pos='00000000000000004338' numCols='7'> <col name='CUST_CODE' index='0'> <before><![CDATA[DAVE]]></before> <after missing='true'/> </col> <col name='ORDER_DATE' index='1'> <before><![CDATA[1993-11-03:07:51:35]]></before> <after missing='true'/> </col> <col name='PRODUCT_CODE' index='2'> <before><![CDATA[PLANE]]></before> <after missing='true'/> </col> <col name='ORDER_ID' index='3'> <before><![CDATA[600]]></before> <after missing='true'/> </col> <col name='PRODUCT_PRICE' index='4'> <missing/> </col> <col name='PRODUCT_AMOUNT' index='5'> <missing/> </col> <col name='TRANSACTION_ID' index='6'> <missing/> </col> <tokens> <token> <Name><![CDATA[L]]></Name> <Value><![CDATA[206080450]]></Value> </token> <token> <Name><![CDATA[6]]></Name> <Value><![CDATA[9.0.80330]]></Value> </token> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqLzAAC]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
9.2.35.4.8.2.4 Sample Truncate Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='T' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415001' pos='00000000000000004515' numCols='7'> <col name='CUST_CODE' index='0'> <missing/> </col> <col name='ORDER_DATE' index='1'> <missing/> </col> <col name='PRODUCT_CODE' index='2'> <missing/> </col> <col name='ORDER_ID' index='3'> <missing/> </col> <col name='PRODUCT_PRICE' index='4'> <missing/> </col> <col name='PRODUCT_AMOUNT' index='5'> <missing/> </col> <col name='TRANSACTION_ID' index='6'> <missing/> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqL2AAB]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
9.2.35.4.8.3 XML Schema
The XML Formatter does not generate an XML schema (XSD). However, a single XSD applies to all messages generated by the XML Formatter. The following XSD defines the structure of the XML documents that are generated by the XML Formatter.
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="operation"> <xs:complexType> <xs:sequence> <xs:element name="col" maxOccurs="unbounded" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element name="before" minOccurs="0"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:string" name="missing" use="optional"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name="after" minOccurs="0"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:string" name="missing" use="optional"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element type="xs:string" name="missing" minOccurs="0"/> </xs:sequence> <xs:attribute type="xs:string" name="name"/> <xs:attribute type="xs:short" name="index"/> </xs:complexType> </xs:element> <xs:element name="tokens" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element name="token" maxOccurs="unbounded" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element type="xs:string" name="Name"/> <xs:element type="xs:string" name="Value"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute type="xs:string" name="table"/> <xs:attribute type="xs:string" name="type"/> <xs:attribute type="xs:string" name="ts"/> <xs:attribute type="xs:dateTime" name="current_ts"/> <xs:attribute type="xs:long" name="pos"/> <xs:attribute type="xs:short" name="numCols"/> </xs:complexType> </xs:element> </xs:schema>
Parent topic: Using the XML Formatter
9.2.35.4.8.4 XML Formatter Configuration Properties
Table 9-58 XML Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
UTF-8 (the XML default) |
The output encoding of generated XML documents. |
|
Optional |
|
|
Determines whether an XML prolog is included in generated XML documents. An XML prolog is optional for well-formed XML. An XML prolog resembles the following: |
|
Optional |
|
|
Controls the format of the current timestamp in the XML message. The default adds a |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
gg.handler.name.format.metaColumnsTemplate | Optional | See Metacolumn Keywords. | None | The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey, updateOpKey, deleteOpKey, truncateOpKey, includeTableName, includeOpTimestamp, includeOpType, includePosition, includeCurrentTimestamp, useIso8601Format. It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see Metacolumn Keywords. |
Parent topic: Using the XML Formatter
9.2.35.4.8.5 Review a Sample Configuration
The following is a sample configuration for the XML Formatter in the Java Adapter properties file:
gg.handler.hdfs.format=xml
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=ISO-8859-1
gg.handler.hdfs.format.includeProlog=false
Parent topic: Using the XML Formatter
9.2.35.4.8.6 Metadata Change Events
The XML Formatter seamlessly handles metadata change events. A metadata change event does not result in a change to the XML schema. The XML schema is designed to be generic so that the same schema represents the data of any operation from any table.
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the XML Formatter can take action when metadata changes. Changes in the metadata are reflected in messages after the change. For example, when a column is added, the new column data appears in XML messages for the table.
Parent topic: Using the XML Formatter
9.2.35.4.8.7 Primary Key Updates
Updates to a primary key require no special handling by the XML formatter. The XML formatter creates messages that model database operations. For update operations, this includes before and after images of column values. Primary key changes are represented in this format as a change to a column value just like a change to any other column value.
Parent topic: Using the XML Formatter
9.2.35.5 Stage and Merge Data Warehouse Replication
Data warehouse targets typically support Massively Parallel Processing (MPP). The cost of a single Data Manipulation Language (DML) operation is comparable to the cost of execution of batch DMLs.
Therefore, for better throughput the change data from the Oracle GoldenGate trails can be staged in micro batches at a temporary staging location, and the staged data records are merged into the data warehouse target table using the respective data warehouse’s merge SQL statement. This section outlines an approach to replicate change data records from source databases to target data warehouses using stage and merge.
This section contains:
- Steps for Stage and Merge
- Hive Stage and Merge
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability for querying and analysis of large data sets stored in Hadoop files.
Parent topic: Additional Details
9.2.35.5.1 Steps for Stage and Merge
- Stage
In this step the change data records in the Oracle GoldenGate trail files are pushed into a staging location. The staging location is typically a cloud object store such as OCI, AWS S3, Azure Data Lake, or Google Cloud Storage. - Merge
In this step the change data files in the object store are viewed as an external table defined in the data warehouse. The data in the external staging table is merged onto the target table. - Configuration of Handlers
File Writer (FW) handler needs to be configured to generate local staging files that contain change data from the Oracle GoldenGate trail files. - File Writer Handler
File Writer (FW) handler is typically configured to generate files partitioned by table using the configuration gg.handler.{name}.partitionByTable=true. - Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold. - Object Store Event handler
The File Writer handler needs to be chained with an object store Event handler. Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) supports uploading files to most cloud object stores such as OCI, AWS S3, and Azure Data Lake. - JDBC Metadata Provider
If the data warehouse supports a JDBC connection, then the JDBC metadata provider needs to be enabled. - Stage and Merge Sample Configuration
A working configuration for the respective data warehouse is available under the directory AdapterExamples/big-data/data-warehouse-utils/<target>/. - Variables in the Merge Script
Typically, variables appear at the beginning of the Oracle provided script. There are lines starting with #TODO: that document the changes required for variables in the script. - SQL Statements in the Merge Script
The SQL statements in the shell script need to be customized. There are lines starting with #TODO: that document the changes required for SQL statements. - Merge Script Functions
- Prerequisites
- Limitations
Parent topic: Stage and Merge Data Warehouse Replication
9.2.35.5.1.1 Stage
In this step the change data records in the Oracle GoldenGate trail files are pushed into a staging location. The staging location is typically a cloud object store such as OCI, AWS S3, Azure Data Lake, or Google Cloud Storage.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.2 Merge
In this step the change data files in the object store are viewed as an external table defined in the data warehouse. The data in the external staging table is merged onto the target table.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.3 Configuration of Handlers
File Writer (FW) handler needs to be configured to generate local staging files that contain change data from the Oracle GoldenGate trail files.
The FW handler needs to be chained to an object store Event handler that can upload the staging files into a staging location.
The staging location is typically a cloud object store, such as AWS S3 or Azure Data Lake.
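A minimal sketch of this handler chaining, assuming an AWS S3 staging location; the handler and event handler names, and the bucket and path values, are illustrative placeholders:
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
# Chain the File Writer handler to the S3 Event handler that uploads the staged files
gg.handler.filewriter.eventHandler=s3
gg.eventhandler.s3.type=s3
# Illustrative placeholders for the staging bucket and path
gg.eventhandler.s3.bucketMappingTemplate=<staging bucket name>
gg.eventhandler.s3.pathMappingTemplate=ogg/${fullyQualifiedTableName}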
Parent topic: Steps for Stage and Merge
9.2.35.5.1.4 File Writer Handler
File Writer (FW) handler is typically configured to generate files partitioned by table using the configuration gg.handler.{name}.partitionByTable=true.
In most cases, the FW handler is configured to use the Avro Object Container Format (OCF) formatter.
The output file format could change based on the specific data warehouse target.
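A hedged sketch of these File Writer handler settings, assuming the Avro Row OCF formatter is selected with the value avro_row_ocf (the handler name filewriter is illustrative):
gg.handler.filewriter.partitionByTable=true
# Assumed formatter value for Avro Object Container Format output
gg.handler.filewriter.format=avro_row_ocf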
Parent topic: Steps for Stage and Merge
9.2.35.5.1.5 Operation Aggregation
Operation aggregation is the process of aggregating (merging/compressing) multiple operations on the same row into a single output operation based on a threshold.
Operation Aggregation needs to be enabled for stage and merge replication using the configuration gg.aggregate.operations=true.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.6 Object Store Event handler
The File Writer handler needs to be chained with an object store Event handler. Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) supports uploading files to most cloud object stores such as OCI, AWS S3, and Azure Data Lake.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.7 JDBC Metadata Provider
If the data warehouse supports a JDBC connection, then the JDBC metadata provider needs to be enabled.
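A sketch of enabling the JDBC metadata provider, assuming the gg.mdp.* property names used by the JDBC metadata provider; the URL, driver class, and credentials are placeholders to be replaced with the target warehouse values:
gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=<JDBC URL of the target data warehouse>
gg.mdp.DriverClassName=<JDBC driver class of the target data warehouse>
gg.mdp.UserName=<database user>
gg.mdp.Password=<database password>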
Parent topic: Steps for Stage and Merge
9.2.35.5.1.8 Stage and Merge Sample Configuration
A working configuration for the respective data warehouse is available under the directory AdapterExamples/big-data/data-warehouse-utils/<target>/. This directory contains the following:
- replicat parameter (.prm) file.
- replicat properties file that contains the FW handler and all the Event handler configuration.
- DDL file for the sample table used in the merge script.
- Merge script for the specific data warehouse. This script contains SQL statements tested using the sample table defined in the DDL file.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.9 Variables in the Merge Script
Typically, variables appear at the beginning of the Oracle provided script. There are lines starting with #TODO: that document the changes required for variables in the script.
#TODO: Edit this. Provide the replicat group name.
repName=RBD
#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_
#TODO: Edit the AWS S3 bucket name.
bucket=<AWS S3 bucket name>
#TODO: Edit this variable as needed.
s3Location="'s3://${bucket}/${dir}/'"
#TODO: Edit AWS credentials awsKeyId and awsSecretKey
awsKeyId=<AWS Access Key Id>
awsSecretKey=<AWS Secret key>
The variables repName and stagingTablePrefix are relevant for all the data warehouse targets.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.10 SQL Statements in the Merge Script
The SQL statements in the shell script need to be customized. There are lines starting with #TODO: that document the changes required for SQL statements.
In most cases, you need to double quote (") identifiers in the SQL statement. The double quote needs to be escaped in the script using a backslash, for example: \".
Oracle provides a working example of SQL statements for a single table with a pre-defined set of columns defined in the sample DDL file. You need to add new sections for your own tables as part of the if-else code block in the script.
if [ "${tableName}" == "DBO.TCUSTORD" ] then #TODO: Edit all the column names of the staging and target tables. # The merge SQL example here is configured for the example table defined in the DDL file. # Oracle provided SQL statements # TODO: Add similar SQL queries for each table. elif [ "${tableName}" == "DBO.ANOTHER_TABLE" ] then #Edit SQLs for this table. fi
Parent topic: Steps for Stage and Merge
9.2.35.5.1.11 Merge Script Functions
The script is coded to include the following shell functions:
main
validateParams
process
processTruncate
processDML
dropExternalTable
createExternalTable
merge
The script has code comments for you to infer the purpose of each function.
Merge Script main function
The function main is the entry point of the script. The processing of the staged change data file begins here.
This function invokes two functions: validateParams and process.
The input parameters to the script are validated in the function validateParams.
Processing resumes in the process function if validation is successful.
Merge Script process function
This function processes the operation records in the staged change data file and invokes processTruncate or processDML as needed.
Truncate operation records are handled in the function processTruncate. Insert, Update, and Delete operation records are handled in the function processDML.
Merge Script merge function
The merge function invoked by the function processDML contains the merge SQL statement that is executed for each table.
The key columns to be used in the merge SQL's ON clause need to be customized.
To handle null values, the ON clause uses data warehouse specific NVL functions. Example for a single key column "C01Key":
ON ((NVL(CAST(TARGET.\"C01Key\" AS VARCHAR(4000)),'${uuid}')=NVL(CAST(STAGE.\"C01Key\" AS VARCHAR(4000)),'${uuid}')))
The column names in the merge statement's update and insert clauses also need to be customized for every table.
Merge Script createExternalTable function
The createExternalTable function invoked by the function processDML creates an external table that is backed by the staged file in the respective object store.
In this function, the DDL SQL statement for the external table should be customized for every target table to include all the target table columns.
In addition to the target table columns, the external table definition also consists of three meta-columns: optype, position, and fieldmask.
The data type of the meta-columns should not be modified. The position of the meta-columns should not be modified in the DDL statement.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.12 Prerequisites
- The Command handler merge scripts are available, starting from Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 19.1.0.0.8.
- The respective data warehouse’s command line programs to execute SQL queries must be installed on the machine where GG for DAA is installed.
Parent topic: Steps for Stage and Merge
9.2.35.5.1.13 Limitations
Primary key update operations are split into a delete and insert pair. If the Oracle GoldenGate trail file doesn't contain column values for all the columns in the respective table, then the missing columns get updated to null on the target table.
Parent topic: Steps for Stage and Merge
9.2.35.5.2 Hive Stage and Merge
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability for querying and analysis of large data sets stored in Hadoop files.
- Data Flow
- Configuration
The directory AdapterExamples/big-data/data-warehouse-utils/hive/ in the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) install contains all the configuration and scripts needed for replication to Hive using stage and merge. - Merge Script Variables
- Prerequisites
Parent topic: Stage and Merge Data Warehouse Replication
9.2.35.5.2.1 Data Flow
- File Writer (FW) handler is configured to generate files in Avro Object Container Format (OCF).
- The HDFS Event handler is used to push the Avro OCF files into Hadoop.
Parent topic: Hive Stage and Merge
9.2.35.5.2.2 Configuration
The directory AdapterExamples/big-data/data-warehouse-utils/hive/ in the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) install contains all the configuration and scripts needed for replication to Hive using stage and merge.
- hive.prm: The replicat parameter file.
- hive.props: The replicat properties file that stages data to Hadoop.
- hive.sh: The bash-shell script that reads data staged in Hadoop and merges data to the Hive target table.
- hive-ddl.sql: The DDL statement that contains the sample target table used in the script hive.sh.
Edit the properties indicated by the #TODO: comments in the properties file hive.props.
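A hedged sketch of the kind of staging properties hive.props typically contains; the handler and event handler names, the format value avro_row_ocf, and the HDFS path are illustrative assumptions, not the shipped example:
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.partitionByTable=true
# Assumed formatter value for Avro Object Container Format output
gg.handler.filewriter.format=avro_row_ocf
# Chain to the HDFS Event handler to push the Avro OCF files into Hadoop
gg.handler.filewriter.eventHandler=hdfs
gg.eventhandler.hdfs.type=hdfs
gg.eventhandler.hdfs.pathMappingTemplate=/user/hive/stage/${fullyQualifiedTableName}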
The bash-shell script function merge() contains SQL statements that need to be customized for your target tables.
Parent topic: Hive Stage and Merge
9.2.35.5.2.3 Merge Script Variables
Modify the variables as needed:
#TODO: Modify the location of the OGGBD dirdef directory where the Avro schema files exist.
avroSchemaDir=/opt/ogg/dirdef
#TODO: Edit the JDBC URL to connect to hive.
hiveJdbcUrl=jdbc:hive2://localhost:10000/default
#TODO: Edit the JDBC user to connect to hive.
hiveJdbcUser=APP
#TODO: Edit the JDBC password to connect to hive.
hiveJdbcPassword=mine
#TODO: Edit the replicat group name.
repName=HIVE
#TODO: Edit this. Ensure each replicat uses a unique prefix.
stagingTablePrefix=${repName}_STAGE_
Parent topic: Hive Stage and Merge
9.2.35.5.2.4 Prerequisites
The following are the prerequisites:
- The merge script hive.sh requires the command line program beeline to be installed on the machine where the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) replicat is installed.
- The custom script hive.sh uses the merge SQL statement. Hive Query Language (Hive QL) introduced support for merge in Hive version 2.2.
Parent topic: Hive Stage and Merge
9.2.35.6 Template Keywords
The templating functionality allows you to use a mix of constants and/or keywords for context based resolution of string values at runtime. The templating functionality is used extensively in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) to resolve file paths, file names, topic names, or message keys. This section describes the keywords and their associated arguments, if applicable. Additionally, there are examples showing templates and resolved values.
Template Keywords
This table includes a column if the keyword is supported for transaction level messages.
Keyword | Explanation | Transaction Message Support |
---|---|---|
|
Resolves to the fully qualified table name including the period (.) delimiter between the catalog, schema, and table names. For example, |
No |
|
Resolves to the catalog name. |
No |
|
Resolves to the schema name. |
No |
|
Resolves to the short table name. |
No |
|
Resolves to the type of the operation:
( |
No |
|
The first parameter is optional and allows you to
set the delimiter between primary key values. The default is
_ .
|
No |
|
The sequence number of the source trail file followed by the offset (RBA). |
Yes |
|
The operation timestamp from the source trail file. |
Yes |
|
Resolves to “”. |
Yes |
|
Resolves to the name of the Replicat process. If using coordinated delivery, it resolves to the name of the Replicat process with the Replicate thread number appended. |
Yes |
or
|
Resolves to a static value where the
key is the fully-qualified table name. The keys and values are
designated inside of the square brace in the following format:
|
No |
${xid} |
Resolves the transaction id. | Yes |
or ${columnValue[][][]} |
Resolves to a column value where the key is the fully-qualified table name and the value is the column name to be resolved. For example: ${columnValue[DBO.TABLE1=COL1,DBO.TABLE2=COL2]}. The second parameter is optional and allows you to set the value to use if the column value is null. The default is an empty string. The third parameter is optional and allows you to set the value to use if the column value is missing. The default is an empty string. For example: ${columnValue[COL1]} or ${columnValue[COL2][NULL][MISSING]} |
No |
Or
|
Resolves to the current timestamp.
You can control the format of the current timestamp using the
Java based formatting as described in the
Examples: ${currentTimestamp}${currentTimestamp[yyyy-MM-dd
HH:mm:ss.SSS]} |
Yes |
|
Resolves to a NULL string. |
Yes |
|
It is possible to write a custom value resolver. If required, contact Oracle Support. |
Implementation dependent |
${token[]} |
Resolves a token value. | No |
${toLowerCase[]} |
Keyword to convert the argument to lower case. The argument can be constants, keywords, or a combination of both. | Yes |
${toUpperCase[]} |
Keyword to convert the argument to upper case. The argument can be constants, keywords, or a combination of both. | Yes |
${substring[][]}
Or ${substring[][][]} |
Keyword to perform a substring operation on the
configured content.
Note: Performing a substring function means that an array index out of bounds condition can occur at runtime. This occurs if the configured starting index or ending index is beyond the length of the string currently being acted upon. The${substring} function does not throw a
runtime exception. It instead detects an array index out of
bounds condition and in that case does not execute the substring
function.
|
Yes |
${regex[][][]} |
Keyword to apply a regular expressions to search and
replace content. This has three required parameters:
|
Yes |
${operationCount} |
Keyword to resolve the count of operations. | Yes |
${insertCount} |
Keyword to resolve the count of insert operations. | Yes |
${deleteCount} |
Keyword to resolve the count of delete operations. | Yes |
${updateCount} |
Keyword to resolve the count of update operations. | Yes |
${truncateCount} |
Keyword to resolve the count of truncate operations. | Yes |
${uuid} |
Keyword to resolve a universally unique identifier (UUID). This is a 36 character string guaranteed to be unique. An example UUID: 7f6e4529-e387-48c1-a1b6-3e7a4146b211 | Yes |
Example Templates
The following describes example template configuration values and the resolved values.
Example Template | Resolved Value |
---|---|
|
|
|
|
|
|
A_STATIC_VALUE |
A_STATIC_VALUE |
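As an illustration of how these keywords combine in practice, the following hedged sketch uses the HDFS Event Handler path and file name templates; the directory layout and the keyword spellings ${schemaName} and ${tableName} are assumptions based on the keyword table above:
gg.eventhandler.hdfs.pathMappingTemplate=/ogg/staging/${toLowerCase[${schemaName}]}/${tableName}
gg.eventhandler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp[yyyyMMddHHmmss]}.avro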
Parent topic: Additional Details
9.2.35.7 Velocity Dependencies
Starting Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release 21.1.0.0.0, the Velocity jar files have been removed from the packaging.
For the Velocity formatting to work, you need to download the jars and include them in the runtime by modifying gg.classpath.
The maven coordinates for Velocity are as follows:
Maven groupId: org.apache.velocity
Maven artifactId: velocity
Version: 1.7
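Expressed as a Maven dependency (derived from the coordinates above), this is what would be added to a build that retrieves the Velocity jar:
<dependency>
  <groupId>org.apache.velocity</groupId>
  <artifactId>velocity</artifactId>
  <version>1.7</version>
</dependency>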
Parent topic: Additional Details