5 Using the File Writer Handler
Learn how to use the File Writer Handler and its associated event handlers, which enable you to write data files to a local system.
Topics:
- Overview
 Learn how to use the File Writer Handler and the event handlers to transform data.
- Using the HDFS Event Handler
 Learn how to use the HDFS Event Handler to load files generated by the File Writer Handler into HDFS.
- Using the Optimized Row Columnar Event Handler
 Learn how to use the Optimized Row Columnar (ORC) Event Handler to generate data files in ORC format.
- Using the Oracle Cloud Infrastructure Event Handler
 Learn how to use the Oracle Cloud Infrastructure Event Handler to load files generated by the File Writer Handler into an Oracle Cloud Infrastructure Object Store.
- Using the Oracle Cloud Infrastructure Classic Event Handler
 Learn how to use the Oracle Cloud Infrastructure Classic Event Handler to load files generated by the File Writer Handler into an Oracle Cloud Infrastructure Classic Object Store.
- Using the Parquet Event Handler
 Learn how to use the Parquet Event Handler to load files generated by the File Writer Handler into HDFS.
- Using the S3 Event Handler
 Learn how to use the S3 Event Handler, which provides the interface to Amazon S3 web services.
5.1 Overview
Learn how to use the File Writer Handler and the event handlers to transform data.
The File Writer Handler supports generating data files in delimited text, XML, JSON, Avro, and Avro Object Container File formats. It is intended to fulfill an extraction, load, and transform use case. Data files are staged on your local file system. When writing to a data file is complete, you can use a third-party application to read the file and perform additional processing.
The File Writer Handler also supports the event handler framework. The event handler framework allows data files generated by the File Writer Handler to be transformed into other formats, such as Optimized Row Columnar (ORC) or Parquet. Data files can also be loaded into third-party applications, such as HDFS or Amazon S3. The framework is extensible, so additional event handlers that perform different transformations or load to different targets can be developed. Additionally, you can develop a custom event handler for your big data environment.
Oracle GoldenGate for Big Data provides two handlers to write to HDFS. Oracle recommends that you use the HDFS Handler or the File Writer Handler in the following situations:
- The HDFS Handler is designed to stream data directly to HDFS. Use it when:
  - No post-write processing is occurring in HDFS. The HDFS Event Handler does not change the contents of the file; it simply uploads the existing file to HDFS.
  - Analytical tools are accessing data written to HDFS in real time, including data in files that are open and actively being written to.
- The File Writer Handler is designed to stage data to the local file system and then to load completed data files to HDFS when writing to a file is complete. Use it when:
  - Analytic tools are not accessing data written to HDFS in real time.
  - Post-write processing is occurring in HDFS to transform, reformat, merge, and move the data to a final location.
  - You want to write data files to HDFS in ORC or Parquet format.
5.1.1 Detailing the Functionality
Topics:
- Using File Roll Events
- Automatic Directory Creation
- About the Active Write Suffix
- Maintenance of State
- Using Templated Strings
5.1.1.1 Using File Roll Events
A file roll event occurs when writing to a specific data file is completed. No more data is written to that specific data file.
Finalize Action Operation
You can configure the finalize action operation to clean up a specific data file after a successful file roll action using the finalizeAction property with the following options:
- none
 Leave the data file in place (removing any active write suffix, see About the Active Write Suffix).
- delete
 Delete the data file (for example, if the data file has been converted to another format or loaded to a third-party application).
- move
 Maintain the file name (removing any active write suffix), but move the file to the directory resolved using the movePathMappingTemplate property.
- rename
 Maintain the current directory, but rename the data file using the fileRenameMappingTemplate property.
- move-rename
 Rename the file using the file name generated by the fileRenameMappingTemplate property and move the file to the directory resolved using the movePathMappingTemplate property.
Typically, event handlers offer a subset of these same actions.
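The effect of each finalize action on a rolled data file can be pictured with a small Python sketch. This is only an illustration of the options listed above, not the handler's implementation: the function name and arguments are hypothetical, and `move_dir` and `new_name` stand in for the already-resolved values of movePathMappingTemplate and fileRenameMappingTemplate.

```python
import shutil
from pathlib import Path

VALID_ACTIONS = {"none", "delete", "move", "rename", "move-rename"}


def apply_finalize_action(data_file, action, active_suffix="", move_dir=None, new_name=None):
    """Apply one finalize action to a rolled data file (illustrative only).

    Returns the resulting Path, or None if the file was deleted.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown finalize action: {action}")
    data_file = Path(data_file)
    if action == "delete":
        data_file.unlink()
        return None

    # Keep the file name by default, minus any active write suffix.
    final_name = data_file.name
    if active_suffix and final_name.endswith(active_suffix):
        final_name = final_name[: -len(active_suffix)]
    if action in ("rename", "move-rename"):
        final_name = new_name  # stands in for the resolved fileRenameMappingTemplate

    target_dir = data_file.parent
    if action in ("move", "move-rename"):
        target_dir = Path(move_dir)  # stands in for the resolved movePathMappingTemplate
        target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / final_name
    if target != data_file:
        shutil.move(str(data_file), str(target))
    return target
```

For example, `apply_finalize_action("dirout/t1.txt.tmp", "none", active_suffix=".tmp")` would leave the file in `dirout` under the name `t1.txt`.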
A sample configuration of a finalize action operation:
gg.handlerlist=filewriter
#The File Writer Handler
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout/evActParamS3R
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.txt
gg.handler.filewriter.fileRollInterval=7m
gg.handler.filewriter.finalizeAction=delete
gg.handler.filewriter.inactivityRollInterval=7m
File Rolling Actions
Any of the following actions trigger a file roll event.
- A metadata change event.
- The maximum configured file size is exceeded.
- The file roll interval is exceeded (the current time minus the time of first file write is greater than the file roll interval).
- The inactivity roll interval is exceeded (the current time minus the time of last file write is greater than the inactivity roll interval).
- The File Writer Handler is configured to roll on shutdown and the Replicat process is stopped.
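The size and time based triggers above can be expressed as simple comparisons. The following Python sketch is a hypothetical helper, not handler code; it assumes all times are numeric seconds and that an unset limit (`None`) means the corresponding trigger is turned off. Metadata changes and roll on shutdown are events rather than thresholds, so they are left out.

```python
def roll_reasons(now, first_write, last_write, size_bytes,
                 max_size_bytes=None, roll_interval=None, inactivity_interval=None):
    """Return the size/time conditions that would trigger a file roll (illustrative only)."""
    reasons = []
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        reasons.append("max file size exceeded")
    # Roll interval is measured from the first write to the file.
    if roll_interval is not None and now - first_write > roll_interval:
        reasons.append("file roll interval exceeded")
    # Inactivity interval is measured from the most recent write to the file.
    if inactivity_interval is not None and now - last_write > inactivity_interval:
        reasons.append("inactivity roll interval exceeded")
    return reasons
```

With a 7 minute (420 second) roll interval, a file first written 500 seconds ago would roll even if it was written to a moment ago, because the roll interval ignores recent activity.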
Operation Sequence
A file roll event triggers a sequence of operations. It is important that you understand the order in which these operations occur when an individual data file is rolled:
1. The active data file is switched to inactive, the data file is flushed, and the state file is flushed.
2. The configured event handlers are called in the sequence that you specified.
3. The finalize action is executed on all the event handlers in the reverse of the order in which you configured them. Any finalize action that you configured is executed.
4. The finalize action is executed on the data file and the state file. If all actions are successful, the state file is removed. Any finalize action that you configured is executed.
For example, if you configured the File Writer Handler with the Parquet Event Handler and then the S3 Event Handler, the order for a roll event is:
1. The active data file is switched to inactive, the data file is flushed, and the state file is flushed.
2. The Parquet Event Handler is called to generate a Parquet file from the source data file.
3. The S3 Event Handler is called to load the generated Parquet file to S3.
4. The finalize action is executed on the S3 Event Handler. Any finalize action that you configured is executed.
5. The finalize action is executed on the Parquet Event Handler. Any finalize action that you configured is executed.
6. The finalize action is executed for the data file in the File Writer Handler.
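The ordering described above (event handlers called in configured order, finalize actions run in reverse, data file finalized last) can be sketched in a few lines of Python. The function and the step labels are hypothetical and only model the sequence; they do not call any GoldenGate API.

```python
def roll_sequence(event_handlers):
    """List the steps of a file roll for the given chain of event handlers (illustrative only)."""
    steps = ["flush data file and state file"]
    # Event handlers run in the order in which they were configured.
    steps += [f"call {name} event handler" for name in event_handlers]
    # Finalize actions run in the reverse of the configured order.
    steps += [f"finalize {name} event handler" for name in reversed(event_handlers)]
    # The File Writer Handler finalizes its own data file last.
    steps.append("finalize data file in File Writer Handler")
    return steps
```

Calling `roll_sequence(["parquet", "s3"])` reproduces the Parquet then S3 example: both handlers are called in order, S3 is finalized before Parquet, and the data file is finalized at the end.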
5.1.1.2 Automatic Directory Creation
5.1.1.3 About the Active Write Suffix
A common use case is using a third-party application to monitor the write directory and read data files. A third-party application can read a data file only when writing to that file has completed, so these applications need a way to determine whether writing to a data file is active or complete. The File Writer Handler allows you to configure an active write suffix using this property:
gg.handler.name.fileWriteActiveSuffix=.tmp
The value of this property is appended to the generated file name. When writing to the file is complete, the data file is renamed and the active write suffix is removed from the file name. You can set your third-party application to monitor your data file names to identify when the active write suffix is removed.
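A monitoring application might use the suffix as follows. This Python sketch is an assumption about how such a third-party consumer could be written; the function name is hypothetical, and it assumes the `.tmp` suffix from the property example above.

```python
from pathlib import Path


def completed_files(write_dir, active_suffix=".tmp"):
    """Return data files in write_dir that no longer carry the active write suffix."""
    return sorted(
        path
        for path in Path(write_dir).iterdir()
        if path.is_file() and not path.name.endswith(active_suffix)
    )
```

A consumer would call `completed_files("./dirout")` on a schedule and process only the files it returns, leaving files that still end in `.tmp` for a later pass.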
5.1.1.4 Maintenance of State
Previously, all Oracle GoldenGate for Big Data Handlers were stateless. These stateless handlers only maintained state in the context of the Replicat process in which they were running. If the Replicat process was stopped and restarted, then all the state was lost. With a Replicat restart, the handler began writing with no contextual knowledge of the previous run.
The File Writer Handler can maintain state between invocations of the Replicat process. By default with a restart:
- the saved state files are read,
- the state is restored,
- and appending to active data files continues where the previous run stopped.
You can change this default action to require all files be rolled on shutdown by setting this property:
gg.handler.name.rollOnShutdown=true
5.1.1.5 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The File Writer Handler makes extensive use of templated strings to generate directory names and data file names. These strings give you the flexibility to select where to write data files and the names of those data files. Exercise caution when choosing file and directory names to avoid file naming collisions that can result in an abend.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName} | The fully qualified source table name, delimited by a period (.). |
| ${catalogName} | The individual source catalog name. |
| ${schemaName} | The individual source schema name. |
| ${tableName} | The individual source table name. |
| ${groupName} | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). |
| ${emptyString} | Evaluates to an empty string. |
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${insertCount} | The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${truncateCount} | The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${currentTimestamp} | The current timestamp. The date-time output format uses Java date-format syntax. |
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. |
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. |
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
Requirements
The directory and file names generated using the templates must be legal on the system being written to. File names must be unique to avoid a file name collision. You can avoid a collision by adding a current timestamp using the ${currentTimestamp} keyword. If you are using coordinated apply, then adding ${groupName} into the data file name is recommended.
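To make the idea of keyword resolution concrete, the following Python sketch resolves a template against a dictionary of values. It is a simplified model written for this explanation, not the handler's resolver: the function name is hypothetical, only plain keywords plus `${toUpperCase[]}` and `${toLowerCase[]}` are handled, and the timestamp layout used here is an arbitrary choice for the sketch rather than the handler's default format.

```python
import re
from datetime import datetime


def resolve_template(template, values, now=None):
    """Resolve ${keyword} placeholders in a template string (illustrative only)."""
    now = now or datetime.now()
    context = dict(values)
    context.setdefault("emptyString", "")
    # Timestamp layout chosen for this sketch only; not the handler's default.
    context.setdefault("currentTimestamp", now.strftime("%Y%m%d%H%M%S"))

    def keyword(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"unsupported keyword: {name}")
        return str(context[name])

    # Resolve plain keywords first so that case functions see literal text.
    resolved = re.sub(r"\$\{(\w+)\}", keyword, template)
    resolved = re.sub(r"\$\{toUpperCase\[(.*?)\]\}", lambda m: m.group(1).upper(), resolved)
    resolved = re.sub(r"\$\{toLowerCase\[(.*?)\]\}", lambda m: m.group(1).lower(), resolved)
    return resolved
```

Including both `${groupName}` and `${currentTimestamp}` in a file name template, as recommended above, means two Replicat threads writing the same table at the same moment still resolve to different names.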
                           
5.1.2 Configuring the File Writer Handler
This section lists the configurable values for the File Writer Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the File Writer Handler, you must first configure the handler type by specifying gg.handler.name.type=filewriter and the other File Writer properties as follows:
                        
Table 5-1 File Writer Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
|  | Required |  | None | Selects the File Writer Handler for use. |
|  | Optional | Default unit of measure is bytes. You can stipulate  | 1g | Sets the maximum file size of files generated by the File Writer Handler. When the file size is exceeded, a roll event is triggered. |
|  | Optional | The default unit of measure is milliseconds. You can stipulate  | File rolling on time is off. | The timer starts when a file is created. If the file is still open when the interval elapses, then a file roll event is triggered. |
|  | Optional | The default unit of measure is milliseconds. You can stipulate  | File inactivity rolling is turned off. | The timer starts from the latest write to a generated file. New writes to a generated file restart the counter. If the file is still open when the timer elapses, a roll event is triggered. |
|  | Required | A string with resolvable keywords and constants used to dynamically generate File Writer Handler data file names at runtime. | None | Use keywords interlaced with constants to dynamically generate unique file names at runtime. Typically, file names follow the format,  |
|  | Required | A string with resolvable keywords and constants used to dynamically generate the directory to which a file is written. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format,  |
|  | Optional | A string. | None | An optional suffix that is appended to files generated by the File Writer Handler to indicate that writing to the file is active. At the finalize action the suffix is removed. |
|  | Required | A directory on the local machine to store the state files of the File Writer Handler. | None | Sets the directory on the local machine to store the state files of the File Writer Handler. The group name is appended to the directory to ensure that the functionality works when operating in a coordinated apply environment. |
|  | Optional |  |  | Set to  |
|  | Optional |  |  | Indicates what the File Writer Handler should do at the finalize action. |
|  | Optional |  |  | Set to  |
|  | Optional |  | No event handler configured. | A unique string identifier cross referencing an event handler. The event handler is invoked on the file roll event. Event handlers can perform file roll event actions such as loading files to S3, converting to Parquet or ORC format, or loading files to HDFS. |
|  | Required if  | A string with resolvable keywords and constants used to dynamically generate File Writer Handler data file names for file renaming in the finalize action. | None. | Use keywords interlaced with constants to dynamically generate unique file names at runtime. Typically, file names follow the format,  |
|  | Required if  | A string with resolvable keywords and constants used to dynamically generate the directory to which a file is written. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format,  |
|  | Required |  |  | Selects the formatter that determines how output data is formatted. If you want to use the Parquet or ORC Event Handlers, then the selected format must be  |
|  | Optional | An even number of hex characters. | None | Enter an even number of hex characters where every two characters correspond to a single byte in the byte order mark (BOM). For example, the string  |
|  | Optional |  |  | Set to  |
|  | Optional | Any string | new line ( | Allows you to control the delimiter separating file names in the control file. You can use |
|  | Optional | A path to a directory to hold the control file. | A period (.) | Set to specify where you want to write the control file. |
|  | Optional |  |  | Set to  |
|  | Optional | One or more times to trigger a roll action of all open files. | None | Configure one or more trigger times in the following format: HH:MM,HH:MM,HH:MM. Entries are based on a 24 hour clock. For example, an entry to configure rolled actions at three discrete times of day is: gg.handler.fw.atTime=03:30,21:00,23:51 |
|  | Optional | no compression. |  | Enables the corresponding compression algorithm for generated Avro OCF files. The corresponding compression library must be added to the  |
|  | Optional |  | Positive Integer >= 512 | Sets the size the  |
5.1.3 Review a Sample Configuration
This File Writer Handler configuration example uses the Parquet Event Handler to convert data files to Parquet, and then the S3 Event Handler to load the Parquet files into S3:
gg.handlerlist=filewriter
#The handler properties
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.txt
gg.handler.filewriter.fileRollInterval=7m
gg.handler.filewriter.finalizeAction=delete
gg.handler.filewriter.inactivityRollInterval=7m
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.includetokens=true
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.eventHandler=parquet
gg.handler.filewriter.rollOnShutdown=true
#The Parquet Event Handler properties
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete
gg.eventhandler.parquet.eventHandler=s3
gg.eventhandler.parquet.fileNameMappingTemplate=${tableName}_${currentTimestamp}.parquet
#The S3 Event Handler properties
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.bucketMappingTemplate=tomsfunbucket
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none
goldengate.userexit.writers=javawriter
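Because each handler names its downstream event handler through an eventHandler property, a configuration like the one above forms a chain. The following Python sketch is a hypothetical checking aid, not part of the product: it uses a deliberately simplified `key=value` parser (not the full Java properties format) and follows the chain from a named File Writer Handler so you can confirm the order before starting Replicat.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments (simplified parser)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()  # a later duplicate key overwrites an earlier one
    return props


def event_handler_chain(props, handler_name):
    """Follow eventHandler references starting at gg.handler.<handler_name>."""
    chain = []
    current = props.get(f"gg.handler.{handler_name}.eventHandler")
    while current:
        if current in chain:
            raise ValueError(f"event handler cycle at {current}")
        chain.append(current)
        current = props.get(f"gg.eventhandler.{current}.eventHandler")
    return chain


SAMPLE = """
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.eventHandler=parquet
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.eventHandler=s3
gg.eventhandler.s3.type=s3
"""

if __name__ == "__main__":
    print(event_handler_chain(parse_properties(SAMPLE), "filewriter"))
```

In this sketch a second `gg.handler.filewriter.eventHandler` line would silently replace the first, which is why a sample should set that property only once.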
5.2 Using the HDFS Event Handler
Learn how to use the HDFS Event Handler to load files generated by the File Writer Handler into HDFS.
See Using the File Writer Handler.
5.2.1 Detailing the Functionality
5.2.1.1 Configuring the Handler
The HDFS Event Handler can upload data files to HDFS. These additional configuration steps are required:
- The HDFS Event Handler dependencies and considerations are the same as the HDFS Handler, see HDFS Additional Considerations.
- Ensure that gg.classpath includes the HDFS client libraries.
- Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:
 gg.classpath=/{HDFSinstallDirectory}/etc/hadoop
- If Kerberos authentication is enabled on the HDFS cluster, you must configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:
 gg.eventhandler.name.kerberosPrincipal=principal
 gg.eventhandler.name.kerberosKeytabFile=pathToTheKeytabFile
5.2.1.2 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The HDFS Event Handler makes extensive use of templated strings to generate the HDFS directory names, data file names, and HDFS bucket names. This gives you the flexibility to select where to write data files and the names of those data files.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName} | The fully qualified source table name, delimited by a period (.). |
| ${catalogName} | The individual source catalog name. |
| ${schemaName} | The individual source schema name. |
| ${tableName} | The individual source table name. |
| ${groupName} | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). |
| ${emptyString} | Evaluates to an empty string. |
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${insertCount} | The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${truncateCount} | The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${currentTimestamp} | The current timestamp. The date-time output format uses Java date-format syntax. |
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. |
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. |
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
5.2.1.3 Configuring the HDFS Event Handler
You configure the HDFS Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the HDFS Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=hdfs and the other HDFS Event Handler properties as follows:
                           
Table 5-2 HDFS Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
|  | Required |  | None | Selects the HDFS Event Handler for use. |
|  | Required | A string with resolvable keywords and constants used to dynamically generate the path in HDFS to write data files. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. Path names typically follow the format,  |
|  | Optional | A string with resolvable keywords and constants used to dynamically generate the HDFS file name at runtime. | None | Use keywords interlaced with constants to dynamically generate unique file names at runtime. If not set, the upstream file name is used. |
|  | Optional |  |  | Indicates what the File Writer Handler should do at the finalize action. |
|  | Optional | The Kerberos principal name. | None | Set to the Kerberos principal when HDFS Kerberos authentication is enabled. |
|  | Optional | The path to the Kerberos  | None | Set to the path to the Kerberos  |
|  | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | A unique string identifier cross referencing an event handler. The event handler is invoked on the file roll event. Event handlers can perform file roll event actions such as loading files to S3, converting to Parquet or ORC format, or loading files to HDFS. |
5.3 Using the Optimized Row Columnar Event Handler
Learn how to use the Optimized Row Columnar (ORC) Event Handler to generate data files in ORC format.
5.3.1 Overview
ORC is a row columnar format that can substantially improve data retrieval times and the performance of Big Data analytics. You can use the ORC Event Handler to write ORC files to either a local file system or directly to HDFS. For information, see https://orc.apache.org/.
5.3.2 Detailing the Functionality
5.3.2.1 About the Upstream Data Format
The ORC Event Handler can only convert Avro Object Container File (OCF) files generated by the File Writer Handler. The ORC Event Handler cannot convert other formats to ORC data files. The format of the File Writer Handler must be avro_row_ocf or avro_op_ocf, see Using the File Writer Handler.
                           
5.3.2.2 About the Library Dependencies
Generating ORC files requires both the Apache ORC libraries and the HDFS client libraries, see Optimized Row Columnar Event Handler Client Dependencies and HDFS Handler Client Dependencies.
Oracle GoldenGate for Big Data does not include the Apache ORC libraries nor does it include the HDFS client libraries. You must configure the gg.classpath variable to include the dependent libraries.
                           
5.3.2.3 Requirements
The ORC Event Handler can write ORC files directly to HDFS. To do so, you must set the writeToHDFS property to true:
gg.eventhandler.orc.writeToHDFS=true
Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:
gg.classpath=/{HDFS_install_directory}/etc/hadoop
If you enable Kerberos authentication on the HDFS cluster, you must configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:
gg.eventhandler.name.kerberosPrincipal=principal
gg.eventhandler.name.kerberosKeytabFile=path_to_the_keytab_file
5.3.2.4 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The ORC Event Handler makes extensive use of templated strings to generate the ORC directory names, data file names, and ORC bucket names. This gives you the flexibility to select where to write data files and the names of those data files.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName} | The fully qualified source table name, delimited by a period (.). |
| ${catalogName} | The individual source catalog name. |
| ${schemaName} | The individual source schema name. |
| ${tableName} | The individual source table name. |
| ${groupName} | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). |
| ${emptyString} | Evaluates to an empty string. |
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${insertCount} | The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${truncateCount} | The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
| ${currentTimestamp} | The current timestamp. The date-time output format uses Java date-format syntax. |
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. |
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. |
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
5.3.3 Configuring the ORC Event Handler
You configure the ORC Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
The ORC Event Handler works only in conjunction with the File Writer Handler.
To enable the selection of the ORC Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=orc and the other ORC properties as follows:
                        
Table 5-3 ORC Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
|  | Required |  | None | Selects the ORC Event Handler. |
|  | Optional |  |  | The ORC framework allows direct writing to HDFS. Set to  |
|  | Required | A string with resolvable keywords and constants used to dynamically generate the path in the ORC bucket to write the file. | None | Use keywords interlaced with constants to dynamically generate unique ORC path names at runtime. Typically, path names follow the format,  |
|  | Optional | A string with resolvable keywords and constants used to dynamically generate the ORC file name at runtime. | None | Use resolvable keywords and constants to dynamically generate the ORC data file name at runtime. If not set, the upstream file name is used. |
|  | Optional |  |  | Sets the compression codec of the generated ORC file. |
|  | Optional |  |  | Set to  |
|  | Optional | The Kerberos principal name. | None | Sets the Kerberos principal when writing directly to HDFS and Kerberos authentication is enabled. |
|  | Optional | The path to the Kerberos  |  | Sets the path to the Kerberos  |
|  | Optional |  |  | Set to  |
|  | Optional |  | The ORC default. | Sets the block size of generated ORC files. |
|  | Optional |  | The ORC default. | Sets the buffer size of generated ORC files. |
|  | Optional |  | The ORC default. | Sets whether the ORC encoding strategy is optimized for compression or for speed. |
|  | Optional | A percentage represented as a floating point number. | The ORC default. | Sets the percentage for padding tolerance of generated ORC files. |
|  | Optional |  | The ORC default. | Sets the row index stride of generated ORC files. |
|  | Optional |  | The ORC default. | Sets the stripe size of generated ORC files. |
|  | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | The event handler that is invoked on the file roll event. Event handlers can perform file roll event actions such as loading files to S3 or HDFS. |
|  | Optional | The false positive probability must be greater than zero and less than one. For example,  | The Apache ORC default. | Sets the false positive probability of a bloom filter index query, that is, the probability that the result indicates the value being searched for is in the block when the value is actually not in the block. You must also select the tables and columns on which to set bloom filters, using the following configuration syntax: gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTMER=CUST_CODE gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTORD=CUST_CODE,ORDER_DATE |
|  | Optional |  |  | Sets the version of the ORC bloom filter. |
5.4 Using the Oracle Cloud Infrastructure Event Handler
Learn how to use the Oracle Cloud Infrastructure Event Handler to load files generated by the File Writer Handler into an Oracle Cloud Infrastructure Object Store.
Topics:
- Overview
- Detailing the Functionality
- Configuring the Oracle Cloud Infrastructure Event Handler
- Configuring Credentials for Oracle Cloud Infrastructure
- Using Templated Strings
- Troubleshooting
5.4.1 Overview
The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos. For more information, see https://cloud.oracle.com/en_US/cloud-infrastructure.
You can use any format handler that the File Writer Handler supports.
5.4.2 Detailing the Functionality
The Oracle Cloud Infrastructure Event Handler requires the Oracle Cloud Infrastructure Java software development kit (SDK) to transfer files to Oracle Cloud Infrastructure Object Storage. Oracle GoldenGate for Big Data does not include the Oracle Cloud Infrastructure Java SDK. For more information, see https://docs.cloud.oracle.com/iaas/Content/API/Concepts/sdkconfig.htm.
You must download the Oracle Cloud Infrastructure Java SDK from:
https://docs.us-phoenix-1.oraclecloud.com/Content/API/SDKDocs/javasdk.htm
Extract the JAR files to a permanent directory. The handler requires two directories: the JAR library directory that contains the Oracle Cloud Infrastructure SDK JAR, and the third-party JAR library directory. Both directories must be in gg.classpath.
Set the gg.classpath variable to include the JAR files of the Oracle Cloud Infrastructure Java SDK.
Example
gg.classpath=/usr/var/oci/lib/*:/usr/var/oci/third-party/lib/*
5.4.3 Configuring the Oracle Cloud Infrastructure Event Handler
You configure the Oracle Cloud Infrastructure Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
The Oracle Cloud Infrastructure Event Handler works only in conjunction with the File Writer Handler.
To select the Oracle Cloud Infrastructure Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=oci, and then set the other Oracle Cloud Infrastructure properties as follows:
                        
Table 5-4 Oracle Cloud Infrastructure Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
| | Required | | None | Selects the Oracle Cloud Infrastructure Event Handler. |
| | Required | Path to the event handler | None | The configuration file name and location. |
| | Required | Valid string representing the profile name. | None | In the Oracle Cloud Infrastructure |
| | Required | Oracle Cloud Infrastructure namespace. | None | The namespace serves as a top-level container for all buckets and objects and allows you to control bucket naming within the user’s tenancy. The Object Storage namespace is a system-generated string assigned during account creation. Your namespace string is listed in Object Storage Settings in the Oracle Cloud Infrastructure Console. |
| | Required | Oracle Cloud Infrastructure region | None | Oracle Cloud Infrastructure servers and data are hosted in a region, which is a localized geographic area. There are four supported regions: London Heathrow ("uk-london-1"), Frankfurt ("eu-frankfurt-1"), Ashburn ("us-ashburn-1"), and Phoenix ("us-phoenix-1"). |
| | Required | Valid compartment id. | None | A compartment is a logical container to organize Oracle Cloud Infrastructure resources. The |
| | Required | A string with resolvable keywords and constants used to dynamically generate the path in the Oracle Cloud Infrastructure bucket to write the file. | None | Use keywords interlaced with constants to dynamically generate unique Oracle Cloud Infrastructure path names at runtime. |
| | Optional | A string with resolvable keywords and constants used to dynamically generate the Oracle Cloud Infrastructure file name at runtime. | None | Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure data file name at runtime. If not set, the upstream file name is used. |
| | Required | A string with resolvable keywords and constants used to dynamically generate the path in the Oracle Cloud Infrastructure bucket to write the file. | None | Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure bucket name at runtime. The event handler attempts to create the Oracle Cloud Infrastructure bucket if it does not exist. |
| | Optional | | | Set to |
| | Optional | A unique string identifier cross referencing a child event handler. | No event handler is configured. | Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, loading files to HDFS, loading files to Oracle Cloud Infrastructure Storage Classic, or loading files to Oracle Cloud Infrastructure. |
Sample Configuration
gg.eventhandler.oci.type=oci
gg.eventhandler.oci.configFilePath=~/.oci/config
gg.eventhandler.oci.profile=DEFAULT
gg.eventhandler.oci.namespace=dwcsdemo
gg.eventhandler.oci.region=us-ashburn-1
gg.eventhandler.oci.compartmentID=ocid1.compartment.oc1..aaaaaaaajdg6iblwgqlyqpegf6kwdais2gyx3guspboa7fsi72tfihz2wrba
gg.eventhandler.oci.pathMappingTemplate=${schemaName}
gg.eventhandler.oci.bucketMappingTemplate=${schemaName}
gg.eventhandler.oci.fileNameMappingTemplate=${tableName}_${currentTimestamp}.txt
gg.eventhandler.oci.finalizeAction=NONE
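Before starting Replicat, it can help to confirm that every required property is present. The following is a minimal Python sketch, not part of Oracle GoldenGate for Big Data. The helper names `parse_properties` and `missing_oci_properties` are hypothetical, the handler name `oci` is assumed, and the list of required property suffixes is an assumption derived by matching the sample configuration above to the Required rows of Table 5-4.

```python
# Assumed required suffixes, matched from the sample configuration to Table 5-4.
REQUIRED_SUFFIXES = [
    "type", "configFilePath", "profile", "namespace", "region",
    "compartmentID", "pathMappingTemplate", "bucketMappingTemplate",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_oci_properties(props, handler_name="oci"):
    """Return the required suffixes that are absent or empty."""
    prefix = "gg.eventhandler." + handler_name + "."
    return [s for s in REQUIRED_SUFFIXES if not props.get(prefix + s)]


sample = """
gg.eventhandler.oci.type=oci
gg.eventhandler.oci.configFilePath=~/.oci/config
gg.eventhandler.oci.profile=DEFAULT
"""
# Lists the required properties that this incomplete sample still lacks.
print(missing_oci_properties(parse_properties(sample)))
```

An empty list means that all of the assumed required properties are set; it does not verify that their values are valid.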
5.4.4 Configuring Credentials for Oracle Cloud Infrastructure
The Java SDK requires basic configuration information, such as user credentials and the tenancy Oracle Cloud ID (OCID). For more information, see https://docs.cloud.oracle.com/iaas/Content/General/Concepts/identifiers.htm.
The ideal configuration file includes the keys user, fingerprint, key_file, tenancy, and region with their respective values. The default configuration file name and location is ~/.oci/config.
                        
Create the config file as follows:
                        
- Create a directory called .oci in the Oracle GoldenGate for Big Data home directory.
- Create a text file and name it config.
- Obtain the values for these properties:
 - user
  - Log in to the Oracle Cloud Infrastructure Console https://console.us-ashburn-1.oraclecloud.com.
  - Click Username.
  - Click User Settings. The user's OCID is displayed and is the value for the key user.
 - tenancy
  The Tenancy ID is displayed at the bottom of the Console page.
 - region
  The region is displayed with the header session drop-down menu in the Console.
 - fingerprint
  To generate the fingerprint, use the How to Get the Key's Fingerprint instructions at: https://docs.cloud.oracle.com/iaas/Content/API/Concepts/apisigningkey.htm
 - key_file
  You need to share the public and private key to establish a connection with Oracle Cloud Infrastructure. To generate the keys, use the How to Generate an API Signing Key instructions at: https://docs.cloud.oracle.com/iaas/Content/API/Concepts/apisigningkey.htm
Sample Configuration File
user=ocid1.user.oc1..aaaaaaaat5nvwcna5j6aqzqedqw3rynjq
fingerprint=20:3b:97:13::4e:c5:3a:34
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..aaaaaaaaba3pv6wkcr44h25vqstifs
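The keys named in this section (user, fingerprint, key_file, tenancy, and region) can be checked with a short script before the event handler is started. This is a minimal Python sketch under stated assumptions: the function name `missing_config_keys` is hypothetical, the file is assumed to use plain key=value lines, and lines that look like profile headers (for example `[DEFAULT]`) are simply skipped rather than interpreted.

```python
EXPECTED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_config_keys(text):
    """Return the expected keys that have no non-empty value in the text."""
    found = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and profile headers such as [DEFAULT].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep and value.strip():
            found.add(key.strip())
    return [k for k in EXPECTED_KEYS if k not in found]


# Placeholder values only; real OCIDs and fingerprints are much longer.
example = """
user=example_user_ocid
fingerprint=example_fingerprint
key_file=~/.oci/oci_api_key.pem
tenancy=example_tenancy_ocid
"""
print(missing_config_keys(example))
```

In this example the result contains only `region`, because the example text sets the other four keys.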
5.4.5 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. This event handler makes extensive use of templated strings to generate the Oracle Cloud Infrastructure directory names, data file names, and Oracle Cloud Infrastructure bucket names. These strings give you the flexibility to select where to write data files and the names of those data files. You should exercise caution when choosing file and directory names to avoid file naming collisions that can result in an abend.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName}  | The fully qualified source table name delimited by a period ( | 
| ${catalogName}  | The individual source catalog name. For example,  | 
| ${schemaName}  | The individual source schema name.  For example,  | 
| ${tableName}  | The individual source table name.  For example,  | 
| ${groupName}  | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). | 
| ${emptyString}  | Evaluates to an empty string.  For example, | 
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${insertCount}  | The total count of insert operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${truncateCount} | The total count of truncate operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${currentTimestamp} | The current timestamp.  The default output format for the date time is  
 This format uses the syntax defined in the Java  | 
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. For example,  | 
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. For example,  | 
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
Requirements
The directory and file names generated using the templates must be legal on the system being written to. File names must be unique to avoid a file name collision. You can avoid a collision by adding a current timestamp using the ${currentTimestamp} keyword. If you are using coordinated apply, then adding ${groupName} into the data file name is recommended.
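To illustrate how combining ${groupName} and ${currentTimestamp} keeps file names unique, the following Python sketch resolves simple `${keyword}` placeholders. It is an illustration only and not the product's resolver: the function `resolve_template` is hypothetical, the group name `RPARQ001` is a made-up value, the timestamp format is an assumption, and bracketed keywords such as ${toUpperCase[]} are not handled.

```python
import re
from datetime import datetime


def resolve_template(template, values):
    """Replace each ${keyword} with its value; reject unknown keywords."""
    def substitute(match):
        keyword = match.group(1)
        if keyword not in values:
            raise KeyError("unsupported keyword: " + keyword)
        return str(values[keyword])
    return re.sub(r"\$\{(\w+)\}", substitute, template)


values = {
    "schemaName": "QASOURCE",
    "tableName": "TCUSTMER",
    "groupName": "RPARQ001",  # hypothetical Replicat group name
    # Assumed timestamp format, chosen to be legal in file names.
    "currentTimestamp": datetime(2024, 1, 15, 10, 30, 0).strftime("%Y-%m-%d_%H-%M-%S"),
}
name = resolve_template("${tableName}_${groupName}_${currentTimestamp}.txt", values)
print(name)  # TCUSTMER_RPARQ001_2024-01-15_10-30-00.txt
```

Two Replicat threads writing the same table at the same second would still produce different names here, because the group name differs per thread.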
                        
5.4.6 Troubleshooting
Connectivity Issues
If the event handler is unable to connect to Oracle Cloud Infrastructure Object Storage when running on-premises, it’s likely that your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server, and then set up the proxy server.
Oracle GoldenGate for Big Data can be used with a proxy server by passing Java runtime arguments that enable the proxy, as in this example:
-Dhttps.proxyHost=www-proxy.us.company.com -Dhttps.proxyPort=80
ClassNotFoundException Error
The most common initial error is an incorrect classpath that does not include all of the required client libraries, which results in a ClassNotFoundException error. Set the gg.classpath variable to include all of the required JAR files for the Oracle Cloud Infrastructure Java SDK, see Detailing the Functionality.
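One way to narrow down a classpath problem is to check how many JAR files each gg.classpath entry actually resolves to. The following Python sketch is a rough diagnostic under stated assumptions: the function `classpath_report` is hypothetical, entries are assumed to be separated by a colon as on Linux or UNIX, and a trailing `*` is treated as "every .jar file in that directory".

```python
import glob
import os


def classpath_report(classpath):
    """Map each classpath entry to the number of files it resolves to."""
    report = {}
    for entry in classpath.split(":"):
        if not entry:
            continue
        if entry.endswith("*"):
            # Treat a trailing * as every .jar file in that directory.
            pattern = os.path.join(os.path.dirname(entry), "*.jar")
            matches = glob.glob(pattern)
        else:
            matches = [entry] if os.path.exists(entry) else []
        report[entry] = len(matches)
    return report


report = classpath_report("/usr/var/oci/lib/*:/usr/var/oci/third-party/lib/*")
for entry, count in report.items():
    print(entry, "->", count, "file(s)")
```

An entry that resolves to zero files is a likely cause of a ClassNotFoundException and is worth checking first.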
                        
5.5 Using the Oracle Cloud Infrastructure Classic Event Handler
Learn how to use the Oracle Cloud Infrastructure Classic Event Handler to load files generated by the File Writer Handler into an Oracle Cloud Infrastructure Classic Object Store.
Topics:
- Overview
- Detailing the Functionality
- Configuring the Oracle Cloud Infrastructure Classic Event Handler
- Using Templated Strings
- Troubleshooting
5.5.1 Overview
The Oracle Cloud Infrastructure Object Classic service is an Infrastructure as a Service (IaaS) product that provides an enterprise-grade, large-scale, object storage solution for files and unstructured data. For more information, see https://cloud.oracle.com/storage-classic.
You can use any format handler that the File Writer Handler supports.
5.5.2 Detailing the Functionality
The Oracle Cloud Infrastructure Classic Event Handler requires File Transfer Manager (FTM), a Java SDK, to transfer files to Oracle Cloud Infrastructure Classic. Oracle GoldenGate for Big Data does not include the FTM Java SDK.
You must download the FTM Java SDK from:
http://www.oracle.com/technetwork/topics/cloud/downloads/index.html#storejavasdk
Extract the JAR files to a permanent directory. The handler requires two directories: the JAR library directory that contains the SDK JAR, and the third-party JAR library directory. Both directories must be in gg.classpath.
Set the gg.classpath variable to include the JAR files of the FTM Java SDK.
                        
These are the required third-party JARs:
ftm-api-2.4.4.jar
javax.json-1.0.4.jar
slf4j-api-1.7.7.jar
slf4j-log4j12-1.7.7.jar
log4j-1.2.17-16.jar
low-level-api-core-1.14.22.jar
Example
gg.classpath=/usr/var/ftm-sdk/libs/*:
5.5.3 Configuring the Oracle Cloud Infrastructure Classic Event Handler
You configure the Oracle Cloud Infrastructure Classic Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
The Oracle Cloud Infrastructure Classic Event Handler works only in conjunction with the File Writer Handler.
To select the Oracle Cloud Infrastructure Classic Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=oci-c, and then set the other Oracle Cloud Infrastructure Classic properties as follows:
                        
Table 5-5 Oracle Cloud Infrastructure Classic Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
| | Required | | None | Selects the Oracle Cloud Infrastructure Classic Event Handler. |
| | Required | Server URL | None | The server URL for the Oracle Cloud Infrastructure Classic Event Handler. |
| | Required | Valid string representing | None | The case-sensitive tenant id that you specify when signing in to the Oracle Cloud Infrastructure Console. |
| | Required | The Oracle Cloud Infrastructure Classic Event Handler service instance name. | None | The Oracle Cloud Infrastructure Classic Event Handler service instance name that you specified. |
| | Required | Valid user name. | None | The user name for the Oracle Cloud Infrastructure user account. |
| | Required | Valid password. | None | The password for the Oracle Cloud Infrastructure user account. |
| | Required | A string with resolvable keywords and constants used to dynamically generate the path in the Oracle Cloud Infrastructure bucket to write the file. | None | Use resolvable keywords and constants to dynamically generate unique Oracle Cloud Infrastructure Classic path names at runtime. |
| | Required | A string with resolvable keywords and constants used to dynamically generate the path in the Oracle Cloud Infrastructure container to write the file. | None | Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure container name at runtime. The event handler attempts to create the Oracle Cloud Infrastructure container if it does not exist. |
| | Optional | A string with resolvable keywords and constants used to dynamically generate the Oracle Cloud Infrastructure file name at runtime. | None | Use resolvable keywords and constants to dynamically generate the Oracle Cloud Infrastructure file name at runtime. If not set, the upstream file name is used. |
| | Optional | | | Set to |
| | Optional | A unique string identifier cross referencing a child event handler. | No event handler is configured. | Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, loading files to HDFS, loading files to Oracle Cloud Infrastructure Classic, or loading files to Oracle Cloud Infrastructure. |
Sample Configuration
#The OCI-C Event handler
gg.eventhandler.oci-c.type=oci-c
gg.eventhandler.oci-c.serverURL=https://storage.companycloud.com/
gg.eventhandler.oci-c.tenantID=usoraclebig
gg.eventhandler.oci-c.serviceName=dev1
gg.eventhandler.oci-c.username=user@company.com
gg.eventhandler.oci-c.password=pass
gg.eventhandler.oci-c.pathMappingTemplate=${schemaName}
gg.eventhandler.oci-c.containerMappingTemplate=${schemaName}
gg.eventhandler.oci-c.fileNameMappingTemplate=${tableName}_${currentTimestamp}.json
gg.eventhandler.oci-c.finalizeAction=NONE
5.5.4 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. This event handler makes extensive use of templated strings to generate the Oracle Cloud Infrastructure directory names, data file names, and Oracle Cloud Infrastructure bucket names. These strings give you the flexibility to select where to write data files and the names of those data files. You should exercise caution when choosing file and directory names to avoid file naming collisions that can result in an abend.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName}  | The fully qualified source table name delimited by a period ( | 
| ${catalogName}  | The individual source catalog name. For example,  | 
| ${schemaName}  | The individual source schema name.  For example,  | 
| ${tableName}  | The individual source table name.  For example,  | 
| ${groupName}  | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). | 
| ${emptyString}  | Evaluates to an empty string.  For example, | 
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${insertCount}  | The total count of insert operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${truncateCount} | The total count of truncate operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${currentTimestamp} | The current timestamp.  The default output format for the date time is  
 This format uses the syntax defined in the Java  | 
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. For example,  | 
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. For example,  | 
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
Requirements
The directory and file names generated using the templates must be legal on the system being written to. File names must be unique to avoid a file name collision. You can avoid a collision by adding a current timestamp using the ${currentTimestamp} keyword. If you are using coordinated apply, then adding ${groupName} into the data file name is recommended.
                        
5.5.5 Troubleshooting
Connectivity Issues
If the event handler is unable to connect to the Oracle Cloud Infrastructure Classic Object Store when running on-premises, it’s likely that your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server, and then set up the proxy server.
Oracle GoldenGate for Big Data can be used with a proxy server by passing Java runtime arguments that enable the proxy, as in this example:
-Dhttps.proxyHost=www-proxy.us.company.com -Dhttps.proxyPort=80
ClassNotFoundException Error
The most common initial error is an incorrect classpath that does not include all of the required client libraries, which results in a ClassNotFoundException error. Set the gg.classpath variable to include all of the required JAR files for the FTM Java SDK, see Detailing the Functionality.
                        
5.6 Using the Parquet Event Handler
Learn how to use the Parquet Event Handler to load files generated by the File Writer Handler into HDFS.
See Using the File Writer Handler.
5.6.1 Overview
The Parquet Event Handler enables you to generate data files in Parquet format. Parquet files can be written to either the local file system or directly to HDFS. Parquet is a columnar data format that can substantially improve data retrieval times and improve the performance of Big Data analytics, see https://parquet.apache.org/.
5.6.2 Detailing the Functionality
Topics:
- Configuring the Parquet Event Handler to Write to HDFS
- About the Upstream Data Format
- Using Templated Strings
5.6.2.1 Configuring the Parquet Event Handler to Write to HDFS
The Apache Parquet framework supports writing directly to HDFS, so the Parquet Event Handler can write Parquet files directly to HDFS. These additional configuration steps are required:
- The Parquet Event Handler dependencies and considerations are the same as the HDFS Handler, see HDFS Additional Considerations.
- Set the writeToHDFS property to true:
 gg.eventhandler.parquet.writeToHDFS=true
- Ensure that gg.classpath includes the HDFS client libraries.
- Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath. This is so the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:
 gg.classpath=/{HDFS_install_directory}/etc/hadoop
- If Kerberos authentication is enabled on the HDFS cluster, you have to configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:
 gg.eventhandler.name.kerberosPrincipal=principal
 gg.eventhandler.name.kerberosKeytabFile=path_to_the_keytab_file
5.6.2.2 About the Upstream Data Format
The Parquet Event Handler can only convert Avro Object Container File (OCF) generated by the File Writer Handler. The Parquet Event Handler cannot convert other formats to Parquet data files. The format of the File Writer Handler must be avro_row_ocf or avro_op_ocf, see Using the File Writer Handler.
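The format restriction above can be checked against a parsed properties file before Replicat is started. The sketch below is illustrative Python, not product code: the function `check_upstream_format` is hypothetical, the handler name `filewriter` is made up, and the property key `gg.handler.<name>.format` is an assumption about how the File Writer Handler format is configured, so confirm it against the File Writer Handler documentation.

```python
SUPPORTED_FORMATS = ("avro_row_ocf", "avro_op_ocf")


def check_upstream_format(props, handler_name):
    """Return the configured format, or raise if Parquet cannot convert it."""
    # Assumed property key for the File Writer Handler format.
    key = "gg.handler." + handler_name + ".format"
    value = props.get(key, "")
    if value not in SUPPORTED_FORMATS:
        raise ValueError(
            key + "=" + repr(value) + " is not an Avro OCF format; expected one of "
            + ", ".join(SUPPORTED_FORMATS)
        )
    return value


props = {"gg.handler.filewriter.format": "avro_row_ocf"}
print(check_upstream_format(props, "filewriter"))
```

A configuration that uses, for example, a JSON or delimited text format raises a ValueError in this sketch, which mirrors the rule that only Avro OCF input can be converted.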
                           
5.6.2.3 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The Parquet Event Handler makes extensive use of templated strings to generate the HDFS directory names, data file names, and HDFS bucket names. This gives you the flexibility to select where to write data files and the names of those data files.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName}  | The fully qualified source table name delimited by a period ( | 
| ${catalogName}  | The individual source catalog name. For example,  | 
| ${schemaName}  | The individual source schema name.  For example,  | 
| ${tableName}  | The individual source table name.  For example,  | 
| ${groupName}  | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). | 
| ${emptyString}  | Evaluates to an empty string.  For example, | 
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${insertCount}  | The total count of insert operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero ( | 
| ${truncateCount} | The total count of truncate operations in the data file.  It must be used either on rename or by the event handlers or it will be zero ( | 
| ${currentTimestamp} | The current timestamp.  The default output format for the date time is  
 This format uses the syntax defined in the Java  | 
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. For example,  | 
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. For example,  | 
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
5.6.3 Configuring the Parquet Event Handler
You configure the Parquet Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
The Parquet Event Handler works only in conjunction with the File Writer Handler.
To select the Parquet Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=parquet, and then set the other Parquet Event Handler properties as follows:
                        
Table 5-6 Parquet Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
| | Required | | None | Selects the Parquet Event Handler for use. |
| | Optional | | | Set to |
| | Required | A string with resolvable keywords and constants used to dynamically generate the path to write generated Parquet files. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. Typically, path names follow the format, |
| | Optional | A string with resolvable keywords and constants used to dynamically generate the Parquet file name at runtime. | None | Sets the Parquet file name. If not set, the upstream file name is used. |
| | Optional | | | Sets the compression codec of the generated Parquet file. |
| | Optional | | | Indicates what the Parquet Event Handler should do at the finalize action. |
| | Optional | | The Parquet default. | Set to |
| | Optional | | The Parquet default. | Set to |
| | Optional | Integer | The Parquet default. | Sets the Parquet dictionary page size. |
| | Optional | Integer | The Parquet default. | Sets the Parquet padding size. |
| | Optional | Integer | The Parquet default. | Sets the Parquet page size. |
| | Optional | Integer | The Parquet default. | Sets the Parquet row group size. |
| | Optional | The Kerberos principal name. | None | Set to the Kerberos principal when writing directly to HDFS and Kerberos authentication is enabled. |
| | Optional | The path to the Kerberos | The Parquet default. | Set to the path to the Kerberos |
| | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | The event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS. |
5.7 Using the S3 Event Handler
Learn how to use the S3 Event Handler, which provides the interface to Amazon S3 web services.
5.7.1 Overview
Amazon S3 is object storage hosted in the Amazon cloud. The purpose of the S3 Event Handler is to load data files generated by the File Writer Handler into Amazon S3, see https://aws.amazon.com/s3/.
You can use any format that the File Writer Handler supports, see Using the File Writer Handler.
5.7.2 Detailing Functionality
The S3 Event Handler requires the Amazon Web Services (AWS) Java SDK to transfer files to S3 object storage. Oracle GoldenGate for Big Data does not include the AWS Java SDK. You have to download and install the AWS Java SDK from:
https://aws.amazon.com/sdk-for-java/
Then you have to configure the gg.classpath variable to include the JAR files of the AWS Java SDK, which are divided into two directories. Both directories must be in gg.classpath, for example:
gg.classpath=/usr/var/aws-java-sdk-1.11.240/lib/*:/usr/var/aws-java-sdk-1.11.240/third-party/lib/*
Topics:
- Configuring the Client ID and Secret
- About the AWS S3 Buckets
- Using Templated Strings
- Troubleshooting
5.7.2.1 Configuring the Client ID and Secret
A client ID and secret are the required credentials for the S3 Event Handler to interact with Amazon S3. A client ID and secret are generated using the Amazon AWS website. The retrieval of these credentials and presentation to the S3 server are performed on the client side by the AWS Java SDK. The AWS Java SDK provides multiple ways that the client ID and secret can be resolved at runtime.
The client ID and secret can be set as Java properties, on one line, in the Java Adapter properties file as follows:
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Daws.accessKeyId=your_access_key -Daws.secretKey=your_secret_key
Alternatively, you can set environment variables on the local machine using the Amazon Elastic Compute Cloud (Amazon EC2) AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables.
5.7.2.2 About the AWS S3 Buckets
AWS divides S3 storage into separate file systems called buckets. The S3 Event Handler can write to pre-created buckets. Alternatively, if the S3 bucket does not exist, the S3 Event Handler attempts to create the specified S3 bucket. AWS requires that S3 bucket names are lowercase, and Amazon S3 bucket names must be globally unique. Attempting to create an S3 bucket whose name already exists in any Amazon account causes the S3 Event Handler to abend.
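As an illustration of the naming rules above, the following Java Adapter properties fragment sketches two ways to set the bucket name. The event handler name s3 and both bucket names are hypothetical. Nesting a keyword inside ${toLowerCase[]} is an assumption about how the templated string keywords combine, so verify it against your release before relying on it.

```properties
# Option 1: a static bucket name (hypothetical), written entirely in lowercase.
gg.eventhandler.s3.bucketMappingTemplate=mycompany-ogg-data

# Option 2 (assumption): derive the bucket name from the source schema and
# force it to lowercase, because AWS requires lowercase bucket names.
#gg.eventhandler.s3.bucketMappingTemplate=mycompany-ogg-${toLowerCase[${schemaName}]}
```

Because bucket names are shared across all Amazon accounts, a company-specific prefix such as the hypothetical mycompany- reduces the chance of colliding with a bucket that someone else already owns.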
5.7.2.3 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The S3 Event Handler makes extensive use of templated strings to generate the S3 directory names, data file names, and S3 bucket names. This gives you the flexibility to select where to write data files and the names of those data files.
Supported Templated Strings
| Keyword | Description | 
|---|---|
| ${fullyQualifiedTableName} | The fully qualified source table name delimited by a period. |
| ${catalogName} | The individual source catalog name. |
| ${schemaName} | The individual source schema name. |
| ${tableName} | The individual source table name. |
| ${groupName} | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). |
| ${emptyString} | Evaluates to an empty string. |
| ${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero. |
| ${insertCount} | The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero. |
| ${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero. |
| ${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero. |
| ${truncateCount} | The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero. |
| ${currentTimestamp} | The current timestamp. The date and time output format uses Java date format syntax. |
| ${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. |
| ${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. |
Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.
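The following fragment is a minimal sketch of such a mix, assuming an event handler named s3. The bucket name and the directory layout are hypothetical and are shown only to illustrate how constants and keywords are combined.

```properties
# Static bucket name (hypothetical), all lowercase.
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname

# Constant prefix followed by keywords that are resolved at runtime:
# one directory per Replicat process, then one per source table.
gg.eventhandler.s3.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
```

With this layout, files from different Replicat processes and different source tables land in separate paths, so a downstream consumer can pick up one table at a time.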
5.7.2.4 Troubleshooting
Connectivity Issues
If the S3 Event Handler is unable to connect to the S3 object storage when running on premises, it is likely that your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server, and then configure the S3 Event Handler to use it.
Oracle GoldenGate can be used with a proxy server using the following parameters to enable the proxy server:
- gg.eventhandler.name.proxyServer=proxy_host_name
- gg.eventhandler.name.proxyPort=80
Access to the proxy servers can be secured using credentials and the following configuration parameters:
- gg.eventhandler.name.proxyUsername=username
- gg.eventhandler.name.proxyPassword=password
Sample configuration:
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.proxyProtocol=HTTP
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none
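If the proxy server also requires credentials, the sample configuration might be extended as in the following sketch. The proxy host name, user name, and password are placeholders. It assumes that the credential properties take the same gg.eventhandler.s3 prefix as the other properties in the sample.

```properties
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
# Placeholder proxy host and port; use the values from your network administrator.
gg.eventhandler.s3.proxyServer=proxy.example.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.proxyProtocol=HTTP
# Only needed when the proxy server requires authentication.
gg.eventhandler.s3.proxyUsername=username
gg.eventhandler.s3.proxyPassword=password
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none
```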
5.7.3 Configuring the S3 Event Handler
You configure the S3 Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the S3 Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=s3, and then set the other S3 Event Handler properties as follows:
Table 5-7 S3 Event Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation | 
|---|---|---|---|---|
| gg.eventhandler.name.type | Required | s3 | None | Selects the S3 Event Handler for use with Replicat. |
| gg.eventhandler.name.region | Required | The AWS region name that is hosting your S3 instance. | None | Setting the legal AWS region name is required. |
| gg.eventhandler.name.proxyServer | Optional | The host name of your proxy server. | None | Sets the host name of your proxy server if connectivity to AWS requires a proxy server. |
| gg.eventhandler.name.proxyPort | Optional | The port number of the proxy server. | None | Sets the port number of the proxy server if connectivity to AWS requires a proxy server. |
| gg.eventhandler.name.proxyUsername | Optional | The username of the proxy server. | None | Sets the user name of the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials. |
| gg.eventhandler.name.proxyPassword | Optional | The password of the proxy server. | None | Sets the password for the user name of the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials. |
| gg.eventhandler.name.bucketMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the S3 bucket name. | None | Use resolvable keywords and constants to dynamically generate the S3 bucket name at runtime. The handler attempts to create the S3 bucket if it does not exist. AWS requires bucket names to be all lowercase. A bucket name with uppercase characters results in a runtime exception. |
| gg.eventhandler.name.pathMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the path in the S3 bucket to write the file. | None | Use keywords interlaced with constants to dynamically generate unique S3 path names at runtime. |
|  | Optional | A string with resolvable keywords and constants used to dynamically generate the S3 file name at runtime. | None | Use resolvable keywords and constants to dynamically generate the S3 data file name at runtime. If not set, the upstream file name is used. |
|  | Optional |  |  | Set to  |
|  | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS. |
|  | Optional (required for Dell ECS) | A legal URL to connect to cloud storage. | None | Not required for Amazon AWS S3. Required for Dell ECS. Sets the URL to connect to cloud storage. |
| gg.eventhandler.name.proxyProtocol | Optional |  |  | Sets the proxy protocol connection to the proxy server for an additional level of security. The client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added into the Amazon SDK in version 1.11.396, so you must use at least that version to use this property. |
|  | Optional |  | Empty | Set only if you are enabling S3 server side encryption. Use the parameters to set the algorithm for server side encryption in S3. |
|  | Optional | A legal AWS key management system server side management key or the alias that represents that key. | Empty | Set only if you are enabling S3 server side encryption with an AWS key management system key. |
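Putting the required properties together, a minimal Java Adapter properties fragment might look like the following sketch. The handler names filewriter and s3 are hypothetical. The three File Writer Handler lines are assumptions based on the general handler configuration pattern and are not defined in this section. The bucket and path values are placeholders.

```properties
# File Writer Handler (assumed property names, hypothetical handler name).
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
# Cross-reference the child event handler that is invoked on the file roll event.
gg.handler.filewriter.eventHandler=s3

# S3 Event Handler: the required properties from the table above.
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
```

The optional proxy and encryption properties can then be added to the same gg.eventhandler.s3 group as needed.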