9.2.9 Apache HDFS
The HDFS Handler is designed to stream change capture data into the Hadoop Distributed File System (HDFS).
This chapter describes how to use the HDFS Handler.
- Overview
- Writing into HDFS in SequenceFile Format
  The HDFS SequenceFile is a flat file consisting of binary key and value pairs. You can enable writing data in SequenceFile format by setting the gg.handler.name.format property to sequencefile.
- Setting Up and Running the HDFS Handler
- Writing in HDFS in Avro Object Container File Format
- Generating HDFS File Names Using Template Strings
- Metadata Change Events
- Partitioning
  The partitioning functionality uses the template mapper functionality to resolve partitioning strings. The result is that you have more control over how source trail data is partitioned. As of the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 21c release, all the keywords that are supported by the templating functionality are supported in HDFS partitioning.
- HDFS Additional Considerations
- Best Practices
- Troubleshooting the HDFS Handler
  Troubleshooting of the HDFS Handler begins with the contents of the Java log4j log file. Follow the directions in the Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.
- HDFS Handler Client Dependencies
Parent topic: Target
9.2.9.1 Overview
The HDFS is the primary file system for Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA). Hadoop is typically installed on multiple machines that work together as a Hadoop cluster. Hadoop allows you to store very large amounts of data in the cluster that is horizontally scaled across the machines in the cluster. You can then perform analytics on that data using a variety of GG for DAA applications.
Parent topic: Apache HDFS
9.2.9.2 Writing into HDFS in SequenceFile Format
The HDFS SequenceFile is a flat file consisting of binary key and value pairs. You can enable writing data in SequenceFile format by setting the gg.handler.name.format property to sequencefile.
The key part of the record is set to null, and the actual data is set in the value part. For information about Hadoop SequenceFile, see https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile.
Parent topic: Apache HDFS
9.2.9.2.1 Integrating with Hive
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release does not include a Hive storage handler because the HDFS Handler provides all of the necessary Hive functionality.
You can create a Hive integration to create tables and update table definitions in case of DDL events. This is limited to data formatted in Avro Object Container File format. For more information, see Writing in HDFS in Avro Object Container File Format and HDFS Handler Configuration.
For Hive to consume sequence files, create the Hive tables with DDL that includes STORED as sequencefile. The following is a sample create table script:
CREATE EXTERNAL TABLE table_name (
col1 string,
...
...
col2 string)
ROW FORMAT DELIMITED
STORED as sequencefile
LOCATION '/path/to/hdfs/file';
Note:
If files are intended to be consumed by Hive, then the gg.handler.name.partitionByTable property should be set to true.
Parent topic: Writing into HDFS in SequenceFile Format
9.2.9.2.2 Understanding the Data Format
The data written in the value part of each record is in delimited text format. All of the options described in the Using the Delimited Text Row Formatter section are applicable to the HDFS SequenceFile when writing data to it.
For example:
gg.handler.name.format=sequencefile
gg.handler.name.format.includeColumnNames=true
gg.handler.name.format.includeOpType=true
gg.handler.name.format.includeCurrentTimestamp=true
gg.handler.name.format.updateOpKey=U
Parent topic: Writing into HDFS in SequenceFile Format
9.2.9.3 Setting Up and Running the HDFS Handler
To run the HDFS Handler, a Hadoop single instance or Hadoop cluster must be installed, running, and network-accessible from the machine running the HDFS Handler. Apache Hadoop is open source, and you can download it from https://hadoop.apache.org.
Follow the Getting Started links for information on how to install a single-node cluster (for pseudo-distributed operation mode) or a clustered setup (for fully-distributed operation mode).
Instructions for configuring the HDFS Handler components and running the handler are described in the following sections.
- Classpath Configuration
- HDFS Handler Configuration
- Review a Sample Configuration
- Performance Considerations
- Security
Parent topic: Apache HDFS
9.2.9.3.1 Classpath Configuration
For the HDFS Handler to connect to HDFS and run, the HDFS core-site.xml file and the HDFS client jars must be configured in the gg.classpath variable. The HDFS client jars must match the version of HDFS that the HDFS Handler is connecting to. For a list of the required client jar files by release, see HDFS Handler Client Dependencies.
The default location of the core-site.xml file is Hadoop_Home/etc/hadoop.
The default locations of the HDFS client jars are the following directories:
Hadoop_Home/share/hadoop/common/lib/*
Hadoop_Home/share/hadoop/common/*
Hadoop_Home/share/hadoop/hdfs/lib/*
Hadoop_Home/share/hadoop/hdfs/*
The gg.classpath must be configured exactly as shown. The path to the core-site.xml file must contain the path to the directory containing the core-site.xml file with no wildcard appended. If you include a (*) wildcard in the path to the core-site.xml file, the file is not picked up. Conversely, the path to the dependency jars must include the (*) wildcard character in order to include all of the jar files in that directory in the associated classpath. Do not use *.jar.
The following is an example of a correctly configured gg.classpath variable:
gg.classpath=/ggwork/hadoop/hadoop-2.6.0/etc/hadoop:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/lib/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/hdfs/*:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/hdfs/lib/*
The HDFS configuration file hdfs-site.xml must also be in the classpath if Kerberos security is enabled. By default, the hdfs-site.xml file is located in the Hadoop_Home/etc/hadoop directory. If the HDFS Handler is not collocated with Hadoop, either or both files can be copied to another machine.
Parent topic: Setting Up and Running the HDFS Handler
9.2.9.3.2 HDFS Handler Configuration
The following are the configurable values for the HDFS Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the HDFS Handler, you must first configure the handler type by specifying gg.handler.name.type=hdfs and the other HDFS properties as follows:
Property | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handlerlist | Required | Any string | None | Provides a name for the HDFS Handler. The HDFS Handler name then becomes part of the property names listed in this table. |
gg.handler.name.type | Required | hdfs | None | Selects the HDFS Handler for streaming change data capture into HDFS. |
gg.handler.name.mode | Optional | op or tx | op | Selects operation (op) mode or transaction (tx) mode for the handler. |
gg.handler.name.maxFileSize | Optional | The default unit of measure is bytes. You can use k, m, or g to specify kilobytes, megabytes, or gigabytes. | 1g | Selects the maximum file size of the created HDFS files. |
gg.handler.name.pathMappingTemplate | Optional | Any legal templated string to resolve the target write directory in HDFS. Templates can contain a mix of constants and keywords which are dynamically resolved at runtime to generate the HDFS write directory. | /ogg/${fullyQualifiedTableName} | You can use keywords interlaced with constants to dynamically generate the HDFS write directory at runtime, see Generating HDFS File Names Using Template Strings. |
gg.handler.name.fileRollInterval | Optional | The default unit of measure is milliseconds. You can stipulate ms, s, m, or h to signify milliseconds, seconds, minutes, or hours respectively. | File rolling on time is off. | The timer starts when an HDFS file is created. If the file is still open when the interval elapses, then the file is closed. A new file is not immediately opened. New HDFS files are created on a just-in-time basis. |
gg.handler.name.inactivityRollInterval | Optional | The default unit of measure is milliseconds. You can use ms, s, m, or h to signify milliseconds, seconds, minutes, or hours respectively. | File inactivity rolling on time is off. | The timer starts from the latest write to an HDFS file. New writes to an HDFS file restart the counter. If the file is still open when the counter elapses, the HDFS file is closed. A new file is not immediately opened. New HDFS files are created on a just-in-time basis. |
gg.handler.name.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate HDFS file names at runtime. | ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt | You can use keywords interlaced with constants to dynamically generate unique HDFS file names at runtime, see Generating HDFS File Names Using Template Strings. File names typically follow the format ${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt. |
gg.handler.name.partitionByTable | Optional | true or false | true | Determines whether data written into HDFS is partitioned by table. If set to true, data for different source tables is written to different HDFS files. Must be set to true to use the partitioning functionality, and should be set to true if the files are to be consumed by Hive. |
gg.handler.name.rollOnMetadataChange | Optional | true or false | true | Determines whether HDFS files are rolled in the case of a metadata change. True means the HDFS file is rolled, false means the HDFS file is not rolled. Must be set to true when writing in Avro Object Container File format. |
gg.handler.name.format | Optional | delimitedtext, json, xml, avro_row, avro_op, avro_row_ocf, avro_op_ocf, or sequencefile | delimitedtext | Selects the formatter for the HDFS Handler for how output data is formatted. |
gg.handler.name.includeTokens | Optional | true or false | false | Set to true to include the token data from the source trail files in the output; set to false to suppress the token output. |
gg.handler.name.partitioner.fully qualified table name | Optional | A mixture of templating keywords and constants to resolve a sub directory at runtime to partition the data. | None | The configuration resolves a sub directory or sub directories, which are appended to the resolved HDFS target path. These sub directories are used to partition the data. |
gg.handler.name.authType | Optional | kerberos or none | none | Setting this property to kerberos enables Kerberos authentication. |
gg.handler.name.kerberosKeytabFile | Optional (Required if authType=kerberos) | Relative or absolute path to a Kerberos keytab file. | None | The keytab file allows the HDFS Handler to access a password to perform a kinit operation for Kerberos security. |
gg.handler.name.kerberosPrincipalName | Optional (Required if authType=kerberos) | A legal Kerberos principal name. | None | The Kerberos principal name for Kerberos authentication. |
gg.handler.name.schemaFilePath | Optional | A legal path in HDFS | None | Set to a legal path in HDFS so that schemas (if available) are written in that HDFS directory. Schemas are currently only available for Avro and JSON formatters. In the case of a metadata change event, the schema is overwritten to reflect the schema change. |
gg.handler.name.compressionType (applicable to Sequence File format only) | Optional | block, none, or record | - | Hadoop Sequence File Compression Type. Applicable only if gg.handler.name.format is set to sequencefile. |
gg.handler.name.compressionCodec (applicable to Sequence File and Avro OCF formats only) | Optional | The fully qualified class name of a Hadoop compression codec. | - | Hadoop Sequence File Compression Codec. Applicable only if gg.handler.name.format is set to sequencefile. |
gg.handler.name.compressionCodec | Optional | - | - | Avro OCF Formatter Compression Codec. This configuration controls the selection of the compression library to be used for Avro OCF files. Snappy includes native binaries in the Snappy JAR file and performs a Java-native traversal when compressing or decompressing. Use of Snappy may introduce runtime issues and platform porting issues that you may not experience when working with Java. You may need to perform additional testing to ensure that Snappy works on all of your required platforms. Snappy is an open source library, so Oracle cannot guarantee its ability to operate on all of your required platforms. |
gg.handler.name.openNextFileAtRoll | Optional | true or false | false | Applicable only to the HDFS Handler that is not writing an Avro OCF or sequence file, to support extract, load, transform (ELT) situations. When set to true, a new target file is immediately opened when the current file rolls. File rolls can be triggered by a metadata change, the file roll interval elapsing, or the inactivity roll interval elapsing. This is useful when data files are being loaded into HDFS and a monitor program is monitoring the write directories waiting to consume the data. The monitoring programs use the appearance of a new file as a trigger so that the previous file can be consumed by the consuming application. |
- | Optional | - | - | Set to use an …. Setting …. For most applications setting this property to …. |
Parent topic: Setting Up and Running the HDFS Handler
9.2.9.3.3 Review a Sample Configuration
The following is a sample configuration for the HDFS Handler from the Java Adapter properties file:
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.mode=tx
gg.handler.hdfs.includeTokens=false
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.pathMappingTemplate=/ogg/${fullyQualifiedTableName}
gg.handler.hdfs.fileRollInterval=0
gg.handler.hdfs.inactivityRollInterval=0
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.authType=none
gg.handler.hdfs.format=delimitedtext
Parent topic: Setting Up and Running the HDFS Handler
9.2.9.3.4 Performance Considerations
The HDFS Handler calls the HDFS flush method on the HDFS write stream to flush data to the HDFS data nodes at the end of each transaction in order to maintain write durability. This is an expensive call, and performance can be adversely affected, especially in the case of transactions of one or few operations, which result in numerous HDFS flush calls.
Performance of the HDFS Handler can be greatly improved by batching multiple small transactions into a single larger transaction. If you require high performance, configure batching functionality for the Replicat process. For more information, see Replicat Grouping.
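The following is a minimal sketch of a Replicat parameter file that uses grouping. The Replicat name (rhdfs) and the properties file path are placeholder examples, and the GROUPTRANSOPS value of 1000 is only an illustration; see Replicat Grouping for guidance on sizing:
REPLICAT rhdfs
-- Loads the Java Adapter and the HDFS Handler properties file (path is an example)
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
-- Groups smaller source transactions into target transactions of at least 1000 operations
GROUPTRANSOPS 1000
MAP *.*, TARGET *.*;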
The HDFS client libraries spawn threads for every HDFS file stream opened by the HDFS Handler. Therefore, the number of threads executing in the JVM grows proportionally to the number of HDFS file streams that are open. Performance of the HDFS Handler may degrade as more HDFS file streams are opened. Configuring the HDFS Handler to write to many HDFS files (due to many source replication tables or extensive use of partitioning) may result in degraded performance. If your use case requires writing to many tables, then Oracle recommends that you enable the roll on time or roll on inactivity features to close HDFS file streams. Closing an HDFS file stream causes the HDFS client threads to terminate, and the associated resources can be reclaimed by the JVM.
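For example, the following sketch enables the roll on time and roll on inactivity features in the Java Adapter properties file. The interval values (in milliseconds) are illustrations only and should be tuned to balance latency, file counts, and the number of open streams:
gg.handler.hdfs.fileRollInterval=3600000
gg.handler.hdfs.inactivityRollInterval=600000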
Parent topic: Setting Up and Running the HDFS Handler
9.2.9.3.5 Security
The HDFS cluster can be secured using Kerberos authentication, and the HDFS Handler can connect to a Kerberos-secured cluster. The HDFS core-site.xml should be in the handler's classpath with the hadoop.security.authentication property set to kerberos and the hadoop.security.authorization property set to true. Additionally, you must set the following properties in the HDFS Handler Java configuration file:
gg.handler.name.authType=kerberos
gg.handler.name.kerberosPrincipalName=legal Kerberos principal name
gg.handler.name.kerberosKeytabFile=path to a keytab file that contains the password for the Kerberos principal so that the HDFS Handler can programmatically perform the Kerberos kinit operations to obtain a Kerberos ticket
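For reference, the core-site.xml security settings described above correspond to entries similar to the following sketch; your cluster may require additional Kerberos-related configuration:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>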
You may encounter an inability to decrypt the Kerberos password from the keytab file. This causes the Kerberos authentication to fall back to interactive mode, which cannot work because it is being invoked programmatically. The cause of this problem is that the Java Cryptography Extension (JCE) is not installed in the Java Runtime Environment (JRE). Ensure that the JCE is loaded in the JRE; see http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.
Parent topic: Setting Up and Running the HDFS Handler
9.2.9.4 Writing in HDFS in Avro Object Container File Format
The HDFS Handler includes specialized functionality to write to HDFS in Avro Object Container File (OCF) format. The Avro OCF format is part of the Avro specification and is detailed in the Avro documentation at:
https://avro.apache.org/docs/current/spec.html#Object+Container+Files
Avro OCF format may be a good choice because it:
- Integrates with Apache Hive (raw Avro written to HDFS is not supported by Hive).
- Provides good support for schema evolution.
Configure the following to enable writing to HDFS in Avro OCF format:
To write row data to HDFS in Avro OCF format, set the gg.handler.name.format=avro_row_ocf property.
To write operation data to HDFS in Avro OCF format, set the gg.handler.name.format=avro_op_ocf property.
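For example, a minimal sketch of the related Java Adapter properties for operation-formatted Avro OCF output; the handler name hdfs follows the earlier sample configuration, and the partitionByTable and rollOnMetadataChange settings mirror that sample so that output stays organized per table and files roll when schemas change:
gg.handler.hdfs.format=avro_op_ocf
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true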
The HDFS and Avro OCF integration includes functionality to create the corresponding tables in Hive and update the schema for metadata change events. The configuration section provides information on the properties to enable integration with Hive. The Oracle GoldenGate Hive integration accesses Hive using the JDBC interface, so the Hive JDBC server must be running to enable this integration.
Parent topic: Apache HDFS
9.2.9.5 Generating HDFS File Names Using Template Strings
The HDFS Handler can dynamically generate HDFS file names using a template string. The template string allows you to combine keywords, which are dynamically resolved at runtime, with static strings to give you more control over generated HDFS file names. You can control the template file name using the gg.handler.name.fileNameMappingTemplate configuration property. The default value for this parameter is:
${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt
See Template Keywords.
Following are examples of legal templates and the resolved strings:
Legal Template | Replacement |
---|---|
${schemaName}.${tableName}__${groupName}_${currentTimestamp}.txt | TEST.TABLE1__HDFS001_2017-07-05_04-31-23.123.txt |
${fullyQualifiedTableName}--${currentTimestamp}.avro | ORACLE.TEST.TABLE1--2017-07-05_04-31-23.123.avro |
${fullyQualifiedTableName}_${currentTimestamp[yyyy-MM-ddTHH-mm-ss.SSS]}.json | ORACLE.TEST.TABLE1_2017-07-05T04-31-23.123.json |
Be aware of these restrictions when generating HDFS file names using templates:
- Generated HDFS file names must be legal HDFS file names.
- Oracle strongly recommends that you use
${groupName}
as part of the HDFS file naming template when using coordinated apply and breaking down source table data to different Replicat threads. The group name provides uniqueness of generated HDFS names that${currentTimestamp}
alone does not guarantee. HDFS file name collisions result in an abend of the Replicat process.
Parent topic: Apache HDFS
9.2.9.6 Metadata Change Events
Metadata change events are now handled in the HDFS Handler. The default behavior of the HDFS Handler is to roll the current relevant file in the event of a metadata change event. This behavior allows for the results of metadata changes to at least be separated into different files. File rolling on metadata change is configurable and can be turned off.
To support metadata change events, the process capturing changes in the source database must support both DDL changes and metadata in trail. Oracle GoldenGate does not support DDL replication for all database implementations. See the Oracle GoldenGate installation and configuration guide for the appropriate database to determine whether DDL replication is supported.
Parent topic: Apache HDFS
9.2.9.7 Partitioning
The partitioning functionality uses the template mapper functionality to resolve partitioning strings. The result is that you have more control over how source trail data is partitioned. As of the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 21c release, all the keywords that are supported by the templating functionality are supported in HDFS partitioning.
Precondition
To use the partitioning functionality, ensure that the data is partitioned by table. You cannot set the following configuration:
gg.handler.name.partitionByTable=false
Path Configuration
Assume that the path mapping template is configured as follows:
gg.handler.hdfs.pathMappingTemplate=/ogg/${fullyQualifiedTableName}
At runtime, the path resolves as follows for the source table DBO.ORDERS:
/ogg/DBO.ORDERS
Partitioning Configuration
Configure the HDFS partitioning as follows; any of the keywords that are legal for templating are now legal for partitioning:
gg.handler.name.partitioner.fully qualified table name=templating keywords and/or constants
For example, the partitioning for the DBO.ORDERS table can be set to the following:
gg.handler.hdfs.partitioner.DBO.ORDERS=par_sales_region=${columnValue[SALES_REGION]}
This example can result in the following breakdown of files in HDFS:
/ogg/DBO.ORDERS/par_sales_region=west/data files
/ogg/DBO.ORDERS/par_sales_region=east/data files
/ogg/DBO.ORDERS/par_sales_region=north/data files
/ogg/DBO.ORDERS/par_sales_region=south/data files
Alternatively, the partitioning for the DBO.ORDERS table can be set to the following:
gg.handler.hdfs.partitioner.DBO.ORDERS=par_sales_region=${columnValue[SALES_REGION]}/par_state=${columnValue[STATE]}
This example can result in the following breakdown of files in HDFS:
/ogg/DBO.ORDERS/par_sales_region=west/par_state=CA/data files
/ogg/DBO.ORDERS/par_sales_region=east/par_state=FL/data files
/ogg/DBO.ORDERS/par_sales_region=north/par_state=MN/data files
/ogg/DBO.ORDERS/par_sales_region=south/par_state=TX/data files
Be careful when configuring HDFS partitioning. If you choose a partitioning column whose values have a very large range, then the data is partitioned into a proportionally large number of output data files. The HDFS client spawns multiple threads to service each open HDFS write stream. Partitioning to very large numbers of HDFS files can result in resource exhaustion of memory and/or threads.
Note:
Starting with GG for DAA 21c, the Automated Hive integration has been removed with the changes to support templating to control partitioning.
Parent topic: Apache HDFS
9.2.9.8 HDFS Additional Considerations
The Oracle HDFS Handler requires certain HDFS client libraries to be resolved in its classpath as a prerequisite for streaming data to HDFS.
For a list of required client JAR files by version, see HDFS Handler Client Dependencies. The HDFS client jars do not ship with the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) product. The HDFS Handler supports multiple versions of HDFS, and the HDFS client jars must be the same version as the HDFS version to which the HDFS Handler is connecting. The HDFS client jars are open source and are freely available to download from sites such as the Apache Hadoop site or the maven central repository.
In order to establish connectivity to HDFS, the HDFS core-site.xml file must be in the classpath of the HDFS Handler. If the core-site.xml file is not in the classpath, the HDFS client code defaults to a mode that attempts to write to the local file system. Writing to the local file system instead of HDFS can be advantageous for troubleshooting, building a proof of concept (POC), or as a step in the process of building an HDFS integration.
Another common issue is that data streamed to HDFS using the HDFS Handler may not be immediately available to GG for DAA analytic tools, such as Hive. This behavior commonly occurs when the HDFS Handler is in possession of an open write stream to an HDFS file. HDFS writes in blocks of 128 MB by default. HDFS blocks under construction are not always visible to analytic tools. Additionally, inconsistencies between file sizes when using the -ls, -cat, and -get commands in the HDFS shell may occur. This is an anomaly of HDFS streaming and is discussed in the HDFS specification. This anomaly of HDFS leads to a potential 128 MB per file blind spot in analytic data. This may not be an issue if you have a steady stream of replication data and do not require low levels of latency for analytic data from HDFS. However, it may be a problem in some use cases. Closing the HDFS write stream finalizes the block writing; data becomes immediately visible to analytic tools, and file sizing metrics become consistent again. Therefore, the file rolling feature in the HDFS Handler can be used to close HDFS write streams, making all data visible.
Important:
The file rolling solution may present its own problems. Extensive use of file rolling can result in many small files in HDFS. Many small files in HDFS may result in performance issues in analytic tools.
You may also notice the HDFS inconsistency problem in the following scenarios.
- The HDFS Handler process crashes.
- A forced shutdown is called on the HDFS Handler process.
- A network outage or other issue causes the HDFS Handler process to abend.
In each of these scenarios, it is possible for the HDFS Handler to end without explicitly closing the HDFS write stream and finalizing the writing block. HDFS in its internal process ultimately recognizes that the write stream has been broken, so HDFS finalizes the write block. In this scenario, you may experience a short term delay before the HDFS process finalizes the write block.
Parent topic: Apache HDFS
9.2.9.9 Best Practices
It is considered an Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) best practice for the HDFS cluster to operate on dedicated servers called cluster nodes. Edge nodes are server machines that host the applications that stream data to and retrieve data from the HDFS cluster nodes. Because the HDFS cluster nodes and the edge nodes are different servers, the following benefits are seen:
- The HDFS cluster nodes do not compete for resources with the applications interfacing with the cluster.
- The requirements for the HDFS cluster nodes and edge nodes probably differ. This physical topology allows the appropriate hardware to be tailored to specific needs.
It is a best practice for the HDFS Handler to be installed and running on an edge node and to stream data to the HDFS cluster using a network connection. The HDFS Handler can run on any machine that has network visibility to the HDFS cluster. The installation of the HDFS Handler on an edge node requires that the core-site.xml file and the dependency jars be copied to the edge node so that the HDFS Handler can access them. The HDFS Handler can also run collocated on an HDFS cluster node if required.
Parent topic: Apache HDFS
9.2.9.10 Troubleshooting the HDFS Handler
Troubleshooting of the HDFS Handler begins with the contents of the Java log4j log file. Follow the directions in the Java Logging Configuration to configure the runtime to correctly generate the Java log4j log file.
Parent topic: Apache HDFS
9.2.9.10.1 Java Classpath
Problems with the Java classpath are common. The usual indication of a Java classpath problem is a ClassNotFoundException in the Java log4j log file. The Java log4j log file can be used to troubleshoot this issue. Setting the log level to DEBUG causes each of the jars referenced in the gg.classpath object to be logged to the log file. In this way, you can ensure that all of the required dependency jars are resolved by enabling DEBUG level logging and searching the log file for messages, as in the following:
2015-09-21 10:05:10 DEBUG ConfigClassPath:74 - ...adding to classpath: url="file:/ggwork/hadoop/hadoop-2.6.0/share/hadoop/common/lib/guava-11.0.2.jar
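To enable this DEBUG output, you can typically raise the log level in the Java Adapter properties file. The following is a sketch that assumes the standard GG for DAA log4j logging setup; see the Java Logging Configuration for the authoritative settings:
gg.log=log4j
gg.log.level=debug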
Parent topic: Troubleshooting the HDFS Handler
9.2.9.10.2 Java Boot Options
When running the HDFS Handler Replicat with JRE 11, a StackOverflowError is thrown. You can fix this issue by editing the bootoptions property in the Java Adapter properties file as follows:
jvm.bootoptions=-Djdk.lang.processReaperUseDefaultStackSize=true
Parent topic: Troubleshooting the HDFS Handler
9.2.9.10.3 HDFS Connection Properties
The contents of the HDFS core-site.xml file (including default settings) are output to the Java log4j log file when the logging level is set to DEBUG or TRACE. This output shows the connection properties to HDFS. Search for the following in the Java log4j log file:
2015-09-21 10:05:11 DEBUG HDFSConfiguration:58 - Begin - HDFS configuration object contents for connection troubleshooting.
If the fs.defaultFS property points to the local file system, then the core-site.xml file is not properly set in the gg.classpath property:
Key: [fs.defaultFS] Value: [file:///].
The following shows the fs.defaultFS property properly pointing at an HDFS host and port:
Key: [fs.defaultFS] Value: [hdfs://hdfshost:9000].
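For reference, the fs.defaultFS value comes from an entry in core-site.xml similar to the following sketch; the host name hdfshost and port 9000 are placeholders for your NameNode address:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfshost:9000</value>
  </property>
</configuration>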
Parent topic: Troubleshooting the HDFS Handler
9.2.9.10.4 Handler and Formatter Configuration
The Java log4j log file contains information on the configuration state of the HDFS Handler and the selected formatter. This information is output at the INFO log level. The output resembles the following:
2015-09-21 10:05:11 INFO AvroRowFormatter:156 - **** Begin Avro Row Formatter - Configuration Summary ****
Operation types are always included in the Avro formatter output. The key for insert operations is [I]. The key for update operations is [U]. The key for delete operations is [D]. The key for truncate operations is [T].
Column type mapping has been configured to map source column types to an appropriate corresponding Avro type.
Created Avro schemas will be output to the directory [./dirdef].
Created Avro schemas will be encoded using the [UTF-8] character set.
In the event of a primary key update, the Avro Formatter will ABEND.
Avro row messages will not be wrapped inside a generic Avro message.
No delimiter will be inserted after each generated Avro message.
**** End Avro Row Formatter - Configuration Summary ****
2015-09-21 10:05:11 INFO HDFSHandler:207 - **** Begin HDFS Handler - Configuration Summary ****
Mode of operation is set to tx.
Data streamed to HDFS will be partitioned by table.
Tokens will be included in the output.
The HDFS root directory for writing is set to [/ogg].
The maximum HDFS file size has been set to 1073741824 bytes.
Rolling of HDFS files based on time is configured as off.
Rolling of HDFS files based on write inactivity is configured as off.
Rolling of HDFS files in the case of a metadata change event is enabled.
HDFS partitioning information:
The HDFS partitioning object contains no partitioning information.
HDFS Handler Authentication type has been configured to use [none]
**** End HDFS Handler - Configuration Summary ****
Parent topic: Troubleshooting the HDFS Handler
9.2.9.11 HDFS Handler Client Dependencies
This section lists the HDFS client dependencies for Apache Hadoop. The hadoop-client-x.x.x.jar is not distributed with Apache Hadoop, nor is it mandatory to be in the classpath. The hadoop-client-x.x.x.jar is an empty Maven project with the purpose of aggregating all of the Hadoop client dependencies.
Maven groupId: org.apache.hadoop
Maven artifactId: hadoop-client
Maven version: the HDFS version numbers listed for each section
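For example, the following Maven dependency sketch pulls the HDFS 3.3.0 client dependencies listed in the next section; substitute the version that matches the HDFS cluster to which you connect:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.3.0</version>
</dependency>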
Parent topic: Apache HDFS
9.2.9.11.1 Hadoop Client Dependencies
This section lists the Hadoop client dependencies for each HDFS version.
- HDFS 3.3.0
- HDFS 3.2.0
- HDFS 3.1.4
- HDFS 3.0.3
- HDFS 2.9.2
- HDFS 2.8.5
- HDFS 2.7.7
- HDFS 2.6.0
- HDFS 2.5.2
- HDFS 2.4.1
- HDFS 2.3.0
- HDFS 2.2.0
Parent topic: HDFS Handler Client Dependencies
9.2.9.11.1.1 HDFS 3.3.0
accessors-smart-1.2.jar animal-sniffer-annotations-1.17.jar asm-5.0.4.jar avro-1.7.7.jar azure-keyvault-core-1.0.0.jar azure-storage-7.0.0.jar checker-qual-2.5.2.jar commons-beanutils-1.9.4.jar commons-cli-1.2.jar commons-codec-1.11.jar commons-collections-3.2.2.jar commons-compress-1.19.jar commons-configuration2-2.1.1.jar commons-io-2.5.jar commons-lang3-3.7.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar commons-text-1.4.jar curator-client-4.2.0.jar curator-framework-4.2.0.jar curator-recipes-4.2.0.jar dnsjava-2.1.7.jar failureaccess-1.0.jar gson-2.2.4.jar guava-27.0-jre.jar hadoop-annotations-3.3.0.jar hadoop-auth-3.3.0.jar hadoop-azure-3.3.0.jar hadoop-client-3.3.0.jar hadoop-common-3.3.0.jar hadoop-hdfs-client-3.3.0.jar hadoop-mapreduce-client-common-3.3.0.jar hadoop-mapreduce-client-core-3.3.0.jar hadoop-mapreduce-client-jobclient-3.3.0.jar hadoop-shaded-protobuf_3_7-1.0.0.jar hadoop-yarn-api-3.3.0.jar hadoop-yarn-client-3.3.0.jar hadoop-yarn-common-3.3.0.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.6.jar httpcore-4.4.10.jar j2objc-annotations-1.1.jar jackson-annotations-2.10.3.jar jackson-core-2.6.0.jar jackson-core-asl-1.9.13.jar jackson-databind-2.10.3.jar jackson-jaxrs-base-2.10.3.jar jackson-jaxrs-json-provider-2.10.3.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.10.3.jar jakarta.activation-api-1.2.1.jar jakarta.xml.bind-api-2.3.2.jar javax.activation-api-1.2.0.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-client-9.4.20.v20190813.jar jetty-http-9.4.20.v20190813.jar jetty-io-9.4.20.v20190813.jar jetty-security-9.4.20.v20190813.jar jetty-servlet-9.4.20.v20190813.jar jetty-util-9.4.20.v20190813.jar jetty-util-ajax-9.4.20.v20190813.jar jetty-webapp-9.4.20.v20190813.jar jetty-xml-9.4.20.v20190813.jar jline-3.9.0.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.2.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar log4j-1.2.17.jar nimbus-jose-jwt-7.9.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar websocket-api-9.4.20.v20190813.jar websocket-client-9.4.20.v20190813.jar websocket-common-9.4.20.v20190813.jar wildfly-openssl-1.0.7.Final.jar woodstox-core-5.0.3.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.2 HDFS 3.2.0
accessors-smart-1.2.jar asm-5.0.4.jar avro-1.7.7.jar azure-keyvault-core-1.0.0.jar azure-storage-7.0.0.jar commons-beanutils-1.9.3.jar commons-cli-1.2.jar commons-codec-1.11.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration2-2.1.1.jar commons-io-2.5.jar commons-lang3-3.7.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar commons-text-1.4.jar curator-client-2.12.0.jar curator-framework-2.12.0.jar curator-recipes-2.12.0.jar dnsjava-2.1.7.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-3.2.0.jar hadoop-auth-3.2.0.jar hadoop-azure-3.2.0.jar hadoop-client-3.2.0.jar hadoop-common-3.2.0.jar hadoop-hdfs-client-3.2.0.jar hadoop-mapreduce-client-common-3.2.0.jar hadoop-mapreduce-client-core-3.2.0.jar hadoop-mapreduce-client-jobclient-3.2.0.jar hadoop-yarn-api-3.2.0.jar hadoop-yarn-client-3.2.0.jar hadoop-yarn-common-3.2.0.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.9.5.jar jackson-core-2.6.0.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.5.jar jackson-jaxrs-base-2.9.5.jar jackson-jaxrs-json-provider-2.9.5.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.9.5.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.3.24.v20180605.jar jetty-servlet-9.3.24.v20180605.jar jetty-util-9.3.24.v20180605.jar jetty-util-ajax-9.3.24.v20180605.jar jetty-webapp-9.3.24.v20180605.jar jetty-xml-9.3.24.v20180605.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar log4j-1.2.17.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar wildfly-openssl-1.0.4.Final.jar woodstox-core-5.0.3.jar xz-1.0.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.3 HDFS 3.1.4
accessors-smart-1.2.jar animal-sniffer-annotations-1.17.jar asm-5.0.4.jar avro-1.7.7.jar azure-keyvault-core-1.0.0.jar azure-storage-7.0.0.jar checker-qual-2.5.2.jar commons-beanutils-1.9.4.jar commons-cli-1.2.jar commons-codec-1.11.jar commons-collections-3.2.2.jar commons-compress-1.19.jar commons-configuration2-2.1.1.jar commons-io-2.5.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar curator-client-2.13.0.jar curator-framework-2.13.0.jar curator-recipes-2.13.0.jar error_prone_annotations-2.2.0.jar failureaccess-1.0.jar gson-2.2.4.jar guava-27.0-jre.jar hadoop-annotations-3.1.4.jar hadoop-auth-3.1.4.jar hadoop-azure-3.1.4.jar hadoop-client-3.1.4.jar hadoop-common-3.1.4.jar hadoop-hdfs-client-3.1.4.jar hadoop-mapreduce-client-common-3.1.4.jar hadoop-mapreduce-client-core-3.1.4.jar hadoop-mapreduce-client-jobclient-3.1.4.jar hadoop-yarn-api-3.1.4.jar hadoop-yarn-client-3.1.4.jar hadoop-yarn-common-3.1.4.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar j2objc-annotations-1.1.jar jackson-annotations-2.9.10.jar jackson-core-2.9.10.jar jackson-core-asl-1.9.13.jar jackson-databind-2.9.10.4.jar jackson-jaxrs-base-2.9.10.jar jackson-jaxrs-json-provider-2.9.10.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.9.10.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.4.20.v20190813.jar jetty-servlet-9.4.20.v20190813.jar jetty-util-9.4.20.v20190813.jar jetty-util-ajax-9.4.20.v20190813.jar jetty-webapp-9.4.20.v20190813.jar jetty-xml-9.4.20.v20190813.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.2.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar log4j-1.2.17.jar nimbus-jose-jwt-7.9.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar woodstox-core-5.0.3.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.4 HDFS 3.0.3
accessors-smart-1.2.jar asm-5.0.4.jar avro-1.7.7.jar azure-keyvault-core-0.8.0.jar azure-storage-5.4.0.jar commons-beanutils-1.9.3.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration2-2.1.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.6.jar curator-client-2.12.0.jar curator-framework-2.12.0.jar curator-recipes-2.12.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-3.0.3.jar hadoop-auth-3.0.3.jar hadoop-azure-3.0.3.jar hadoop-client-3.0.3.jar hadoop-common-3.0.3.jar hadoop-hdfs-client-3.0.3.jar hadoop-mapreduce-client-common-3.0.3.jar hadoop-mapreduce-client-core-3.0.3.jar hadoop-mapreduce-client-jobclient-3.0.3.jar hadoop-yarn-api-3.0.3.jar hadoop-yarn-client-3.0.3.jar hadoop-yarn-common-3.0.3.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.7.8.jar jackson-core-2.7.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.7.8.jar jackson-jaxrs-base-2.7.8.jar jackson-jaxrs-json-provider-2.7.8.jar jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations-2.7.8.jar javax.servlet-api-3.1.0.jar jaxb-api-2.2.11.jar jcip-annotations-1.0-1.jar jersey-client-1.19.jar jersey-core-1.19.jar jersey-servlet-1.19.jar jetty-security-9.3.19.v20170502.jar jetty-servlet-9.3.19.v20170502.jar jetty-util-9.3.19.v20170502.jar jetty-util-ajax-9.3.19.v20170502.jar jetty-webapp-9.3.19.v20170502.jar jetty-xml-9.3.19.v20170502.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar jsr311-api-1.1.1.jar kerb-admin-1.0.1.jar kerb-client-1.0.1.jar kerb-common-1.0.1.jar kerb-core-1.0.1.jar kerb-crypto-1.0.1.jar kerb-identity-1.0.1.jar kerb-server-1.0.1.jar kerb-simplekdc-1.0.1.jar kerb-util-1.0.1.jar kerby-asn1-1.0.1.jar kerby-config-1.0.1.jar kerby-pkix-1.0.1.jar kerby-util-1.0.1.jar kerby-xdr-1.0.1.jar log4j-1.2.17.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar re2j-1.1.jar slf4j-api-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar token-provider-1.0.1.jar woodstox-core-5.0.3.jar xz-1.0.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.5 HDFS 2.9.2
accessors-smart-1.2.jar activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-5.0.4.jar avro-1.7.7.jar azure-keyvault-core-0.8.0.jar azure-storage-5.4.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.4.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar ehcache-3.3.1.jar geronimo-jcache_1.0_spec-1.0-alpha-1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.9.2.jar hadoop-auth-2.9.2.jar hadoop-azure-2.9.2.jar hadoop-client-2.9.2.jar hadoop-common-2.9.2.jar hadoop-hdfs-client-2.9.2.jar hadoop-mapreduce-client-app-2.9.2.jar hadoop-mapreduce-client-common-2.9.2.jar hadoop-mapreduce-client-core-2.9.2.jar hadoop-mapreduce-client-jobclient-2.9.2.jar hadoop-mapreduce-client-shuffle-2.9.2.jar hadoop-yarn-api-2.9.2.jar hadoop-yarn-client-2.9.2.jar hadoop-yarn-common-2.9.2.jar hadoop-yarn-registry-2.9.2.jar hadoop-yarn-server-common-2.9.2.jar HikariCP-java7-2.4.12.jar htrace-core4-4.1.0-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-annotations-2.4.0.jar jackson-core-2.7.8.jar jackson-core-asl-1.9.13.jar jackson-databind-2.4.0.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jcip-annotations-1.0-1.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-sslengine-6.1.26.jar jetty-util-6.1.26.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar mssql-jdbc-6.2.1.jre7.jar netty-3.7.0.Final.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.7.5.jar okio-1.6.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.25.jar slf4j-log4j12-1.7.25.jar snappy-java-1.0.5.jar stax2-api-3.1.4.jar stax-api-1.0-2.jar woodstox-core-5.0.3.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.6 HDFS 2.8.5
accessors-smart-1.2.jar activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-5.0.4.jar avro-1.7.4.jar azure-storage-2.2.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.3.2.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.8.5.jar hadoop-auth-2.8.5.jar hadoop-azure-2.8.5.jar hadoop-client-2.8.5.jar hadoop-common-2.8.5.jar hadoop-hdfs-client-2.8.5.jar hadoop-mapreduce-client-app-2.8.5.jar hadoop-mapreduce-client-common-2.8.5.jar hadoop-mapreduce-client-core-2.8.5.jar hadoop-mapreduce-client-jobclient-2.8.5.jar hadoop-mapreduce-client-shuffle-2.8.5.jar hadoop-yarn-api-2.8.5.jar hadoop-yarn-client-2.8.5.jar hadoop-yarn-common-2.8.5.jar hadoop-yarn-server-common-2.8.5.jar htrace-core4-4.0.1-incubating.jar httpclient-4.5.2.jar httpcore-4.4.4.jar jackson-core-2.2.3.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jcip-annotations-1.0-1.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-sslengine-6.1.26.jar jetty-util-6.1.26.jar json-smart-2.3.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.7.0.Final.jar nimbus-jose-jwt-4.41.1.jar okhttp-2.4.0.jar okio-1.4.0.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.7 HDFS 2.7.7
HDFS 2.7.7 (HDFS 2.7.0 is effectively the same; simply substitute 2.7.0 for the libraries versioned as 2.7.7)
activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar azure-storage-2.0.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.3.2.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.7.7.jar hadoop-auth-2.7.7.jar hadoop-azure-2.7.7.jar hadoop-client-2.7.7.jar hadoop-common-2.7.7.jar hadoop-hdfs-2.7.7.jar hadoop-mapreduce-client-app-2.7.7.jar hadoop-mapreduce-client-common-2.7.7.jar hadoop-mapreduce-client-core-2.7.7.jar hadoop-mapreduce-client-jobclient-2.7.7.jar hadoop-mapreduce-client-shuffle-2.7.7.jar hadoop-yarn-api-2.7.7.jar hadoop-yarn-client-2.7.7.jar hadoop-yarn-common-2.7.7.jar hadoop-yarn-server-common-2.7.7.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-2.2.3.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-sslengine-6.1.26.jar jetty-util-6.1.26.jar jsp-api-2.1.jar jsr305-3.0.0.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.6.2.Final.jar netty-all-4.0.23.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.8 HDFS 2.6.0
activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.6.0.jar curator-framework-2.6.0.jar curator-recipes-2.6.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.6.0.jar hadoop-auth-2.6.0.jar hadoop-client-2.6.0.jar hadoop-common-2.6.0.jar hadoop-hdfs-2.6.0.jar hadoop-mapreduce-client-app-2.6.0.jar hadoop-mapreduce-client-common-2.6.0.jar hadoop-mapreduce-client-core-2.6.0.jar hadoop-mapreduce-client-jobclient-2.6.0.jar hadoop-mapreduce-client-shuffle-2.6.0.jar hadoop-yarn-api-2.6.0.jar hadoop-yarn-client-2.6.0.jar hadoop-yarn-common-2.6.0.jar hadoop-yarn-server-common-2.6.0.jar htrace-core-3.0.4.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.6.2.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.9 HDFS 2.5.2
HDFS 2.5.2 (HDFS 2.5.1 and 2.5.0 are effectively the same; simply substitute 2.5.1 or 2.5.0 for the libraries versioned as 2.5.2)
activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.5.2.jar hadoop-auth-2.5.2.jar hadoop-client-2.5.2.jar hadoop-common-2.5.2.jar hadoop-hdfs-2.5.2.jar hadoop-mapreduce-client-app-2.5.2.jar hadoop-mapreduce-client-common-2.5.2.jar hadoop-mapreduce-client-core-2.5.2.jar hadoop-mapreduce-client-jobclient-2.5.2.jar hadoop-mapreduce-client-shuffle-2.5.2.jar hadoop-yarn-api-2.5.2.jar hadoop-yarn-client-2.5.2.jar hadoop-yarn-common-2.5.2.jar hadoop-yarn-server-common-2.5.2.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.9.13.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar leveldbjni-all-1.8.jar log4j-1.2.17.jar netty-3.6.2.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.10 HDFS 2.4.1
HDFS 2.4.1 (HDFS 2.4.0 is effectively the same; simply substitute 2.4.0 for the libraries versioned as 2.4.1)
activation-1.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.4.1.jar hadoop-auth-2.4.1.jar hadoop-client-2.4.1.jar hadoop-hdfs-2.4.1.jar hadoop-mapreduce-client-app-2.4.1.jar hadoop-mapreduce-client-common-2.4.1.jar hadoop-mapreduce-client-core-2.4.1.jar hadoop-mapreduce-client-jobclient-2.4.1.jar hadoop-mapreduce-client-shuffle-2.4.1.jar hadoop-yarn-api-2.4.1.jar hadoop-yarn-client-2.4.1.jar hadoop-yarn-common-2.4.1.jar hadoop-yarn-server-common-2.4.1.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.2.jar jersey-client-1.9.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar hadoop-common-2.4.1.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.11 HDFS 2.3.0
activation-1.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar guava-11.0.2.jar hadoop-annotations-2.3.0.jar hadoop-auth-2.3.0.jar hadoop-client-2.3.0.jar hadoop-common-2.3.0.jar hadoop-hdfs-2.3.0.jar hadoop-mapreduce-client-app-2.3.0.jar hadoop-mapreduce-client-common-2.3.0.jar hadoop-mapreduce-client-core-2.3.0.jar hadoop-mapreduce-client-jobclient-2.3.0.jar hadoop-mapreduce-client-shuffle-2.3.0.jar hadoop-yarn-api-2.3.0.jar hadoop-yarn-client-2.3.0.jar hadoop-yarn-common-2.3.0.jar hadoop-yarn-server-common-2.3.0.jar httpclient-4.2.5.jar httpcore-4.2.4.jar jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar jaxb-api-2.2.2.jar jersey-core-1.9.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar paranamer-2.3.jar protobuf-java-2.5.0.jar servlet-api-2.5.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar
Parent topic: Hadoop Client Dependencies
9.2.9.11.1.12 HDFS 2.2.0
activation-1.1.jar aopalliance-1.0.jar asm-3.1.jar avro-1.7.4.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.1.jar commons-lang-2.5.jar commons-logging-1.1.1.jar commons-math-2.1.jar commons-net-3.1.jar gmbal-api-only-3.0.0-b023.jar grizzly-framework-2.1.2.jar grizzly-http-2.1.2.jar grizzly-http-server-2.1.2.jar grizzly-http-servlet-2.1.2.jar grizzly-rcm-2.1.2.jar guava-11.0.2.jar guice-3.0.jar hadoop-annotations-2.2.0.jar hadoop-auth-2.2.0.jar hadoop-client-2.2.0.jar hadoop-common-2.2.0.jar hadoop-hdfs-2.2.0.jar hadoop-mapreduce-client-app-2.2.0.jar hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-core-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0.jar hadoop-mapreduce-client-shuffle-2.2.0.jar hadoop-yarn-api-2.2.0.jar hadoop-yarn-client-2.2.0.jar hadoop-yarn-common-2.2.0.jar hadoop-yarn-server-common-2.2.0.jar jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.3.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.3.jar javax.inject-1.jar javax.servlet-3.1.jar javax.servlet-api-3.0.1.jar jaxb-api-2.2.2.jar jaxb-impl-2.2.3-1.jar jersey-client-1.9.jar jersey-core-1.9.jar jersey-grizzly2-1.9.jar jersey-guice-1.9.jar jersey-json-1.9.jar jersey-server-1.9.jar jersey-test-framework-core-1.9.jar jersey-test-framework-grizzly2-1.9.jar jettison-1.1.jar jetty-util-6.1.26.jar jsr305-1.3.9.jar log4j-1.2.17.jar management-api-3.0.0-b012.jar paranamer-2.3.jar protobuf-java-2.5.0.jar slf4j-api-1.7.5.jar slf4j-log4j12-1.7.5.jar snappy-java-1.0.4.1.jar stax-api-1.0.1.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar
Parent topic: Hadoop Client Dependencies