Rebalancing a Kafka Cluster
Rebalance Big Data Service Kafka clusters to redistribute the partitions of existing topics across the brokers and disks in the cluster.
In a Kafka cluster, brokers ensure high availability to process new events. Because Kafka is fault-tolerant, replicas of the messages are maintained on the brokers and are made available in case of failures. With the help of the replication factor, you can define the number of copies of the topic across the cluster.
You can add new brokers to a cluster, or new disks to an existing Kafka broker, by assigning a unique broker ID, listeners, and a log directory from the Ambari configurations for Kafka. However, these brokers and disks aren't assigned any data partitions of the existing topics in the cluster. Unless you move partitions to them or create new topics, the new brokers won't do much work. To overcome this problem, use the kafka-reassign-partitions tool.
Creating the Topics-to-Move JSON File
Create a topics-to-move JSON file to specify the topics to be reassigned.
The topics-to-move file tells the kafka-reassign-partitions tool which partitions to look at when generating a proposal for the reassignment configuration. You must create the topics-to-move JSON file from scratch. The format of the file is the following:
{"topics": [{"topic": "topic1"}, {"topic": "topic2"}], "version":1 }
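Because the file has to be written from scratch, a small script can produce it from a list of topic names. The following is a minimal sketch that assumes only the format shown above. The function names build_topics_to_move and write_topics_to_move are hypothetical helpers, not part of Kafka or Big Data Service.

```python
import json


def build_topics_to_move(topic_names):
    """Build the topics-to-move structure for the given topic names."""
    return {"topics": [{"topic": name} for name in topic_names], "version": 1}


def write_topics_to_move(topic_names, path):
    """Write the structure to a JSON file that the tool can take as input."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_topics_to_move(topic_names), handle)


# Example: the same two topics used in the format shown above.
print(json.dumps(build_topics_to_move(["topic1", "topic2"])))
```

Keeping the builder separate from the file writer makes it easy to inspect the structure before saving it.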
For more information on creating the topics-to-move JSON file, see Running the Reassign Partitions with the kafka-reassign-partitions-tool.
Reassignment Configuration JSON
This JSON file is a configuration file that contains the parameters used in the reassignment process. You create this file, but the tool can generate a proposal for its contents. When the kafka-reassign-partitions tool is run with the --generate option, it generates a proposed configuration that you can fine-tune and save as a JSON file. The file saved this way is the reassignment configuration JSON. To generate a proposal, the tool requires a topics-to-move file as input. The format of the file is the following:
{"version":1,
"partitions":
[{"topic":"topic1","partition":1001,"replicas":[1001,1002],"log_dirs":["any","any"]},
{"topic":"topic1","partition":1002,"replicas":[1002,1001],"log_dirs":["any","any"]},
{"topic":"topic2","partition":1003,"replicas":[1002,1001],"log_dirs":["any","any"]}]
}
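Before passing a hand-edited configuration to the tool, it can help to check its basic shape. The following is a rough sketch, not a substitute for the tool's own validation. The function check_reassignment is a hypothetical helper, and the rules it applies (version equal to 1, no duplicate broker IDs, and log_dirs matching replicas in length when present) are assumptions drawn from the format shown above.

```python
def check_reassignment(plan):
    """Return a list of problems found in a reassignment configuration dict."""
    problems = []
    if plan.get("version") != 1:
        problems.append("version must be 1")
    for entry in plan.get("partitions", []):
        label = f'{entry.get("topic")}-{entry.get("partition")}'
        replicas = entry.get("replicas", [])
        log_dirs = entry.get("log_dirs")
        if not replicas:
            problems.append(f"{label}: no replicas listed")
        if len(set(replicas)) != len(replicas):
            problems.append(f"{label}: duplicate broker ID in replicas")
        # Assumption: log_dirs may be omitted, but must align with replicas if given.
        if log_dirs is not None and len(log_dirs) != len(replicas):
            problems.append(f"{label}: log_dirs and replicas differ in length")
    return problems


# Example: the first entry of the configuration shown above.
example = {
    "version": 1,
    "partitions": [
        {"topic": "topic1", "partition": 1001,
         "replicas": [1001, 1002], "log_dirs": ["any", "any"]},
    ],
}
print(check_reassignment(example))
```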
For more information on creating the topics-to-move JSON file, see Running the Reassign Partitions with the kafka-reassign-partitions-tool.
Reassignment Configuration Properties
The reassignment configuration contains multiple properties.
Properties | Description |
---|---|
topic | Specifies the topic. |
partition | Specifies the partition. |
replicas | Specifies the brokers that the selected partition is assigned to. The brokers are listed in order, which means that the first broker in the list is always the leader for that partition. Change the order of brokers to resolve any leader-balancing issues among brokers. Change the broker IDs to reassign partitions to different brokers. |
log_dirs | Specifies the log directory of the brokers. The log directories are listed in the same order as the brokers. By default any is specified as the log directory, which means that the broker is free to choose where it places the replica. By default, the current broker implementation selects the log directory using a round-robin algorithm. An absolute path beginning with a / can be used to explicitly set where to store the partition replica. |
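Because the first broker in replicas is the leader and log_dirs follows the same order, reordering one list means reordering the other. The following sketch shows one way to move a chosen broker to the front of a partition entry while keeping the two lists aligned. The function prefer_leader is a hypothetical helper, and the path /data/kafka-logs in the example is a placeholder, not a real Big Data Service location.

```python
def prefer_leader(entry, broker_id):
    """Return a copy of a partition entry with broker_id listed first."""
    replicas = entry["replicas"]
    # Assumption: a missing log_dirs list is treated as "any" for every replica.
    log_dirs = entry.get("log_dirs", ["any"] * len(replicas))
    if broker_id not in replicas:
        raise ValueError(f"broker {broker_id} is not a replica of this partition")
    first = replicas.index(broker_id)
    order = [first] + [i for i in range(len(replicas)) if i != first]
    updated = dict(entry)
    updated["replicas"] = [replicas[i] for i in order]
    updated["log_dirs"] = [log_dirs[i] for i in order]
    return updated


# Example: make broker 1002 the first replica; its log directory moves with it.
entry = {"topic": "topic1", "partition": 1001,
         "replicas": [1001, 1002], "log_dirs": ["any", "/data/kafka-logs"]}
print(prefer_leader(entry, 1002))
```

The function returns a new entry instead of changing the input, so the original proposal stays available for comparison.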
Running the Reassign Partitions with the kafka-reassign-partitions-tool
- For a Kafka cluster with a large amount of data, use this tool carefully. To move many partitions, we recommend that you run the tool in batches of three or four partitions at a time.
- Ensure that the brokers are healthy before running this tool.
- This tool can't be used to make an out-of-sync replica the leader of a partition.
- Redistribute the load when the system is at 70% capacity.
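The batching recommendation above can be sketched as a helper that splits one large reassignment configuration into several smaller ones, each of which could be saved to its own JSON file and run separately. The function split_into_batches is hypothetical, and the default batch size of three follows the guidance in the list.

```python
def split_into_batches(plan, batch_size=3):
    """Split a reassignment configuration into smaller configurations."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    partitions = plan["partitions"]
    return [
        {"version": plan["version"], "partitions": partitions[start:start + batch_size]}
        for start in range(0, len(partitions), batch_size)
    ]


# Example: seven illustrative partition entries split into batches of three.
plan = {
    "version": 1,
    "partitions": [
        {"topic": "topic1", "partition": n, "replicas": [1001, 1002]}
        for n in range(7)
    ],
}
for batch in split_into_batches(plan):
    print(len(batch["partitions"]))
```

Each batch keeps the same version field as the original plan, so every piece is a complete configuration on its own.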