Rebalancing a Kafka Cluster

Rebalance a Big Data Service Kafka cluster to redistribute topic partitions and their replicas across the brokers in the cluster, for example after you add brokers or disks.

In a Kafka cluster, brokers ensure high availability to process new events. Because Kafka is fault-tolerant, replicas of the messages are maintained across brokers and are made available if a failure occurs. The replication factor defines the number of copies of a topic across the cluster.

You can add new brokers to a Kafka cluster, or new disks to an existing broker, by assigning a unique broker ID, listeners, and a log directory in the Ambari configurations for Kafka. However, these brokers and disks aren't assigned any data partitions of the existing topics in the cluster. Unless you move partitions to them or create new topics, the new brokers do little work. To address this, use the kafka-reassign-partitions tool.

Creating the Topics-to-Move JSON File

Create a topics-to-move JSON file to specify the topics to be reassigned.

The topics-to-move file tells the kafka-reassign-partitions tool which topics' partitions to consider when it generates a proposal for the reassignment configuration. You must create the topics-to-move JSON file from scratch. The file has the following format:

{"topics": [{"topic": "topic1"}, {"topic": "topic2"}], "version":1 }

For more information on creating the topics-to-move JSON file, see Running the Reassign Partitions with the kafka-reassign-partitions-tool.
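As a minimal sketch, the topics-to-move file can also be produced programmatically instead of being typed by hand. The function names and the output file name here are hypothetical and chosen only for illustration; the JSON layout follows the format shown above.

```python
import json


def build_topics_to_move(topic_names):
    """Return the topics-to-move structure for the given topic names."""
    return {"topics": [{"topic": name} for name in topic_names], "version": 1}


def write_topics_to_move(topic_names, path):
    """Write the topics-to-move JSON file to path (path is a hypothetical example)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_topics_to_move(topic_names), handle, indent=2)


if __name__ == "__main__":
    # Prints: {"topics": [{"topic": "topic1"}, {"topic": "topic2"}], "version": 1}
    print(json.dumps(build_topics_to_move(["topic1", "topic2"])))
```

Calling write_topics_to_move(["topic1", "topic2"], "topics-to-move.json") would save the same content to a file that can then be passed to the tool.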

Reassignment Configuration JSON

This JSON file is a configuration file that contains the parameters used in the reassignment process. You create this file, but the tool generates a proposal for its contents. When you run the kafka-reassign-partitions tool with the --generate option, it generates a proposed configuration that you can fine-tune and save as a JSON file. The saved file is the reassignment configuration JSON. To generate a proposal, the tool requires a topics-to-move file as input. The file has the following format:

{"version":1,
 "partitions":
   [{"topic":"topic1","partition":0,"replicas":[1001,1002],"log_dirs":["any","any"]},
    {"topic":"topic1","partition":1,"replicas":[1002,1001],"log_dirs":["any","any"]},
    {"topic":"topic2","partition":0,"replicas":[1002,1001],"log_dirs":["any","any"]}]
}

For more information on creating the reassignment configuration JSON file, see Running the Reassign Partitions with the kafka-reassign-partitions-tool.
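When reviewing a reassignment configuration, it can help to see how many partition replicas each broker would hold. The following is a rough sketch under the assumption that the file uses the format shown above; the sample data, broker IDs, and function name are illustrative and not part of the tool.

```python
import json
from collections import Counter

# Sample in the reassignment configuration format; the broker IDs are illustrative.
SAMPLE = """
{"version": 1,
 "partitions": [
   {"topic": "topic1", "partition": 0, "replicas": [1001, 1002], "log_dirs": ["any", "any"]},
   {"topic": "topic1", "partition": 1, "replicas": [1002, 1001], "log_dirs": ["any", "any"]},
   {"topic": "topic2", "partition": 0, "replicas": [1002, 1003], "log_dirs": ["any", "any"]}]}
"""


def replicas_per_broker(config):
    """Count how many partition replicas the configuration places on each broker."""
    counts = Counter()
    for entry in config["partitions"]:
        counts.update(entry["replicas"])
    return dict(sorted(counts.items()))


config = json.loads(SAMPLE)
print(replicas_per_broker(config))  # {1001: 2, 1002: 3, 1003: 1}
```

An uneven count is a hint that the configuration may be worth adjusting before it is executed.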

Reassignment Configuration Properties

The reassignment configuration contains multiple properties.

Property    Description
topic       Specifies the topic.
partition   Specifies the partition.
replicas    Specifies the brokers that the selected partition is assigned to. The brokers are listed in order, which means that the first broker in the list is always the leader for that partition. Change the order of brokers to resolve any leader-balancing issues among brokers. Change the broker IDs to reassign partitions to different brokers.
log_dirs    Specifies the log directory of the brokers. The log directories are listed in the same order as the brokers. By default, any is specified as the log directory, which means that the broker is free to choose where it places the replica. The current broker implementation selects the log directory using a round-robin algorithm. To explicitly set where the partition replica is stored, specify an absolute path beginning with a /.
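The relationship between replicas and log_dirs can be sketched as follows. This is an illustrative helper, not part of the Kafka tooling: it counts how often each broker is listed first (and is therefore the leader for that partition), and moves a chosen broker to the front of replicas while keeping each log_dirs entry paired with its broker. The function names and sample values are assumptions made for the example.

```python
from collections import Counter


def preferred_leaders(config):
    """Count the partitions for which each broker is listed first in replicas."""
    return dict(Counter(entry["replicas"][0] for entry in config["partitions"]))


def promote(entry, broker_id):
    """Return a copy of entry with broker_id moved to the front of replicas.

    log_dirs is reordered in step so every directory stays paired with its broker.
    """
    replicas = entry["replicas"]
    log_dirs = entry.get("log_dirs", ["any"] * len(replicas))
    if len(replicas) != len(log_dirs):
        raise ValueError("replicas and log_dirs must have the same length")
    if broker_id not in replicas:
        raise ValueError(f"broker {broker_id} is not a replica of this partition")
    pairs = list(zip(replicas, log_dirs))
    # Stable sort: the chosen broker comes first, the others keep their order.
    pairs.sort(key=lambda pair: pair[0] != broker_id)
    return {
        **entry,
        "replicas": [broker for broker, _ in pairs],
        "log_dirs": [directory for _, directory in pairs],
    }


config = {
    "version": 1,
    "partitions": [
        {"topic": "topic1", "partition": 0, "replicas": [1001, 1002], "log_dirs": ["any", "any"]},
        {"topic": "topic1", "partition": 1, "replicas": [1001, 1002], "log_dirs": ["any", "/data1"]},
        {"topic": "topic2", "partition": 0, "replicas": [1001, 1002], "log_dirs": ["any", "any"]},
    ],
}
print(preferred_leaders(config))  # {1001: 3}
config["partitions"][1] = promote(config["partitions"][1], 1002)
print(preferred_leaders(config))  # {1001: 2, 1002: 1}
```

After the call, the second entry lists replicas as [1002, 1001] and log_dirs as ["/data1", "any"], so the explicit directory still belongs to broker 1002.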

Running the Reassign Partitions with the kafka-reassign-partitions-tool

  • For a Kafka cluster that holds a large amount of data, use this tool carefully. To move many partitions, we recommend running the tool in batches of three or four partitions at a time.
  • Ensure that the brokers are healthy before running this tool.
  • This tool can't be used to make an out-of-sync replica the leader of a partition.
  • Redistribute the load when the system is at 70% capacity.
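The batching recommendation above can be sketched as a small helper that splits one reassignment configuration into several smaller ones, each of which could be saved to its own file and executed separately. The function name and the generated sample data are hypothetical; only the JSON layout comes from this document.

```python
def split_into_batches(config, batch_size=3):
    """Split a reassignment configuration into configurations of at most batch_size partitions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    partitions = config["partitions"]
    return [
        {"version": config.get("version", 1), "partitions": partitions[start:start + batch_size]}
        for start in range(0, len(partitions), batch_size)
    ]


# Illustrative configuration with five partitions of a single topic.
config = {
    "version": 1,
    "partitions": [
        {"topic": "topic1", "partition": n, "replicas": [1004, 1005], "log_dirs": ["any", "any"]}
        for n in range(5)
    ],
}
batches = split_into_batches(config, batch_size=3)
print([len(batch["partitions"]) for batch in batches])  # [3, 2]
```

Each batch keeps the original partition order, so running the batches one after another covers the same partitions as the full configuration.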
  1. SSH to one of the broker nodes in the Big Data Service cluster. The kafka-reassign-partitions-tool is located in /usr/odh/current/kafka-broker/bin.
  2. Create a topics-to-move JSON file that specifies the topics you want to reassign. Use the following format:
    {"topics":  [{"topic": "topic1"},
                 {"topic": "topic2"}],
     "version":1
    }
  3. Generate the content for the reassignment configuration JSON with the following command:
    kafka-reassign-partitions --topics-to-move-json-file <path to topics to move.json> --bootstrap-server <bootstrap servers> --generate

    The output displays the distribution of partition replicas on your current brokers followed by a proposed partition reassignment configuration.

    Current partition replica assignment
    {"version":1,
     "partitions":
       [{"topic":"topic2","partition":1,"replicas":[1002,1003],"log_dirs":["any","any"]},
        {"topic":"topic1","partition":0,"replicas":[1001,1002],"log_dirs":["any","any"]},
        {"topic":"topic2","partition":0,"replicas":[1001,1002],"log_dirs":["any","any"]},
        {"topic":"topic1","partition":2,"replicas":[1003,1001],"log_dirs":["any","any"]},
        {"topic":"topic1","partition":1,"replicas":[1002,1003],"log_dirs":["any","any"]}]
    }
     
    Proposed partition reassignment configuration
     
    {"version":1,
     "partitions":
       [{"topic":"topic1","partition":0,"replicas":[1004,1005],"log_dirs":["any","any"]},
        {"topic":"topic1","partition":2,"replicas":[1004,1005],"log_dirs":["any","any"]},
        {"topic":"topic2","partition":1,"replicas":[1004,1005],"log_dirs":["any","any"]},
        {"topic":"topic1","partition":1,"replicas":[1005,1004],"log_dirs":["any","any"]},
        {"topic":"topic2","partition":0,"replicas":[1005,1004],"log_dirs":["any","any"]}]
    }

    In this example, the tool proposes a configuration that reassigns the existing partitions on brokers 1001, 1002, and 1003 to brokers 1004 and 1005.

  4. Copy and paste the proposed partition reassignment configuration into an empty JSON file.
  5. Review, and if required, modify the suggested reassignment configuration. Save the file.
  6. Start the redistribution process with the following command:
    kafka-reassign-partitions --reassignment-json-file <path to reassignment configuration.json> --bootstrap-server <bootstrap servers> --execute
  7. Verify the partition movement with the following command:
    kafka-reassign-partitions --reassignment-json-file <path to reassignment configuration.json> --bootstrap-server <bootstrap servers> --verify

    The tool prints the reassignment status of all partitions.

    Status of partition reassignment: 
    Reassignment of partition topic2-1 completed successfully 
    Reassignment of partition topic1-0 completed successfully 
    Reassignment of partition topic2-0 completed successfully 
    Reassignment of partition topic1-2 completed successfully 
    Reassignment of partition topic1-1 completed successfully
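The handling of the tool output in steps 3 through 7 can be sketched as follows. This is a rough illustration that assumes the output looks like the samples shown in this document: a "Proposed partition reassignment configuration" heading followed by JSON, and one "Reassignment of partition ..." status line per partition. The exact wording can differ between Kafka versions, the "is still in progress" line in the sample is an assumption, and the function names are hypothetical.

```python
import json

PROPOSED_HEADING = "Proposed partition reassignment configuration"
STATUS_PREFIX = "Reassignment of partition "


def extract_proposed(generate_output):
    """Return the proposed configuration from captured --generate output as a dict."""
    heading = generate_output.find(PROPOSED_HEADING)
    if heading == -1:
        raise ValueError("proposed configuration heading not found")
    brace = generate_output.find("{", heading)
    if brace == -1:
        raise ValueError("no JSON found after the heading")
    # raw_decode parses one JSON value starting at brace and ignores trailing text.
    proposed, _ = json.JSONDecoder().raw_decode(generate_output, brace)
    return proposed


def incomplete_partitions(verify_output):
    """Return the partitions whose status line doesn't report completion."""
    pending = []
    for line in verify_output.splitlines():
        line = line.strip()
        if line.startswith(STATUS_PREFIX) and "completed" not in line:
            pending.append(line[len(STATUS_PREFIX):].split(" ", 1)[0])
    return pending


# Abbreviated sample text modeled on the output shown in this document.
generate_output = """Current partition replica assignment
{"version":1,"partitions":[{"topic":"topic1","partition":0,"replicas":[1001,1002],"log_dirs":["any","any"]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"topic1","partition":0,"replicas":[1004,1005],"log_dirs":["any","any"]}]}
"""
verify_output = """Status of partition reassignment:
Reassignment of partition topic1-0 completed successfully
Reassignment of partition topic1-2 is still in progress
"""

proposed = extract_proposed(generate_output)
print(proposed["partitions"][0]["replicas"])  # [1004, 1005]
print(incomplete_partitions(verify_output))   # ['topic1-2']
```

The dictionary returned by extract_proposed can be reviewed, adjusted, and written out with json.dump to create the reassignment configuration file used in step 6.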