Kafka Consumer Groups Explained
In a production environment, a single consumer is often not enough to handle the volume of data flowing through a Kafka topic. Consumer Groups are the primary mechanism for scaling consumption horizontally.
1. The Group ID
A Consumer Group is identified by its group.id. When multiple consumers share the same group.id, they divide the topic's partitions among themselves so that, together, they consume every message in the topic.
2. Partition Assignment
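Within a group, Kafka assigns each partition to exactly one consumer at any given time. In steady state this keeps two members of the same group from processing the same message, although messages can still be redelivered after a rebalance or a failed commit.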
- If you have 4 partitions and 2 consumers in a group, each consumer gets 2 partitions.
- If you add 2 more consumers (total 4), each consumer gets 1 partition.
- If you add 1 more consumer (total 5), the 5th consumer sits idle because there are no more partitions to assign.
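The three cases above can be reproduced with a small, self-contained sketch. It deals partitions out to consumers in turn; this is a simplified illustration and not one of Kafka's built-in assignors, and the class name, method name, and consumer IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: NOT Kafka's actual assignment strategy.
class AssignmentSketch {

    // Deals partitions 0..partitionCount-1 out to the consumers in turn,
    // so every partition ends up with exactly one owner.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 partitions, 2 consumers: two partitions each
        System.out.println(assign(4, List.of("c1", "c2")));
        // 4 partitions, 4 consumers: one partition each
        System.out.println(assign(4, List.of("c1", "c2", "c3", "c4")));
        // 4 partitions, 5 consumers: c5 receives an empty list and sits idle
        System.out.println(assign(4, List.of("c1", "c2", "c3", "c4", "c5")));
    }
}
```

The useful takeaway is the upper bound: in this model, as in Kafka, adding consumers beyond the partition count adds no parallelism.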
3. The Rebalance Process
When a consumer joins or leaves a group, Kafka triggers a Rebalance. This is the process of redistributing partition ownership among the group's current members.
![Kafka Rebalance Diagram Placeholder]
Triggers for Rebalance:
- A new consumer joins the group.
- An existing consumer crashes or leaves.
- A consumer's heartbeat fails to reach the broker within session.timeout.ms.
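The heartbeat trigger can be pictured with a toy model. The sketch below is not the broker's coordinator code; it only shows the comparison involved, assuming the coordinator remembers when each member last sent a heartbeat. The class name, method name, member IDs, and the timeout value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of session-timeout detection; the real logic lives in the broker.
class SessionTimeoutSketch {

    // Returns the members whose last heartbeat is older than sessionTimeoutMs
    // at time nowMs. A non-empty result would lead to a rebalance.
    static List<String> expiredMembers(Map<String, Long> lastHeartbeatMs,
                                       long nowMs,
                                       long sessionTimeoutMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
            if (nowMs - entry.getValue() > sessionTimeoutMs) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 10_000; // illustrative value for session.timeout.ms
        Map<String, Long> lastHeartbeatMs = new TreeMap<>();
        lastHeartbeatMs.put("c1", 58_000L); // heartbeat 2 s ago: still alive
        lastHeartbeatMs.put("c2", 45_000L); // heartbeat 15 s ago: considered dead
        System.out.println(expiredMembers(lastHeartbeatMs, 60_000L, sessionTimeoutMs));
    }
}
```

A shorter timeout detects crashed consumers sooner, but it also makes a slow or briefly disconnected consumer more likely to be removed from the group.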
4. Offsets and Committing
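Kafka tracks the progress of a consumer group by storing a committed Offset for each partition. The committed offset is the position of the next message the group will read, that is, the offset of the last processed message plus one. When a partition is reassigned, its new owner resumes from this offset.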
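// Java configuration for a consumer group
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Every instance started with this group.id joins the same consumer group
props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing-group");
// Let the consumer commit offsets automatically at a regular interval
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);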
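The offset bookkeeping itself can be sketched without a broker. In Kafka's convention the committed offset is the position of the next record to read, so after processing the record at offset n a consumer commits n + 1. The class below is a hypothetical in-memory stand-in for that bookkeeping, not the Kafka client API, and it assumes each partition's log starts at offset 0.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a group's committed offsets; not the Kafka client API.
class OffsetSketch {

    // partition number -> committed offset (position of the next record to read)
    private final Map<Integer, Long> committed = new HashMap<>();

    // Records progress after the record at lastProcessedOffset has been handled.
    void commit(int partition, long lastProcessedOffset) {
        committed.put(partition, lastProcessedOffset + 1);
    }

    // Where a consumer that takes over the partition would resume.
    // Falls back to 0 when nothing has been committed (assumes the log starts at 0).
    long resumePosition(int partition) {
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetSketch group = new OffsetSketch();
        group.commit(0, 41L); // records 0..41 of partition 0 are done
        System.out.println(group.resumePosition(0)); // 42
        System.out.println(group.resumePosition(1)); // 0, nothing committed yet
    }
}
```

Committing after processing, as modeled here, means a crash between processing and commit causes the record to be read again, so downstream handling may need to tolerate repeats.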
Summary
Consumer Groups allow you to scale your processing power by adding more instances of your service, up to the number of partitions in the topic. By understanding partition assignment, rebalancing, and offset commits, you can design robust, high-throughput streaming applications.
