Introducing Connector Private Networking: Join The Upcoming Webinar!

Sep 29, 2023Read Time: 4 min

An Introduction to Apache Kafka Consumer Group Strategy

The Confluent Developer Newsletter

Get Apache Kafka and Flink news delivered to your inbox biweekly or read the latest editions on Confluent Developer!

Read the Latest

Course: Apache Kafka 101

Learn the basics of Apache Kafka in our Confluent Developer course.

Start the Course

Written By

Lucia CerchieDeveloper Advocate

Sep 29, 2023Read Time: 4 min

Issues with broker load

Ever dealt with a misbehaving consumer group? Imbalanced broker load? This could be due to your consumer group and partitioning strategy!

Once, on a dark and stormy night, I set myself up for this error. I was creating an application to demonstrate how you can use Apache Kafka® to decouple microservices. The function of my “microservices” was to create latte objects for a restaurant ordering service. It was set up a little like this:

I wanted to implement this in Kafka by using consumers, each reading from a common coffee topic, but with their own partition. Now this was a naive approach. Why?

Well, let’s say it’s pumpkin spice latte season. In the United States, that’s a period of time lasting from August to December during which the pumpkin spice coffee flavor undergoes overwhelming demand at cafes. This would mean the pumpkin spice-flavored coffee order events would dramatically increase.

Since I had each consumer in the same group for one coffee topic, every coffee ‘microservice’ consumed only from a single partition, ignoring Kafka’s mechanism for parallelism. Eventually, I can expect a hot spot in my pumpkin spice partition, which means that some of my brokers would be slow to distribute their pumpkin spice—and others would not be used to their full potential.

How do we solve this issue? Instead of having every consumer in one group, and organizing the different latte flavors by message key, I could send each latte flavor to its own multithreaded topic, and the load for latte orders would then be balanced across partitions, like so:

Now, technically, I could keep all the consumers in the same group, consuming from different topics. But in this case, it’d be best practice to give each consumer its own group. That way, I could scale these topics separately by adding more consumers to a specific group.

Let’s step back a little bit and consider consumer group strategy in general.

What is the function of a consumer group?

One of the more foundational concepts of consumer groups is that each consumer, once assigned to a group, shares the workload. You can read more about this in our blog post about configuring consumer Group IDs, but basically, when each consumer has the same Group ID, they cannot read from the same partition:

Now, you don’t always need to reach for a single consumer group, like I had done at first in my latte project… think through your use case! Sometimes, you may have two consumers (each in different consumer groups, allowing for parallel processing) for two different services, like a customer address service and a customer delivery notification service, that would need to read from the same partition in the same topic. These two consumers in different groups reading from the same topic can pick up reading from different offsets, which they would not be able to do if Kafka used a queue rather than a persistent log.

However, if you do want your consumers to read from the same topic, you need to consider a couple of things: 1) the number of consumers in your group and 2) the number of partitions. This consideration is important because the consumers will share the workload as equally as possible among themselves. Sometimes making this decision is easy—I’m writing a demo app in Kafka Streams with 1 consumer and 1 partition right now because I just need to show developers how to use a .process() function in Kafka Streams. But in most use cases, this decision takes careful consideration.

Considerations in favor of fewer consumers in relation to a high number of partitions include things like how high you want your consumer throughput to be since each partition is given one thread per consumer. You might also want room to increase parallelism later on.

Considerations against having fewer consumers to a higher number of partitions include things like higher unavailability in the case of unclean failure since leader election will take longer.

If you’re interested in reading more about consumer strategy in relation to partitions, Jun Rao has an excellent blog post on the subject.

Exactly how do I configure my consumer group?

Let’s take a look at what consumer group configuration looks like in JS, Java, and Python.

Here’s the example of consumer group configuration in KafkaJS that we saw above:

export function createConsumerConfigMap(config, groupid) {
  if (config.hasOwnProperty('security. protocol')) {
    return {
      'bootstrap.servers': config['bootstrap.servers'],
      'sasl.username': config['sasl.username'],
      'sasl.password': config['sasl.password'],
      'security.protocol': config['security.protocol'],
      'sasl.mechanisms': config['sasl.mechanisms'],
      'group.id': 'consumer-group-id',
    }
  } else {
    return {
      'bootstrap.servers': config['bootstrap.servers'],
      'group.id': 'consumer-group-id',
    }
  }
}

In this example, the group.id is set when the function that creates the consumer is called.

In the Python Kafka client, you might do something like this in a config.properties file:

[consumer]
auto.offset.reset=earliest
group.id=consumer-group-id
enable.auto.commit=true
max.poll.interval.ms=3000000

Then you could import those properties whenever you’re creating a consumer.

Now, if you’re using Java to create a Kafka Streams app, it gets a little trickier. That’s because the Group ID configuration on the consumer is not called group.id or something similar that you might expect. Instead, the APPLICATION_ID_CONFIG is what Kafka Streams ends up using as the consumer Group ID.

props.put(StreamsConfig.APPLICATION_ID_CONFIG, "consumer-group-id");

Where to go from here

I’m hoping this blog post gave you something to think about when it comes to consumer group strategy, whether you’re new to configuring consumer groups or designing a large piece of architecture. If you’d like to learn more, here are some resources I recommend:

Kafka Consumer Groups Tool: Documentation on using Kafka’s consumer groups tool to check metadata on groups
Configuring Kafka Consumer Group IDs: Learn more about consumer Group IDs, specifically
Here’s a video introduction to Kafka Consumers
The Kafka Consumer Group Protocol is about to get simpler—read KIP-848 for the details

Lucia Cerchie is a developer advocate at Confluent. She believes in a human-centered developer experience, in the teaching of responsibility of developer advocates, and in the joy of learning. She blogs at her personal site, https://luciacerchie.dev/blog/, here on Confluent, https://www.confluent.io/blog/author/lucia-cerchie/ and on Dev.to https://dev.to/cerchie

The Confluent Developer Newsletter

Get Apache Kafka and Flink news delivered to your inbox biweekly or read the latest editions on Confluent Developer!

Read the Latest

Course: Apache Kafka 101

Learn the basics of Apache Kafka in our Confluent Developer course.

Start the Course

Did you like this blog post? Share it now

What Are Apache Kafka Consumer Group IDs?

Dec 6, 2022

Learn why configuring consumer Group IDs are a crucial part of designing your consumer application. By the end of this post, you’ll understand the impact they have on three areas: work sharing, new data detection, and data recovery.

Lucia Cerchie

Kafka-docker-composer: A Simple Tool to Create a docker-compose.yml File for Failover Testing

Apr 18, 2024

The Kafka-Docker-composer is a simple tool to create a docker-compose.yml file for failover testing, to understand cluster settings like Kraft, and for the development of applications and connectors.

Sven Erik Knop

Issues with broker load

What is the function of a consumer group?

Exactly how do I configure my consumer group?

Where to go from here

The Confluent Developer Newsletter

Course: Apache Kafka 101

Did you like this blog post? Share it now

Subscribe to the Confluent blog

What Are Apache Kafka Consumer Group IDs?

Kafka-docker-composer: A Simple Tool to Create a docker-compose.yml File for Failover Testing