Kafka Best Practices: Topics, Partitions, Consumers, Producers and Brokers

Kafka Best Practices: Topics, Partitions, Consumers, Producers and Brokers. There is a set of guidelines data teams should follow with Kafka in order to avoid the main deployment and management issues. Let's start with an introduction to Apache Kafka.

What is Apache Kafka

Apache Kafka is an open source, publish/subscribe message broker and distributed streaming platform. It is a favourite tool of thousands of companies, which use it to build scalable, reliable, high-throughput, real-time streaming systems.

The platform is extremely popular among application developers and data management experts because it greatly simplifies working with data streams. However, it becomes considerably more complex to operate at scale.

Its high-throughput publish/subscribe model uses automated data retention limits that do not work well for consumers that cannot keep up with the data stream: messages can be deleted before they are ever seen.

Things can also go wrong when the system hosting the data stream cannot scale to meet demand, or is simply unreliable.

But don't worry! We are going to share twenty Kafka best practices covering topics, partitions, consumers, producers and brokers that can help you deal with these complexities. So, let's move further into the blog.

Apache Kafka benefits

  • Highly scalable, highly reliable, and durable.
  • Fault tolerant with low latency.
  • Handles data in real time.
  • High concurrency.

Apache Kafka and its Architecture

Before moving straight into the best practices, let's quickly recap what Kafka is and how its architecture fits together.

Apache Kafka is a distributed messaging system that provides built-in data redundancy and resiliency while remaining both scalable and high throughput. It has automatic data retention limits that suit applications treating data as streams, and it also supports "compacted" streams that model a map of key-value pairs.

Next in our Kafka best practices guide is understanding the key elements of Kafka.

Key elements of Kafka

Some of the key terms of Kafka you should know to understand the best practices effortlessly are as follows:

Message – A message is a record or unit of data in Kafka, which contains a key and a value and also optional headers.

Producer – The job of the producer is to publish messages on Kafka topics. They decide which topic partition to publish to, either randomly or through partition algorithms as per the message’s key.

Broker – Kafka runs as a distributed system, or cluster, made up of nodes known as brokers.

Topic – A topic is a named category to which messages are published. Consumers read the data written to a topic by subscribing to it.

Offset – Every message in a partition is allocated an offset. It is a monotonically increasing integer that offers a unique identifier for the message within the partition.

Consumer – Consumers read messages from Kafka topics by subscribing to topic partitions. The consuming application then processes the messages to accomplish whatever work is desired.

Consumer Group – Consumers are organized into logical consumer groups. All the consumers in a group work in a load-balanced mode, which means each message will be seen by one consumer in the group. If a consumer leaves the group, its partitions are automatically assigned to another consumer. This process is known as rebalancing.

Lag – When a consumer cannot read from a partition as fast as messages are produced, it falls behind, or lags. Lag is expressed as the number of offsets behind the head of the partition. The time required to recover from lag depends on how many messages per second the consumer is able to process.

What are the top 20 Kafka best practices for topics, partitions, consumers, producers and brokers? Let's find out.

Topics, Partitions, Consumers, Producers and Brokers Best Practices for Kafka

Kafka Best Practices For Partitions

1. Understand The Data Rate Of The Partitions To Ensure That You Have The Appropriate Retention Space

The data rate of a partition is the rate at which data is produced to it: the average message size multiplied by the number of messages per second. The data rate dictates how much retention space, in bytes, is required to guarantee retention for a given amount of time. If you do not know the data rate, you cannot correctly calculate the retention space needed to meet a time-based retention goal.

The data rate also defines the minimum throughput an individual consumer must sustain to keep up without lagging.
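To make the arithmetic concrete, here is a minimal sketch using assumed example figures (1 KB average messages, 5,000 messages per second per partition, 7 days of retention); none of these numbers come from the article:

```java
public class RetentionEstimate {
    public static void main(String[] args) {
        // Assumed example values for a single partition.
        long avgMessageBytes = 1_024;           // 1 KB average message size
        long messagesPerSecond = 5_000;         // messages produced per second
        long retentionSeconds = 7L * 24 * 3600; // keep 7 days of data

        long dataRateBytesPerSec = avgMessageBytes * messagesPerSecond;
        long retentionBytesPerPartition = dataRateBytesPerSec * retentionSeconds;

        System.out.printf("Data rate: %d bytes/s, retention needed: ~%d GB per partition%n",
                dataRateBytesPerSec, retentionBytesPerPartition / (1024L * 1024 * 1024));
    }
}
```

Multiplying the per-partition result by the replication factor and the number of partitions gives a rough estimate of the total disk space the topic will need.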

2. Use Random Partitioning When Writing To Topics, Unless Architectural Needs Require Otherwise

When operating at scale, uneven data rates across partitions are difficult to manage, because:

  • Consumers of higher-throughput partitions have to process more messages than the other consumers in the consumer group, potentially leading to processing and networking bottlenecks.
  • Topic retention must be sized for the partition with the highest data rate, which can increase disk usage across the other partitions in the topic.
  • Achieving an optimum balance of partition leadership becomes more complex than simply spreading leadership across all brokers.

A hot, higher-throughput partition can carry ten times the weight of another partition in the same topic. Unless architectural requirements dictate otherwise, random partitioning, for example publishing with a null key as sketched below, avoids this imbalance.
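As a minimal sketch of the random approach, a Java producer that publishes with a null key lets the client's default partitioner spread records across partitions (the broker address and topic name are assumptions for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RandomPartitioningProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null key lets the default partitioner spread records across partitions
            // instead of hashing a key to a single (possibly hot) partition.
            producer.send(new ProducerRecord<>("events", null, "example payload"));
        }
    }
}
```

If message ordering or co-location by key matters to your application, keyed partitioning is still the right choice; the trade-off is the uneven data rates described above.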

Apache Kafka Best Practices For Consumers

3. Upgrade Consumers Running Older Versions Of Kafka

In version 0.8.x, consumers use Apache ZooKeeper for consumer group coordination, and a number of known bugs can lead to long-running rebalances or even failures of the rebalance algorithm. During a rebalance, one or more partitions are assigned to each consumer in the group. In a rebalance storm, partition ownership is continually shuffled among the consumers, preventing any of them from making real progress on consumption.

4. Optimize Consumers' Socket Buffers For High-Speed Ingest

In Kafka 0.10.x, the parameter is receive.buffer.bytes, which defaults to 64 kB. In Kafka 0.8.x, the parameter is socket.receive.buffer.bytes, which defaults to 100 kB. Both defaults are too small for high-throughput environments, particularly when the network bandwidth-delay product between the broker and the consumer is larger than that of a Local Area Network (LAN). For high-bandwidth networks with latencies of 1 millisecond or more, consider setting the socket buffer to 8 or 16 MB; if memory is scarce, consider 1 MB.

You can also use a value of -1, which lets the underlying operating system tune the buffer size according to network conditions. However, for consumers that need to start "hot" at high throughput, be aware that this automatic tuning does not happen instantly.
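As a minimal sketch with a recent Java client, the setting can be applied like this (the broker address, group id, topic and 8 MB buffer are placeholder assumptions):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "high-throughput-group");   // assumed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Raise the socket receive buffer for high-bandwidth, higher-latency links;
        // a value of -1 would let the OS tune it instead.
        props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 8 * 1024 * 1024); // 8 MB

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // assumed topic
            // poll loop would go here
        }
    }
}
```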

5. Design High Throughput Consumers To Apply Backpressure If Warranted

Consume only at rates you can sustain. A consumer that takes in more than it can process may grind to a halt and get dropped from the consumer group. That is why it is important to consume into fixed-size buffers, preferably off-heap when running in a Java Virtual Machine (JVM). A fixed-size buffer keeps you from pulling so much data onto the heap that the JVM spends all of its time performing garbage collection instead of the work you actually want done: processing messages.
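One possible shape for such a consumer, sketched below with assumed names and sizes, is to cap how much each poll returns and pause fetching whenever a fixed-size queue between the poll loop and the worker threads fills up:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BackpressureConsumer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "bounded-group");           // assumed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200); // cap how much each poll pulls in

        // Fixed-size buffer between the poll loop and the worker threads doing the processing.
        BlockingQueue<ConsumerRecord<String, String>> buffer = new ArrayBlockingQueue<>(1_000);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // assumed topic
            while (true) {
                // Backpressure: stop fetching while the downstream buffer is nearly full,
                // but keep calling poll() so the consumer stays in the group.
                if (buffer.remainingCapacity() < 200) {
                    consumer.pause(consumer.assignment());
                } else if (!consumer.paused().isEmpty()) {
                    consumer.resume(consumer.assignment());
                }
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    buffer.put(record); // worker threads (not shown) drain this queue
                }
            }
        }
    }
}
```

In a real application you would also commit offsets only after the workers have finished processing; that is omitted here for brevity.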

6. Be Aware Of The Impact Garbage Collection Can Have While Running Consumers On A JVM

Long garbage collection pauses can result in dropped ZooKeeper sessions or consumer group rebalances. The same goes for brokers: if garbage collection pauses are too long, a broker risks being dropped from the cluster.
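Besides tuning the JVM garbage collector itself, one mitigation on the consumer side is to give the group membership timeouts some tolerance for occasional pauses. The values below are assumptions for illustration, not recommendations from the article:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class GcTolerantConsumerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "gc-tolerant-group");       // assumed
        // Allow the consumer to miss a few heartbeats during a GC pause without
        // being kicked out of the group and triggering a rebalance.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 45_000);
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 15_000);
        // Give slow processing (or long pauses) more room before a rebalance.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);
        return props;
    }
}
```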

Keep following along; next up in this Kafka best practices guide are the producers.

Best Practices for Producers in Kafka

7. Set Up Producers To Wait For Acknowledgements

Configuring producers to wait for acknowledgements ensures the producer knows the message has actually made it to the partition on the broker. In Kafka 0.10.x the setting is acks; in 0.8.x it is request.required.acks. Kafka provides fault tolerance through replication, so the failure of a single node or a change in partition leadership does not affect availability. If you configure producers without acks (so-called "fire and forget"), messages can be lost silently.
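A minimal sketch of the modern producer setting; acks=all waits for the leader and all in-sync replicas, and the broker address is a placeholder:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DurableProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        // "all" waits until the leader and all in-sync replicas have acknowledged the write;
        // "1" waits for the leader only; "0" is fire-and-forget.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        return props;
    }
}
```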

8. Configure "retries" On Your Producers

The default value is 3, which is often too low. The right value depends on the application; for applications where data loss cannot be tolerated, consider Integer.MAX_VALUE (effectively infinite). This guards against situations where the broker leading the partition is temporarily unable to respond to a produce request.
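A minimal sketch, assuming a client new enough to support delivery.timeout.ms; the values are illustrative, not prescriptive:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class RetryingProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry "forever" and let the delivery timeout bound how long a send may take overall.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);
        // Avoid reordering during retries (only relevant if ordering matters to you).
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        return props;
    }
}
```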

9. Optimize Buffer Size For High Throughput Producers

Tune the buffer sizes, particularly buffer.memory and batch.size. Since batch.size is a per-partition setting, producer performance and memory usage correlate with the number of partitions in the topic. The right values depend on several factors, such as:

  • Producer data rate
  • The number of partitions you produce to
  • The amount of memory available

Moreover, you should keep in mind that large buffers are not always better. If the producer stalls for some reason, more data buffered on the heap will simply mean more garbage collection.
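The sketch below shows where these settings live on a Java producer; the 64 MB buffer, 64 KB batch size and 5 ms linger are assumed example values, not recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class HighThroughputProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        // Total memory the producer may use to buffer unsent records.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024); // 64 MB, assumed
        // Per-partition batch size; larger batches improve throughput at the cost of latency.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024); // 64 KB, assumed
        // Wait a few milliseconds to let batches fill before sending.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        return props;
    }
}
```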

10. Instrument Your Applications To Track Metrics

Instrumenting your applications to track metrics such as the number of produced messages, the average produced message size, and the number of consumed messages plays a significant role in spotting problems early.
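Beyond your own counters, the Java clients already expose built-in metrics you can read programmatically. A minimal sketch follows; in practice you would export these to your monitoring system rather than print them:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ProducerMetricsDump {
    // Prints the client's built-in metrics; an application would typically
    // ship these (plus its own counters) to a monitoring system instead.
    public static void dump(KafkaProducer<String, String> producer) {
        Map<MetricName, ? extends Metric> metrics = producer.metrics();
        metrics.forEach((name, metric) ->
                System.out.printf("%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
    }
}
```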

The Best Practices For Brokers with Kafka

11. Compacted Topics Need Memory and CPU Resources On Your Brokers

Log compaction needs both heap and CPU cycles on the brokers to complete successfully, and failed compaction puts the broker at risk from a partition that grows unbounded. Therefore, tune log.cleaner.dedupe.buffer.size and log.cleaner.threads on your brokers, but keep in mind that these values affect broker heap usage. If a broker throws an OutOfMemoryError exception, it will shut down and potentially lose data. The buffer size and thread count depend on both the number of topic partitions to be cleaned and the data rate and key sizes of the messages in those partitions.

12. Examine Brokers For Network Throughput

Monitor brokers for network throughput, both transmit (TX) and receive (RX), as well as disk I/O, disk space, and CPU usage. Capacity planning is crucial for maintaining cluster performance.

13. Divide Partition Leadership Among Brokers In The Cluster

Leadership requires a lot of network I/O resources. For instance, when running with replication factor 3, a leader must receive the partition data, transmit two copies to replicas, and also transmit to however many consumers want to consume the data. Being a leader is therefore considerably more expensive than being a follower in terms of network I/O used.

14. Monitor Your Brokers For In-Sync Replica (ISR) Shrinks, Under-Replicated Partitions, And Unpreferred Leaders

These can be signs of potential problems in your cluster. For example, frequent ISR shrinks for a single partition indicate that the data rate for that partition exceeds the leader's ability to service the consumer and replica threads.

15. Modify The Apache Log4j Properties As Required

Kafka broker logging can use a tremendous amount of disk space, yet you should not forgo logging entirely: broker logs are often the best, and sometimes the only, way to reconstruct the sequence of events after an incident.

16. Disable Automatic Topic Creation Or Establish A Clear Policy To Clean Up Unused Topics

For example, if a topic receives no messages for a set number of days, consider it defunct and remove it from the Kafka cluster. This avoids accumulating additional metadata in the cluster that you would otherwise have to manage.
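If you automate such a policy, the Java AdminClient can perform the removal once a topic has been identified as defunct by whatever criteria you choose; the broker address and topic name below are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class UnusedTopicCleanup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed

        try (AdminClient admin = AdminClient.create(props)) {
            // Delete a topic previously identified as defunct by your own policy
            // (e.g. no messages produced for N days).
            admin.deleteTopics(Collections.singleton("defunct-topic")).all().get();
        }
    }
}
```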

17. For High Throughput Persistent Brokers, Provision Enough Memory To Prevent Reading From The Disk Subsystem

Partition data should ideally be served directly from the operating system's file system cache. To achieve that, you must ensure your consumers can keep up, because a lagging consumer forces the broker to read from disk.

18. For Large Clusters With High-Throughput Service Level Objectives (SLOs), Consider Isolating Topics On A Subset Of Brokers

How you determine which topics to isolate depends entirely on the needs of your business. For example, if you have multiple online transaction processing (OLTP) systems using the same cluster, isolating the topics for each system to a distinct subset of brokers helps limit the potential blast radius of an incident.

19. Using Older Clients With Newer Topic Message Formats (And Vice Versa) Puts An Extra Burden On Brokers

This is because the brokers have to convert the formats on behalf of the clients. Avoid it whenever possible.

20. Stop Assuming That Testing a Broker On a Local Desktop Machine is Representative of the Performance You’ll See in Production

Testing over a loopback interface with a replication factor of 1 is a very different topology from most production environments: network latency is negligible over loopback, and the time required to receive leader acknowledgements varies greatly when no replication is involved.

With the help of these tips, you will be able to use Kafka more efficiently and deepen your expertise in this streaming tool.

Brilliant effort! Thank you for reading about Kafka best practices for topics, partitions, consumers, producers and brokers. It is time to conclude. 

Kafka Best Practices: Topics, Partitions, Consumers, Producers and Brokers (Conclusion)

In this Kafka best practices guide, we covered the major tips for using Kafka more effectively. Kafka is one of the most popular and widely used distributed streaming platforms for building scalable, reliable, real-time streaming systems, and it keeps gaining popularity because it greatly simplifies working with data streams. If you are looking for a platform to process real-time data and track application activity, Apache Kafka is an excellent option.

Why not check out our Kafka content here? Read more about message brokers like RabbitMQ or Redis here. 

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
