How to Install Apache Kafka on Ubuntu 20.04

Do you know how much data we consume per day? According to a Gartner report, we consume approximately 1.8 GB of data per day in our day-to-day activities. The challenge we face is collecting and analyzing this huge amount of data, so we need a powerful, user-friendly messaging system that can support data analysis tasks. Yes, you are right, we are talking about “Apache Kafka”, a publish-subscribe messaging system that handles data distribution among multiple applications. In this Apache Kafka post, we will explain the fundamental concepts to help our readers enhance their Apache Kafka skill set. Let’s get started.
What is Apache Kafka?
Apache Kafka is a publish-subscribe messaging system designed for distributing data throughout a system. It is a popular message broker that performs well and offers more benefits than traditional messaging systems. It comes with advanced features such as built-in partitioning, inherent fault tolerance, and replication, which make it a good fit for large-scale message-processing business applications. Distributed applications in Kafka messaging are built on a reliable message-queuing architecture. Kafka supports two messaging patterns: point-to-point and publish-subscribe. Most recent messaging applications use the publish-subscribe pattern. The Apache Kafka server is built on top of the ZooKeeper synchronization service and integrates well with other Apache projects.
How Does Apache Kafka Work?
In this section, we will explain the complete workflow of Apache Kafka and its components. The following image illustrates the overall architecture of Apache Kafka:
The Apache Kafka architecture is composed of four major components:
Broker
Zookeeper
Producers
Consumers
Let me explain them in brief:
Broker: the broker component in the Kafka architecture is mainly used to maintain load balance. Most importantly, Kafka brokers are stateless and use ZooKeeper to maintain their cluster state. A single Kafka broker can handle hundreds of thousands of reads and writes per second.
Zookeeper: ZooKeeper manages and coordinates the Kafka brokers. It notifies producers and consumers when a new broker joins the Kafka ecosystem or when an existing broker fails. Based on these notifications, producers and consumers make decisions and start coordinating with other brokers.
Producers: the main task of a producer is to push data to brokers. When a new broker becomes available, the producer automatically discovers it and can send messages to it. Producers can send messages as fast as the broker can handle them, without waiting for acknowledgments.
Consumers: since Kafka brokers are stateless, consumers are responsible for keeping track of how many messages they have consumed, using partition offsets. If a consumer acknowledges a particular message offset, it implies that it has consumed all prior messages. Consumers send pull requests to the brokers to fetch buffered bytes of data, and they can skip ahead or rewind to any point in a partition simply by supplying an offset value.
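The offset mechanics described above can be sketched with a small simulation (plain Python for illustration; the Partition and Consumer classes here are hypothetical stand-ins, not the real Kafka client API):

```python
# Minimal sketch of Kafka-style offset tracking (an illustration,
# not the actual Kafka client).

class Partition:
    """A partition is an append-only log; each record gets a sequential offset."""
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return len(self.log) - 1  # offset assigned to the new record

class Consumer:
    """Consumers track their own position because brokers are stateless."""
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0  # offset of the next record to read

    def poll(self):
        records = self.partition.log[self.offset:]
        self.offset = len(self.partition.log)
        return records

    def seek(self, offset):
        # Rewind or skip ahead simply by supplying an offset value.
        self.offset = offset

p = Partition()
for msg in ["m0", "m1", "m2"]:
    p.append(msg)

c = Consumer(p)
print(c.poll())   # reads all three messages
c.seek(1)         # rewind to offset 1
print(c.poll())   # re-reads from m1 onward
```

Because the consumer, not the broker, owns the offset, rewinding is cheap: the broker just serves whatever range the consumer asks for.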
The following are the key benefits of Apache Kafka:
Message ordering: Apache Kafka preserves message ordering within a partition. Messages with the same message key are routed to the same partition of a topic, so per-key ordering is maintained.
Long-term message retention: Apache Kafka stores messages in a log, which means messages remain available after being consumed. You can control this by specifying a message retention policy.
Delivery guarantees: Apache Kafka retains order inside a partition. Within a partition, Kafka guarantees that a whole batch of messages either fails or succeeds.
Performance: higher performance thanks to simple message semantics and Kafka's own efficient binary protocol.
High reliability: the Apache Kafka messaging system is distributed, which helps it tolerate faults.
Scalability: the Apache Kafka messaging system scales easily, with no downtime required.
Durability: Apache Kafka uses a “distributed commit log”, which means messages are persisted to disk quickly, making it a highly durable messaging system.
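The ordering guarantee can be illustrated with a small sketch of key-based partitioning. Note this is a simplified stand-in: Kafka's real default partitioner hashes the key with murmur2, while the sketch below uses MD5 purely for illustration.

```python
# Sketch: records with the same key always map to the same partition,
# so per-key ordering is preserved. MD5 here is a stand-in for
# Kafka's murmur2-based default partitioner.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

partitions = [[] for _ in range(3)]
events = [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]
for key, value in events:
    partitions[partition_for(key, 3)].append(value)

# All of user-1's events land in a single partition, in send order:
target = partition_for(b"user-1", 3)
print(partitions[target])
```

Because partitioning is deterministic per key, consumers reading one partition see each key's events in the order they were produced, even though no ordering is guaranteed across different partitions.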
In this post, we will explain how to install Apache Kafka on Ubuntu 20.04.
Apache Kafka is a Java-based application, so Java must be installed on your system. If it is not installed, you can install it by running the following commands:
apt-get update
apt-get install default-jdk -y
Once Java is installed, verify the Java installation using the following command:
java --version
You will get the following output:
openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
Install Apache Kafka
Before starting, it is recommended to create a dedicated user to run Apache Kafka. You can create it using the following command:
adduser kafka
Next, add Kafka user to the sudo group with the following command:
adduser kafka sudo
Next, log in as the kafka user and download the latest version of Apache Kafka using the following commands:
su - kafka
wget https://dlcdn.apache.org/kafka/2.7.2/kafka-2.7.2-src.tgz
Once the download is completed, extract the downloaded file with the following command:
tar -xvzf kafka-2.7.2-src.tgz
mv kafka-2.7.2-src kafka
Next, exit from the kafka user session with the following command:
exit
Since you downloaded a source release, you will need to build Kafka before you can use it. The Kafka source tree ships with the Gradle wrapper, so you can build the Kafka jars with the following commands:
cd /home/kafka/kafka
./gradlew jar -PscalaVersion=2.13.3
Next, set proper ownership to the Kafka directory:
chown -R kafka:kafka /home/kafka/kafka
Create Systemd Unit Files for Kafka and Zookeeper
Next, you will need to create systemd unit files for both ZooKeeper and Kafka so that they can be managed as services. First, create a ZooKeeper service file (for example, /etc/systemd/system/zookeeper.service), then a matching Kafka service file (for example, /etc/systemd/system/kafka.service).
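For reference, a ZooKeeper unit file typically looks like the sketch below. The paths are assumptions based on the /home/kafka/kafka install location used earlier in this guide; adjust them to your layout. A matching kafka.service would declare Requires=zookeeper.service and run bin/kafka-server-start.sh with config/server.properties in the same way.

```ini
# /etc/systemd/system/zookeeper.service -- a typical sketch; paths are
# assumptions based on the install location used in this guide.
[Unit]
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
```

After creating both unit files, you would reload systemd (systemctl daemon-reload) and enable and start the two services.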
Save and close the file. Then navigate to the CMAK (Cluster Manager for Apache Kafka) source directory and create a zip file for deploying the application:
cd ~/CMAK
./sbt clean dist
You will get the following output:
[info] Compilation completed in 16.955s.
model contains 640 documentable templates
[info] Main Scala API documentation successful.
[info] Compiling 136 Scala sources and 2 Java sources to /root/CMAK/target/scala-2.12/classes ...
[info] LESS compiling on 1 source(s)
[success] All package validations passed
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.5.zip
[success] Total time: 414 s (06:54), completed Jan 19, 2022, 6:40:12 AM
Next, navigate to the ~/CMAK/target/universal directory and unzip the zip file:
cd ~/CMAK/target/universal
unzip cmak-3.0.0.5.zip
Next, change the directory to the extracted directory and run the cmak binary:
cd cmak-3.0.0.5
bin/cmak
If everything is fine, CMAK will start and listen on port 9000 by default, and you can access its web interface in a browser at http://your-server-ip:9000.
In the above guide, we explained how to install Apache Kafka on Ubuntu 20.04. We also explained how to install CMAK (Cluster Manager for Apache Kafka) to manage Apache Kafka. I hope you can now deploy Apache Kafka in a production environment.
I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.