How to Install Apache Kafka on Ubuntu 20.04 (Kafka Cluster)

Do you know how much data we consume per day? According to a Gartner report, each of us consumes approximately 1.8 GB of data per day in our day-to-day activities. The challenge is collecting and analyzing data at that scale, so we need a powerful messaging system that is both easy to use and suited to data analysis tasks. That system is Apache Kafka, a publish-subscribe messaging system that handles data distribution among multiple applications. In this post, we explain the fundamental concepts of Apache Kafka and then walk through the installation. Let’s get started.

What is Apache Kafka?

Apache Kafka is a publish-subscribe messaging system designed to distribute data throughout a system. It is a popular message broker that offers several advantages over traditional messaging systems. It comes with advanced features such as built-in partitioning, inherent fault tolerance, and replication, which make it a good fit for large-scale message processing applications. Distributed applications that use Kafka are built on a reliable message queuing architecture. Two messaging patterns are in common use: point-to-point and publish-subscribe. Most recent messaging applications use the publish-subscribe pattern. The Apache Kafka server is built on top of the Zookeeper synchronization service and integrates well with other Apache services.

How Does Apache Kafka Work?

In this section, we will be explaining the complete workflow of Apache Kafka and its components. The following image illustrates the overall work nature of Apache Kafka:

The Apache Kafka workflow architecture is composed of four major components:

  • Broker
  • Zookeeper
  • Producers
  • Consumers

Let me explain them in brief:

Broker: A Kafka cluster typically runs several brokers to balance the load. Kafka brokers are stateless and rely on Zookeeper to maintain the cluster state. A single Kafka broker can handle hundreds or even thousands of reads and writes per second.

Zookeeper: Zookeeper manages and coordinates the Kafka brokers. It notifies producers and consumers when a new broker joins the Kafka ecosystem or an existing broker fails. Based on these notifications, producers and consumers decide how to proceed and start coordinating with another broker.

Producers: The main task of a producer is to push data to the brokers. When a new broker starts, producers automatically discover it and begin sending messages to it. A producer does not have to wait for an acknowledgment from the broker (this depends on its acks setting), so it can send messages as fast as the broker can handle them.

Consumers: Because Kafka brokers are stateless, each consumer is responsible for tracking how many messages it has consumed, using the partition offset. When a consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages. The consumer also issues a pull request to the broker to have a buffer of bytes ready to consume. A consumer can skip ahead or rewind to any point in a partition simply by supplying an offset value.
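
To make the offset idea more concrete, here is a small shell sketch that models a single partition as an append-only list of lines, where the zero-based line index stands in for the offset. The record names and the read_from helper are made up for illustration. This is not how Kafka stores data internally; it only shows why supplying an offset is enough to skip ahead or rewind.

```shell
#!/bin/sh
# Toy model of one partition: one record per line, and the
# zero-based line index plays the role of the offset.
partition_log='order-created
order-paid
order-shipped
order-delivered'

# read_from OFFSET: print "offset:record" for every record at or after OFFSET.
read_from() {
  printf '%s\n' "$partition_log" |
    awk -v start="$1" 'NR - 1 >= start { printf "%d:%s\n", NR - 1, $0 }'
}

echo "Skip ahead to offset 2:"
read_from 2

echo "Rewind to offset 0:"
read_from 0
```

Calling read_from with a smaller offset replays older records, which is the same effect a consumer gets by resetting its position in a partition.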

Benefits of Apache Kafka

The following are the key benefits of Apache Kafka:

Message ordering: Apache Kafka preserves message order within a partition. Messages are assigned to the partitions of a topic using their message keys, so messages with the same key stay in order.

Offers lifetime messaging: Apache Kafka is a log, which means that messages stay available after they are consumed. You control how long they are kept by specifying a message retention policy.

Delivery guarantees: Apache Kafka retains order inside a partition. Within a partition, Kafka guarantees that a whole batch of messages either fails or succeeds.

Performance: Kafka achieves high performance through simple messaging semantics and its own binary protocol.

High level reliability: Apache Kafka is a distributed system and is designed to tolerate faults.

Scalability: Apache Kafka scales out easily, and no downtime is required to do so.

Durability: Apache Kafka uses a “distributed commit log”, which means messages are persisted to disk quickly. This makes it a more durable messaging system.
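
The message ordering benefit depends on records with the same key landing in the same partition. The sketch below imitates that routing in plain shell: it hashes a key and takes the result modulo the partition count. It uses cksum purely as a stand-in hash, whereas Kafka's default partitioner uses a murmur2 hash, so the partition numbers here will not match a real cluster. The partition_for helper and the sample keys are hypothetical.

```shell
#!/bin/sh
# partition_for KEY NUM_PARTITIONS
# Hash the key with cksum (a stand-in, not Kafka's own hash) and map it
# onto a partition index in the range 0..NUM_PARTITIONS-1.
partition_for() {
  hash=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $((hash % $2))
}

# Records with the same key always map to the same partition, so their
# relative order is preserved; different keys may land elsewhere.
for key in user-1 user-2 user-1 user-3 user-2; do
  echo "key=$key -> partition $(partition_for "$key" 3)"
done
```

Both user-1 lines print the same partition number, which is why ordering per key holds even though the topic as a whole is spread over several partitions.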

In this post, we will explain how to install Apache Kafka on Ubuntu 20.04.

Install Java

Apache Kafka is a Java-based application, so Java must be installed on your system. If it is not installed, you can install it by running the following command:

				
					apt-get install default-jdk -y
				
			

Once Java is installed, verify the Java installation using the following command:

				
					java --version
				
			

You will get the following output:

				
					openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
				
			

Install Apache Kafka

Before starting, it is recommended to create a dedicated user to run Apache Kafka. You can create it using the following command:

				
					adduser kafka
				
			

Next, add the kafka user to the sudo group with the following command:

				
					adduser kafka sudo
				
			

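Next, log in as the kafka user and download the Apache Kafka 2.7.2 source release using the following commands. If this URL is no longer available, older releases are kept at https://archive.apache.org/dist/kafka/.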

				
					su - kafka
wget https://dlcdn.apache.org/kafka/2.7.2/kafka-2.7.2-src.tgz
				
			

Once the download is completed, extract the downloaded file with the following command:

				
					tar -xvzf kafka-2.7.2-src.tgz
mv kafka-2.7.2-src kafka

				
			

Next, exit from the Kafka user with the following command:

				
					exit
				
			

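Because you downloaded the source release, you need to build the Kafka JAR files before you can run Kafka. The source tree ships with a Gradle wrapper script (gradlew), which the following commands use to run the build: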

				
					cd /home/kafka/kafka
./gradlew jar -PscalaVersion=2.13.3
				
			

Next, set proper ownership to the Kafka directory:

				
					chown -R kafka:kafka /home/kafka/kafka
				
			

Create Systemd Unit Files for Kafka and Zookeeper

Next, you will need to create a systemd service file for both Zookeeper and Kafka. First, create a Zookeeper service file using the following command:

				
					nano /etc/systemd/system/zookeeper.service
				
			

Add the following lines:

				
					[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

				
			

Save and close the file then create a systemd service file for Kafka using the following command:

				
					nano /etc/systemd/system/kafka.service
				
			

Add the following lines:

				
					[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

				
			

Save and close the file then reload the systemd daemon using the following command:

				
					systemctl daemon-reload
				
			

Next, start the Apache Kafka service and enable it at boot with the following command. Because kafka.service requires zookeeper.service, systemd starts Zookeeper as well:

				
					systemctl enable --now kafka
				
			

You can now check the status of the Apache Kafka and Zookeeper services using the following command:

				
					systemctl status kafka zookeeper
				
			

You will get the following output:

				
					● kafka.service
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-01-19 06:27:07 UTC; 1min 9s ago
Main PID: 6576 (sh)
Tasks: 69 (limit: 2353)
Memory: 339.2M
CGroup: /system.slice/kafka.service
├─6576 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kaf>
└─6581 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvo>

Jan 19 06:27:07 ubuntu2004 systemd[1]: Started kafka.service.

● zookeeper.service
Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-01-19 06:27:07 UTC; 1min 9s ago
Main PID: 6575 (java)
Tasks: 27 (limit: 2353)
Memory: 59.5M
CGroup: /system.slice/zookeeper.service
└─6575 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGC>

Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,772] INFO Created server with tickTime 3000 minSessionTimeou>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,801] INFO Using org.apache.zookeeper.server.NIOServerCnxnFac>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,814] INFO Configuring NIO connection handler with 10s sessio>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,828] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.z>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,875] INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zo>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,886] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snap>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,896] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snap>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,958] INFO PrepRequestProcessor (sid:0) started, reconfigEnab>
Jan 19 06:27:09 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:09,971] INFO Using checkIntervalMs=60000 maxPerMinute=10000 (or>
Jan 19 06:27:10 ubuntu2004 zookeeper-server-start.sh[6575]: [2022-01-19 06:27:10,742] INFO Creating new log file: log.1 (org.apache.zookeeper>
				
			

Install Cluster Manager for Apache Kafka

CMAK (Cluster Manager for Apache Kafka) is an open source tool developed by Yahoo for managing and monitoring Kafka clusters. First, install Git and download CMAK using the following commands:

				
					apt-get install git -y
git clone https://github.com/yahoo/CMAK.git
				
			

Once the download is completed, edit the CMAK configuration file:

				
					nano ~/CMAK/conf/application.conf
				
			

Find the following lines and make sure cmak.zkhosts points to your Zookeeper instance (localhost:2181 in this setup):

				
					kafka-manager.zkhosts="kafka-manager-zookeeper:2181"
kafka-manager.zkhosts=${?ZK_HOSTS}
cmak.zkhosts="localhost:2181"
cmak.zkhosts=${?ZK_HOSTS}
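
If you prefer to script this change rather than edit the file by hand, something like the following could work. It is a sketch that assumes the file contains a line beginning with cmak.zkhosts=" as in the snippet above; set_zkhosts is a hypothetical helper name. It rewrites only that line and leaves the ${?ZK_HOSTS} override line untouched.

```shell
#!/bin/sh
# set_zkhosts FILE HOSTS
# Replace the quoted value on the cmak.zkhosts="..." line with HOSTS.
# The result is written to a temporary file first, so a failed sed
# does not truncate FILE.
set_zkhosts() {
  file=$1
  hosts=$2
  tmp=$(mktemp) || return 1
  sed "s|^cmak\.zkhosts=\"[^\"]*\"|cmak.zkhosts=\"$hosts\"|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (path taken from the step above):
# set_zkhosts ~/CMAK/conf/application.conf "localhost:2181"
```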
				
			

Save and close the file then navigate to the CMAK directory and create a zip file for deploying the application:

				
					cd ~/CMAK
./sbt clean dist
				
			

You will get the following output:

				
					[info] Compilation completed in 16.955s.
model contains 640 documentable templates
[info] Main Scala API documentation successful.
[info] Compiling 136 Scala sources and 2 Java sources to /root/CMAK/target/scala-2.12/classes ...
[info] LESS compiling on 1 source(s)
[success] All package validations passed
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.5.zip
[success] Total time: 414 s (06:54), completed Jan 19, 2022, 6:40:12 AM
				
			

Next, navigate to the ~/CMAK/target/universal directory and unzip the zip file:

				
					cd ~/CMAK/target/universal
unzip cmak-3.0.0.5.zip
				
			

Next, change the directory to the extracted directory and run the cmak binary:

				
					cd cmak-3.0.0.5
bin/cmak
				
			

If everything is fine, you will get the following output:

				
					2022-01-19 06:41:18,495 - [INFO] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
2022-01-19 06:41:18,496 - [INFO] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
2022-01-19 06:41:18,517 - [INFO] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
2022-01-19 06:41:18,519 - [INFO] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
2022-01-19 06:41:18,525 - [INFO] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
2022-01-19 06:41:18,551 - [INFO] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
2022-01-19 06:41:19,149 - [INFO] play.api.Play - Application started (Prod)
2022-01-19 06:41:19,572 - [INFO] k.m.a.KafkaManagerActor - Updating internal state...
2022-01-19 06:41:20,665 - [INFO] p.c.s.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
				
			

At this point, CMAK is started and listening on port 9000.
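
If you want to confirm from the command line that the services are accepting connections, a small polling helper along these lines may be useful. It is a sketch that assumes the nc (netcat) utility is installed; wait_for_port is a hypothetical helper name. Ports 2181 and 9000 come from the log output in this guide, while 9092 is assumed to be the Kafka broker's default listener port.

```shell
#!/bin/sh
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Poll once per second until HOST:PORT accepts a TCP connection.
# Returns 0 on success, 1 if the timeout is reached first.
# Assumes the nc (netcat) utility is installed.
wait_for_port() {
  host=$1
  port=$2
  remaining=$3
  while [ "$remaining" -gt 0 ]; do
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 1
}

# Example usage:
# wait_for_port localhost 2181 30 && echo "Zookeeper is reachable"
# wait_for_port localhost 9092 30 && echo "Kafka is reachable"
# wait_for_port localhost 9000 30 && echo "CMAK is reachable"
```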

Access Kafka Cluster Manager

You can now access the Kafka Cluster Manager using the URL http://your-server-ip:9000. You should see the following page:

Click on the Cluster => Add Cluster to add the cluster. You should see the following page:

Provide your cluster information and click on the Save button. You should see the following page:

Next, change the directory to Kafka and create a sample topic using the following command:

				
					cd /home/kafka/kafka/
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Topic1
				
			

Now, go to the cluster view and click Topic => List. You should see your newly created topic on the following page:
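
To check that the topic actually carries messages, you could publish one record and read it back with the console tools that ship with Kafka. The helper below is a sketch: smoke_test is a hypothetical name, the Kafka path matches the one used in this guide, and localhost:9092 is assumed to be the broker's default listener address.

```shell
#!/bin/sh
# Assumed locations and ports; adjust them to your setup.
KAFKA_HOME=${KAFKA_HOME:-/home/kafka/kafka}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}   # assumed default broker listener

# smoke_test TOPIC: publish one message, then read one message back.
smoke_test() {
  topic=$1
  echo "hello kafka" |
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
      --bootstrap-server "$BOOTSTRAP" --topic "$topic" &&
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
      --bootstrap-server "$BOOTSTRAP" --topic "$topic" \
      --from-beginning --max-messages 1
}

# Example usage:
# smoke_test Topic1
```

The consumer exits after one message because of --max-messages 1, so the function returns instead of waiting for more input.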

Install Apache Kafka on Ubuntu 20.04 Conclusion

In the above guide, we explained how to install Apache Kafka on Ubuntu 20.04. We also explained how to install the Kafka Cluster Manager to manage Apache Kafka. We hope you can now deploy Apache Kafka in your production environment.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.