How to Install Apache Kafka on Debian 11 (Linux Message Broker)

In this article, we introduce Apache Kafka and its features, then move on to installing it on Debian 11.

What is Apache Kafka

Apache Kafka is a distributed event streaming platform that receives data from distinct sources and shares it with the target system in real time. Written in Scala and Java, the open source distributed publish subscribe messaging system facilitates the asynchronous data exchange between servers and applications.

Today, the adoption of Kafka has enabled businesses to deliver timely experiences to consumers and manage real time data.

Earlier, data processing followed the batch processing technique: all the raw data was collected and stored first, then processed later at scheduled intervals. For example, companies used to wait until the end of the week or month to analyze all the collected information and calculate profits and expenses. The main drawback of batch processing was that it did not provide real time data.

With the growth and expansion of businesses, analyzing data in real time has become necessary to make better decisions and strategies. Apache Kafka addresses this requirement by streaming events in real time. Another feature that sets Kafka apart from other messaging systems is that it stores all messages for a configurable retention period, and consumers are solely responsible for tracking which messages they have read.

If you want to build resilient data services and applications, look no further. Kafka is a fast, highly scalable and fault tolerant publish subscribe system. It has five core functions: Publish, Consume, Process, Connect and Store. These functions enable the system to deliver higher throughput. Further, it relies on the file system for storing and caching messages.

Today, thousands of companies trust the platform as it stores all the streams safely in a fault tolerant cluster and delivers messages at a network limited throughput. Also, it has an out of the box Connect interface that allows integration with various event sources such as Elasticsearch, AWS S3, Postgres, etc.

Next, let's look at the benefits of Apache Kafka.

Benefits of Apache Kafka

There are various reasons why many high profile companies are investing in Apache Kafka for collecting data in real time. Here are some of its benefits.

Open Source

Kafka is an Open Source platform, i.e., the source code is free and available to all developers or users for modification. There are no restrictions or licensing fees for the same. 

Scale and Speed

Unlike batch oriented systems, Kafka delivers data in real time. Also, because it is a distributed platform, the processing work is spread across different physical and virtual machines, which helps it scale out and return results quickly.

Extensible

Kafka collaborates with Zookeeper to coordinate and synchronize with other services.

Performance

Kafka provides a queue that can handle large amounts of data and move messages from senders to receivers.

Fault tolerant

Kafka is a publish subscribe messaging system built for high throughput and fault tolerance. It supports automatic recovery and is resilient to node failures: even if one node goes down, another takes over its work and keeps delivering messages.

Replication

Kafka can keep copies of each topic partition on several brokers. The number of copies is set by the topic's replication factor, so users can tune replication per topic, or turn it off with a replication factor of 1, as their needs require.

Allows message replay

Kafka enables multiple consumers to subscribe to the same topic and to replay its messages for as long as they are retained.

Stream Processing

Apache Kafka allows seamless movement of data in the form of messages, streams, or records. Further, it allows users to inspect, transform and leverage data before moving. The platform is easy to use and supports a native approach for storing and moving data in real time.

Seamless Messaging Functionality

Organizations that use legacy communications models to deal with large volume data often find issues in communications and scalability. However, with the messaging and streaming functionality, Kafka has reduced this issue and users can publish, subscribe, store and process data in real time.

Next we will explain how to install Apache Kafka on Debian 11.

How to Install Apache Kafka on Debian 11

Install Java JDK

Apache Kafka is a Java based application. So Java must be installed on your system. If not installed, you can install it by running the following command:

				
					apt-get install default-jdk -y
				
			

Once Java is installed, verify the Java installation using the following command:

				
					java --version
				
			

You will get the following output:

				
					openjdk 11.0.14 2022-01-18
OpenJDK Runtime Environment (build 11.0.14+9-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.14+9-post-Debian-1deb11u1, mixed mode, sharing)
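
If you script this installation, the version check can be automated. The following is a minimal sketch, assuming the java --version output format shown above (the --version flag exists on Java 9 and later). The function names java_major and java_at_least, and the minimum of 11 used in the example, are illustrative choices.

```shell
#!/bin/sh
# Print the major Java version from the first line of `java --version`
# output, e.g. "openjdk 11.0.14 2022-01-18" -> 11.
java_major() {
    ver=$(printf '%s\n' "$1" | awk '{ print $2 }')
    printf '%s\n' "${ver%%.*}"
}

# Succeed only if the version line in $1 reports at least major version $2.
java_at_least() {
    major=$(java_major "$1")
    [ -n "$major" ] && [ "$major" -ge "$2" ] 2>/dev/null
}

# Example:
#   java_at_least "$(java --version 2>/dev/null | head -n 1)" 11 \
#       || echo "Java 11 or newer was not found"
```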
				
			

Install Kafka on Debian 11

Before starting, it is recommended to create a dedicated user to run Apache Kafka. You can create it using the following command:

				
					adduser kafka
				
			

Add the kafka user to the sudo group with the following command:

				
					adduser kafka sudo
				
			

Next, log in as the kafka user and download the Apache Kafka 2.7.2 source release using the following command:

				
					su - kafka
wget https://archive.apache.org/dist/kafka/2.7.2/kafka-2.7.2-src.tgz
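
Before extracting the archive, it can be worth checking its integrity. The following is a minimal sketch, assuming that a checksum file named kafka-2.7.2-src.tgz.sha512 is published next to the archive and that it uses the "filename: HEX HEX ..." layout produced by gpg --print-md. The function name verify_sha512 is hypothetical.

```shell
#!/bin/sh
# Compare a file's SHA-512 digest with a published checksum file.
# Assumption: the checksum file looks like "name.tgz: ABCD1234 EF56...",
# possibly wrapped over several lines, so the file name, all whitespace
# and the upper case are normalized away before comparing.
verify_sha512() {
    file=$1
    sumfile=$2
    expected=$(sed 's/^[^:]*://' "$sumfile" | tr -d ' \t\r\n' | tr 'A-F' 'a-f')
    actual=$(sha512sum "$file" | cut -d' ' -f1)
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example (the .sha512 URL is an assumption, check that it exists first):
#   wget https://archive.apache.org/dist/kafka/2.7.2/kafka-2.7.2-src.tgz.sha512
#   verify_sha512 kafka-2.7.2-src.tgz kafka-2.7.2-src.tgz.sha512 && echo "checksum OK"
```

If the checksum file you download uses a different layout, adjust the normalization step rather than skipping the check.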
				
			

Once the download is completed, extract the downloaded file with the following command:

				
					tar -xvzf kafka-2.7.2-src.tgz
mv kafka-2.7.2-src kafka
				
			

Then exit from the Kafka user with the following command:

				
					exit
				
			

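Because this is the source release, Kafka must be built before it can run. The Gradle wrapper script included in the source tree fetches Gradle on first use, so you do not need to install Gradle separately. Build the Kafka jar files with the following command: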

				
					cd /home/kafka/kafka
./gradlew jar -PscalaVersion=2.13.3
				
			

Because the build ran as root, set the ownership of the Kafka directory back to the kafka user:

				
					chown -R kafka:kafka /home/kafka/kafka
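
Note that the stock configuration keeps data under /tmp: server.properties sets log.dirs=/tmp/kafka-logs and zookeeper.properties sets dataDir=/tmp/zookeeper, and files there may not survive a reboot. If you want to move them, the sketch below shows one way to edit the properties files. The helper name set_property and the /var/lib/kafka paths in the example are assumptions, not part of Kafka.

```shell
#!/bin/sh
# Set key=value in a Java properties file: replace the line if the key is
# already present, otherwise append it.
# Assumes GNU sed (as on Debian) and values without "|", "&" or backslashes.
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (hypothetical target directories, which must exist and be
# writable by the kafka user):
#   set_property /home/kafka/kafka/config/server.properties log.dirs /var/lib/kafka/data
#   set_property /home/kafka/kafka/config/zookeeper.properties dataDir /var/lib/kafka/zookeeper
```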
				
			

Create Systemd Unit Files for Kafka and Zookeeper

Next, you will need to create a systemd service file for both Zookeeper and Kafka to manage their services.

First, create a Zookeeper service file using the following command:

				
					nano /etc/systemd/system/zookeeper.service
				
			

Add the following lines:

				
					[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

				
			

Save and close the file, then create a systemd service file for Kafka using the following command:

				
					nano /etc/systemd/system/kafka.service
				
			

Add the following lines:

				
					[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

				
			

Save and close the file, then reload the systemd daemon using the following command:

				
					systemctl daemon-reload
				
			

Next, start and enable the Apache Kafka service with the following command. Because kafka.service declares Requires=zookeeper.service, systemd starts Zookeeper along with it:

				
					systemctl enable --now kafka
				
			

You can now check the status of the Apache Kafka and Zookeeper services using the following command:

				
					systemctl status kafka zookeeper
				
			

You will get the following output:

				
					● kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-03-25 05:44:23 UTC; 10s ago
   Main PID: 8893 (sh)
      Tasks: 71 (limit: 4679)
     Memory: 333.2M
        CPU: 8.748s
     CGroup: /system.slice/kafka.service
             ├─8893 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kaf>
             └─8894 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvo>

Mar 25 05:44:23 debian11 systemd[1]: Started kafka.service.

● zookeeper.service
     Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-03-25 05:44:23 UTC; 10s ago
   Main PID: 8892 (java)
      Tasks: 31 (limit: 4679)
     Memory: 81.9M
        CPU: 3.137s
     CGroup: /system.slice/zookeeper.service
             └─8892 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGC>

Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,712] INFO Created server with tickTime 3000 minSessionTimeout >
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,759] INFO Using org.apache.zookeeper.server.NIOServerCnxnFacto>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,770] INFO Configuring NIO connection handler with 10s sessionl>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,793] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zoo>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,834] INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zook>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,841] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snapsh>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,848] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snapsh>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,885] INFO PrepRequestProcessor (sid:0) started, reconfigEnable>
Mar 25 05:44:25 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:25,911] INFO Using checkIntervalMs=60000 maxPerMinute=10000 (org.>
Mar 25 05:44:26 debian11 zookeeper-server-start.sh[8892]: [2022-03-25 05:44:26,641] INFO Creating new log file: log.1 (org.apache.zookeeper.s>
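
With both services running, a quick end to end check is to create a topic, write one message and read it back. The following is a minimal sketch that uses the command line tools shipped in Kafka's bin directory and assumes the broker listens on the default localhost:9092. The function names, the DRY_RUN switch and the topic name are illustrative. Reading with --from-beginning also shows the replay behaviour described earlier, since the consumer re-reads messages that are still retained.

```shell
#!/bin/sh
# Smoke test sketch: create a topic, produce one message, consume it again.
# KAFKA_HOME and BOOTSTRAP are assumptions; override them for your setup.
KAFKA_HOME=${KAFKA_HOME:-/home/kafka/kafka}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

smoke_test() {
    topic=$1
    # A single broker can only host a replication factor of 1.
    run "$KAFKA_HOME/bin/kafka-topics.sh" --create --topic "$topic" \
        --partitions 1 --replication-factor 1 --bootstrap-server "$BOOTSTRAP"
    echo "hello kafka" | run "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --topic "$topic" --bootstrap-server "$BOOTSTRAP"
    # --from-beginning replays retained messages; stop after the first one.
    run "$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$topic" \
        --from-beginning --max-messages 1 --bootstrap-server "$BOOTSTRAP"
}

# Example:
#   DRY_RUN=1 smoke_test demo-topic   # print the commands only
#   smoke_test demo-topic             # run them against the broker
```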

				
			

Install Cluster Manager for Apache Kafka

CMAK (Cluster Manager for Apache Kafka) is an open source tool developed by Yahoo for managing and monitoring Kafka clusters. First, install Git and clone the CMAK repository using the following commands:

				
					apt-get install git -y
git clone https://github.com/yahoo/CMAK.git
				
			

Once the download is completed, edit the CMAK configuration file:

				
					nano ~/CMAK/conf/application.conf
				
			

Set cmak.zkhosts to the address of your Zookeeper server so that the lines read as follows:

				
					kafka-manager.zkhosts="kafka-manager-zookeeper:2181"
kafka-manager.zkhosts=${?ZK_HOSTS}
cmak.zkhosts="localhost:2181"
cmak.zkhosts=${?ZK_HOSTS}
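
The same edit can be scripted. The following is a minimal sketch, assuming the configuration file contains a quoted cmak.zkhosts line as shown above. The function name set_zkhosts is hypothetical. The lines ending in ${?ZK_HOSTS} are optional substitutions, so exporting a ZK_HOSTS environment variable before starting CMAK should override the value without editing the file at all.

```shell
#!/bin/sh
# Point cmak.zkhosts in a CMAK application.conf at the given Zookeeper
# address. Only the quoted literal is rewritten; the ${?ZK_HOSTS} override
# line and the kafka-manager.* lines are left untouched.
# Assumes GNU sed (as on Debian).
set_zkhosts() {
    conf=$1
    hosts=$2
    sed -i "s|^cmak\.zkhosts=\".*\"|cmak.zkhosts=\"${hosts}\"|" "$conf"
}

# Example:
#   set_zkhosts ~/CMAK/conf/application.conf localhost:2181
```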

				
			

Save and close the file, then navigate to the CMAK directory and build a zip package for deploying the application:

				
					cd ~/CMAK
./sbt clean dist
				
			

You will get the following output:

				
					[info] Main Scala API documentation to /root/CMAK/target/scala-2.12/api...
[info] Non-compiled module 'compiler-bridge_2.12' for Scala 2.12.10. Compiling...
[info] Compiling 136 Scala sources and 2 Java sources to /root/CMAK/target/scala-2.12/classes ...
[info]   Compilation completed in 17.571s.
model contains 645 documentable templates
[info] Main Scala API documentation successful.
[info] LESS compiling on 1 source(s)
[success] All package validations passed
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.6.zip
[success] Total time: 192 s (03:12), completed Mar 25, 2022, 5:51:09 AM
Graal diagnostic output saved in /root/CMAK/dumps/1648187271178/graal_diagnostics_10375.zip

				
			

Next, navigate to the ~/CMAK/target/universal directory and unzip the zip file:

				
					cd ~/CMAK/target/universal
unzip cmak-3.0.0.6.zip
				
			

Change to the extracted directory and run the cmak launcher script:

				
					cd cmak-3.0.0.6
bin/cmak
				
			

If everything is fine, you will get the following output:

				
					2022-03-25 05:52:48,313 - [INFO] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
2022-03-25 05:52:48,315 - [INFO] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
2022-03-25 05:52:48,326 - [INFO] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
2022-03-25 05:52:48,329 - [INFO] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
2022-03-25 05:52:48,367 - [INFO] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
2022-03-25 05:52:48,371 - [INFO] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
2022-03-25 05:52:48,380 - [INFO] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
2022-03-25 05:52:48,411 - [INFO] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
2022-03-25 05:52:48,946 - [INFO] play.api.Play - Application started (Prod)
2022-03-25 05:52:49,443 - [INFO] k.m.a.KafkaManagerActor - Updating internal state...
2022-03-25 05:52:50,130 - [INFO] p.c.s.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
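
CMAK is a Play application, and Play applications normally write a RUNNING_PID file into the application directory and refuse to start while that file exists. If CMAK was stopped uncleanly, a stale file can block the next start. The following is a minimal sketch for clearing it safely. The function name clear_stale_pid is hypothetical, and the file location is an assumption you should confirm on your system.

```shell
#!/bin/sh
# Remove a PID file only if the process it names is no longer running.
# Returns 1 (and keeps the file) when the process is still alive.
# Limitation: kill -0 also fails for processes owned by another user, so
# run this as the same user that started the application.
clear_stale_pid() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running" >&2
        return 1
    fi
    rm -f "$pidfile"
}

# Example (path is an assumption):
#   clear_stale_pid ~/CMAK/target/universal/cmak-3.0.0.6/RUNNING_PID && bin/cmak
```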

				
			

At this point, CMAK is started and listening on port 9000.
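
To confirm that all three services are listening, you can inspect the listening sockets. The following is a minimal sketch that filters the output of ss -ltn. Port 2181 (Zookeeper) and port 9000 (CMAK) come from the logs above, port 9092 is Kafka's default listener port, and the function name port_in_listing is hypothetical.

```shell
#!/bin/sh
# Read `ss -ltn` style output on stdin and succeed if some socket is
# listening on the port given as $1. The local address is the fourth
# column; the port is whatever follows its last ":".
port_in_listing() {
    awk -v port="$1" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

# Example:
#   for p in 2181 9092 9000; do
#       ss -ltn | port_in_listing "$p" || echo "nothing is listening on port $p"
#   done
```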

Access Kafka Cluster Manager

You can now access the Kafka Cluster Manager using the URL http://your-server-ip:9000. You should see the following page:

[Screenshot: Add Cluster]

Click on Cluster => Add Cluster to add the cluster. You should see the following page:

[Screenshot: CMAK Cluster]

Provide your cluster information and click on the Save button. You should see the following page:

[Screenshot: Kafka cluster view]

Now, click on the Go to cluster view. You should see the following page:

How to Install Apache Kafka on Debian 11 (Linux Message Broker) Conclusion

In the above guide, we explained how to install Apache Kafka on Debian 11. We also explained how to install the Kafka Cluster Manager to manage Apache Kafka. I hope you can now deploy Apache Kafka in a production environment.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
