How to Install Apache Kafka on CentOS Stream 8 (Linux Message Broker)

How to Install Apache Kafka on CentOS Stream 8 (Linux Message Broker). In this tutorial, we introduce Apache Kafka and its benefits, then move on to the installation phase: installing Kafka, creating the Kafka and Zookeeper service files, and setting up the CMAK Kafka cluster manager on your server. Let’s start.

What is Apache Kafka

Apache Kafka is an open source, distributed streaming platform that allows you to create real time, event driven applications. At its core, Apache Kafka is message broker software.

Today, numerous data sources generate continuous streams of data records, including events, wherever there is a need to keep a digital record of what is happening. These streams give applications the opportunity to respond to data and events in real time. Kafka, written in Scala and Java, allows you to build applications that continuously process and consume these streams at very high speed and with high fidelity and accuracy. In addition, the Kafka server uses a TCP based binary protocol that is optimized for efficiency and relies on the “message set” abstraction.

Kafka provides three crucial functions to its users:

  • It allows you to publish and subscribe to streams of records.
  • It helps you store streams of records so that applications can react to the data.
  • It processes streams of records in real time, for example in data pipelines.


A Kafka cluster is highly scalable and fault tolerant, and it also delivers much higher throughput than alternatives such as ActiveMQ and RabbitMQ.

Kafka Use Cases

  • Website activity tracking
  • Log aggregation
  • Event sourcing
  • Commit logs for distributed systems
  • Operational metrics

APIs

As a developer, you can leverage Kafka through four APIs:

Producer API

Using this API, you can publish a stream of records to any Kafka topic. A topic is a named log that stores records in the order they were written relative to one another. Once a record is written to a topic, it cannot be altered or deleted; instead, it is retained in the topic for a preconfigured retention period.
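
For example, once the Kafka broker installed later in this guide is running, you can publish records to a topic from the command line with the console producer that ships with Kafka, which uses the Producer API under the hood. The topic name demo-topic below is only an example:

# Every line typed after this command is published to the topic as one record (Ctrl+C to stop)
/home/kafka/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo-topic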

Consumer API

This API allows applications to subscribe to one or more topics and to ingest and process the stream of records stored in them.
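
Similarly, the console consumer that ships with Kafka uses the Consumer API to subscribe to a topic and print incoming records to the terminal. Again, demo-topic is only an example topic name:

# Read the records in the topic from the beginning of the log (Ctrl+C to stop)
/home/kafka/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo-topic --from-beginning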

Streams API

It is built on top of the Producer and Consumer APIs and adds complex processing capabilities that let an application perform continuous, end to end stream processing. Specifically, it consumes records from one or more topics so that you can analyze, aggregate, and transform them as needed.
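
As an illustration, the Kafka distribution installed later in this guide bundles the upstream WordCount demo application, which continuously counts the words it reads from one topic and writes the running counts to another. The topic name below follows the upstream Streams quickstart; create it first, then start the demo:

# Create the demo input topic, then run the bundled WordCount Streams application
/home/kafka/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic streams-plaintext-input
/home/kafka/kafka/bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo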

Connector API

With the help of this API, you can build connectors: reusable producers or consumers that simplify and automate the process of integrating data sources into a Kafka cluster.
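
For instance, Kafka Connect, which is built on this API, ships with the distribution installed below along with sample file connector configurations, so you can stream the contents of a text file into a topic without writing any code. The paths assume the install location used later in this guide, and the file name test.txt and topic connect-test come from the bundled sample configuration:

# Create an input file, then run a standalone Connect worker with the sample file source connector
cd /home/kafka/kafka
echo "hello kafka connect" > test.txt
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties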

Benefits Of Apache Kafka

Apache Kafka provides the following benefits:

High Throughput – Kafka can manage high velocity, high volume data without requiring large hardware, and it sustains a throughput of many thousands of messages per second.

Fault Tolerant – Kafka is fault tolerant by design, which makes it resilient to machine or node failures within a cluster.

Scalable – Kafka can be scaled out easily and without downtime by adding nodes, and it keeps message handling completely transparent and seamless for clients.

Capable of Being a Message Broker – Kafka is considered an efficient replacement for a more traditional message broker. A message broker is an intermediary program that translates messages from the formal messaging protocol of the publisher to the formal messaging protocol of the receiver.

Persistent – Kafka persists messages to disk, which makes it highly durable and reliable.

Capable of Batch Handling – Kafka can also handle batch oriented use cases and, thanks to its persistent messaging, can take over the work of a traditional ETL pipeline.

Real Time Handling – Kafka can even handle a real time data pipeline.

Consumer Friendly – Kafka can integrate with many different kinds of consumers and behaves appropriately for each consumer it integrates with, and client libraries are available for a variety of languages.

High Concurrency – As discussed earlier, Kafka can handle many messages per second at low latency, and it permits reading and writing messages at high concurrency.

Distributed – Kafka is built on a distributed architecture, making it highly scalable by using capabilities like partitioning and replication.

Follow this post below to install Apache Kafka on CentOS Stream 8.

How to Install Apache Kafka on CentOS Stream 8

Install Java JDK

Apache Kafka is a Java based application, so the Java JDK must be installed on your server. If it is not already installed, you can install it by running the following command:

dnf install java-11-openjdk -y

After the Java installation, you can verify the Java version with the following command:

java --version

You should see the following output:

openjdk 11.0.13 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)

Install Apache Kafka

First, you will need to create a dedicated user for Apache Kafka. You can create it using the following command:

adduser kafka

Next, switch to the kafka user and download the Apache Kafka 3.1.0 release using the following commands:

su - kafka
wget https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz

Once the download is completed, extract the downloaded file with the following command:

tar -xvzf kafka_2.13-3.1.0.tgz

Next, rename the extracted directory to kafka using the following command:

mv kafka_2.13-3.1.0 kafka

After that, exit from the kafka user with the following command:

exit

Create Kafka and Zookeeper Service Files

It is a good idea to create systemd service files to manage both the Apache Kafka and Zookeeper services. First, create the Zookeeper service file with the following command:

nano /etc/systemd/system/zookeeper.service

Add the following lines:

				
					[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

				
			

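Save and close the file. Next, create a Kafka service file:

nano /etc/systemd/system/kafka.service

Add the following lines. This unit uses the install paths from the steps above; the log file location /home/kafka/kafka/kafka.log is only an example and can be changed:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
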
Save and close the file, then reload the systemd daemon using the following command:

systemctl daemon-reload

Next, start and enable the Apache Kafka service with the following command. Because the Kafka unit requires the Zookeeper unit, this also starts Zookeeper:

systemctl enable --now kafka

You can now check the status of the Apache Kafka and Zookeeper services using the following command:

systemctl status kafka zookeeper

You will get the following output:

● kafka.service
   Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-04-06 05:26:47 UTC; 29s ago
 Main PID: 1872 (sh)
    Tasks: 71 (limit: 23696)
   Memory: 328.5M
   CGroup: /system.slice/kafka.service
           ├─1872 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka>
           └─1873 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvoke>

Apr 06 05:26:47 linux systemd[1]: Started kafka.service.

● zookeeper.service
   Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-04-06 05:26:47 UTC; 29s ago
 Main PID: 1871 (java)
    Tasks: 32 (limit: 23696)
   Memory: 69.3M
   CGroup: /system.slice/zookeeper.service
           └─1871 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCIn>

Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,889] INFO zookeeper.snapshot.compression.method = CHECKED (org.ap>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,891] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snapshot.>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,898] INFO Snapshot loaded in 22 ms, highest zxid is 0x0, digest i>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,898] INFO Snapshotting: 0x0 to /tmp/zookeeper/version-2/snapshot.>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,901] INFO Snapshot taken in 4 ms (org.apache.zookeeper.server.Zoo>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,956] INFO zookeeper.request_throttler.shutdownTimeout = 10000 (or>
Apr 06 05:26:51 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:51,958] INFO PrepRequestProcessor (sid:0) started, reconfigEnabled=f>
Apr 06 05:26:52 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:52,006] INFO Using checkIntervalMs=60000 maxPerMinute=10000 maxNever>
Apr 06 05:26:52 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:52,007] INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audi>
Apr 06 05:26:53 linux zookeeper-server-start.sh[1871]: [2022-04-06 05:26:53,875] INFO Creating new log file: log.1 (org.apache.zookeeper.serv>

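Both services are active and running. If you want to verify that the broker accepts client connections, you can create a test topic with the bundled CLI tools and then exercise it with the console producer and consumer shown earlier in this article; the topic name test-topic is only an example:

# Create a topic and list all topics known to the broker
/home/kafka/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic test-topic
/home/kafka/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092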

Install Apache Kafka Cluster Manager

It is recommended to install CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager) so that you can manage and monitor Apache Kafka from a web browser. CMAK is an open source tool for managing and monitoring Kafka services, developed by Yahoo. You can download it using the following commands:

dnf install git -y
git clone https://github.com/yahoo/CMAK.git

Once the download is completed, edit the CMAK configuration file:

nano ~/CMAK/conf/application.conf

Change the following lines so that cmak.zkhosts points to your local Zookeeper instance (localhost:2181):

kafka-manager.zkhosts="kafka-manager-zookeeper:2181"
kafka-manager.zkhosts=${?ZK_HOSTS}
cmak.zkhosts="localhost:2181"
cmak.zkhosts=${?ZK_HOSTS}

Save and close the file, then navigate to the CMAK directory and create a zip file for deploying the application:

cd ~/CMAK
./sbt clean dist

Next, navigate to the ~/CMAK/target/universal directory and unzip the generated zip file:

cd ~/CMAK/target/universal
unzip cmak-3.0.0.6.zip

Next, change into the extracted directory and run the cmak binary:

cd cmak-3.0.0.6
bin/cmak

You will get the following output:

2022-04-06 05:35:08,033 - [INFO] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
2022-04-06 05:35:08,034 - [INFO] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
2022-04-06 05:35:08,057 - [INFO] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
2022-04-06 05:35:08,058 - [INFO] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
2022-04-06 05:35:08,102 - [INFO] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
2022-04-06 05:35:08,103 - [INFO] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
2022-04-06 05:35:08,113 - [INFO] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
2022-04-06 05:35:08,140 - [INFO] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
2022-04-06 05:35:08,199 - [INFO] play.api.Play - Application started (Prod)
2022-04-06 05:35:09,183 - [INFO] k.m.a.KafkaManagerActor - Updating internal state...
2022-04-06 05:35:09,244 - [INFO] p.c.s.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000

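CMAK listens on TCP port 9000. If firewalld is active on your CentOS Stream 8 server, you may need to open this port before the web interface is reachable from other machines:

# Allow inbound connections to CMAK and reload the firewall rules
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload
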
Access Kafka Cluster Manager

At this point, CMAK is started and listening on port 9000. You can now access the Kafka Cluster Manager using the URL http://your-server-ip:9000. You should see the following page:


Click on the Cluster => Add Cluster to add the cluster. You should see the following page:

Provide your cluster information and click on the Save button. You should see the following page:

Click on the Go to cluster view. You should see the cluster information page:
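
Note that bin/cmak keeps running in the foreground of your terminal. If you want CMAK to keep running after you log out, you could manage it with systemd as well, in the same way as Kafka and Zookeeper above. The unit below is only a minimal sketch; it assumes CMAK was cloned and built under /root/CMAK (that is, ~/CMAK for the root user) as in the steps above:

# /etc/systemd/system/cmak.service - hypothetical unit, adjust paths to your environment
[Unit]
Description=CMAK (Cluster Manager for Apache Kafka)
After=kafka.service

[Service]
Type=simple
WorkingDirectory=/root/CMAK/target/universal/cmak-3.0.0.6
ExecStart=/root/CMAK/target/universal/cmak-3.0.0.6/bin/cmak
Restart=on-abnormal

[Install]
WantedBy=multi-user.target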

How to Install Apache Kafka on CentOS Stream 8 (Linux Message Broker) Conclusion

In this post, we explained how to install Apache Kafka on CentOS Stream 8. We also showed you how to create the Kafka and Zookeeper service files and set up the CMAK Kafka cluster manager on your server. You can now manage your Apache Kafka cluster from your web browser.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
