How to Install Apache Kafka on CentOS Stream 8 (Linux Message Broker). In this tutorial we introduce Apache Kafka and its benefits, then move on to the installation phase: installing Kafka, creating Kafka and Zookeeper service files, and setting up the CMAK Kafka cluster manager on your server. Let’s start.
What is Apache Kafka?
Apache Kafka is an open source, distributed streaming platform that allows you to create real time and event driven applications. Apache Kafka is also message broker software.
Today, numerous data sources continuously generate records of events, and there is a need to keep a digital record of what is happening. These streams give applications the opportunity to respond to data and events in real time. Kafka, written in Scala and Java, allows you to build applications that continuously process and consume these streams at very high speed and with high fidelity and accuracy. In addition, the Kafka server uses a TCP based binary protocol, which is optimized for efficiency and relies on the “message set” abstraction.
Kafka provides three crucial functions to its users:
It allows you to publish and subscribe to streams of records.
It helps you store these streams of records durably and reliably so that applications can react to the data.
It processes streams of records in real time, for example as part of data pipelines.
A Kafka cluster is highly scalable and fault tolerant, and it also offers much higher throughput than alternatives such as ActiveMQ and RabbitMQ.
As a developer, you can leverage Kafka via four APIs:
Producer API
Using this API, you can publish a stream of records to any Kafka topic. By topic, we mean a named log that stores records in the order in which they occurred relative to one another. Once a record is written to a topic, it cannot be altered or deleted; instead, it is retained in the topic for a preconfigured time.
Consumer API
This API allows applications to subscribe to one or more topics and to ingest and process the streams of records stored in them. A quick command line illustration of producing and consuming records follows this list of APIs.
Streams API
It is built upon the Producer and Consumer APIs and adds more complex processing capabilities with which an application can perform continuous, end to end stream processing. It consumes records from one or more topics so that you can analyze, aggregate and transform them accordingly, and it can publish the resulting streams back to Kafka.
Connector API
With the help of this API, you can build connectors: reusable producers or consumers that simplify and automate the integration of external data sources and systems with a Kafka cluster.
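To make the Producer and Consumer APIs more concrete, here is a small command line illustration using the console tools that ship with every Kafka download. It assumes a broker is already running on localhost:9092 (the installation steps follow later in this post) and uses a hypothetical topic name, test-topic. First create the topic, then start a console producer (type a few messages, one per line), and finally read them back with a console consumer:
bin/kafka-topics.sh --create --topic test-topic --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
The --from-beginning flag tells the consumer to read the topic from its earliest retained record rather than only new ones.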
Apache Kafka provides you with the following benefits:
High Throughput – Kafka can handle high velocity, high volume data without requiring large amounts of hardware, and it can sustain a throughput of many thousands of messages per second.
Fault Tolerant – Kafka has an inherent capability to tolerate failures of machines or nodes within a cluster.
Scalable – Kafka can be scaled out easily by adding nodes, without incurring any downtime, and message handling remains completely transparent and seamless while it happens.
Capable of Being a Message Broker – Kafka is considered an efficient replacement for a more traditional message broker, that is, an intermediary program that translates messages from the formal messaging protocol of the publisher to the formal messaging protocol of the receiver.
Persistent – Messages are persisted to disk, which makes Kafka highly durable and reliable.
Capable of Batch Handling – Kafka can also handle batch style use cases and can take over the work of a traditional ETL tool thanks to its persistent messaging.
Consumer Friendly – Kafka can integrate with many different kinds of consumers and behaves appropriately for each one, with client libraries available in a variety of languages.
High Concurrency – As discussed earlier, Kafka can handle thousands of messages per second with low latency, and it permits messages to be read and written with high concurrency.
Distributed – Kafka is built on a distributed architecture, making it highly scalable by using capabilities like partitioning and replication.
Follow this post below to install Apache Kafka on CentOS Stream 8.
How to Install Apache Kafka on CentOS Stream 8
Install Java JDK
Apache Kafka is a Java based application, so the Java JDK must be installed on your server. If it is not installed, you can install it by running the following command:
dnf install java-11-openjdk -y
After the Java installation, you can verify the Java version with the following command:
java --version
You should see the following output:
openjdk 11.0.13 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
Install Apache Kafka
First, you will need to create a dedicated user for Apache Kafka. You can create it using the following command:
adduser kafka
Next, switch to the kafka user and download Apache Kafka using the following commands:
su - kafka
wget https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz
Once the download is completed, extract the downloaded file with the following command:
tar -xvzf kafka_2.13-3.1.0.tgz
Next, rename the extracted directory to kafka using the following command:
mv kafka_2.13-3.1.0 kafka
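As a quick, optional sanity check (not strictly required), you can ask one of the bundled command line tools for its version; this only needs Java and does not require a running broker, and it should print the Kafka version you just downloaded:
~/kafka/bin/kafka-topics.sh --version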
After that, exit from the kafka user with the following command:
exit
Create Kafka and Zookeeper Service Files
It is a good idea to create systemd service files to manage both the Apache Kafka and Zookeeper services, so that Zookeeper is started before Kafka and both come up automatically at boot.
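The unit file contents are not reproduced above, so the following is a minimal sketch of both files. It assumes Kafka was extracted to /home/kafka/kafka (as in the previous step) and should run as the kafka user; adjust the paths if your layout differs. Create the Zookeeper unit file at /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=Apache Zookeeper (bundled with Kafka)
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Next, create the Kafka unit file at /etc/systemd/system/kafka.service; it starts only after Zookeeper is up:
[Unit]
Description=Apache Kafka Service
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Finally, reload systemd and start both services, enabling them at boot:
systemctl daemon-reload
systemctl enable --now zookeeper
systemctl enable --now kafka
You can confirm both are running with systemctl status zookeeper kafka.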
It is also recommended to install CMAK (Cluster Manager for Apache Kafka, formerly Kafka Manager) to manage and monitor Apache Kafka from a web browser. CMAK is an open source tool for managing and monitoring Kafka services, developed by Yahoo.
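The exact download and setup commands are not shown above, so here is a sketch of a typical CMAK installation; the details (version numbers, script names, install directory) can vary between CMAK releases, so treat the paths below as examples. CMAK is built with sbt and needs Java 11, which is already installed. As root, fetch the sources and build a distribution:
dnf install git unzip -y
git clone https://github.com/yahoo/CMAK.git
cd CMAK
./sbt clean dist
The build produces a zip file under target/universal/. Unpack it somewhere convenient, for example /opt, and edit conf/application.conf to point CMAK at your Zookeeper instance by setting cmak.zkhosts="localhost:2181" (older releases use the key kafka-manager.zkhosts instead). Then start CMAK on port 9000:
unzip target/universal/cmak-*.zip -d /opt
cd /opt/cmak-*
bin/cmak -Dconfig.file=conf/application.conf -Dhttp.port=9000
If firewalld is active on your server, you may also need to open port 9000 before the web interface is reachable:
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload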
At this point, CMAK is started and listening on port 9000. You can now access the Kafka Cluster Manager using the URL http://your-server-ip:9000. You should see the following page:
Click on the Cluster => Add Cluster to add the cluster. You should see the following page:
Provide your cluster information and click on the Save button. You should see the following page:
Click on the Go to cluster view. You should see the cluster information page:
How to Install Apache Kafka on CentOS Stream 8 (Linux Message Broker) Conclusion
In this post, we explained how to install Apache Kafka on CentOS Stream 8. We also showed you how to create Kafka and Zookeeper service files and set up the CMAK Kafka cluster manager on your server. You can now manage your Apache Kafka via a web browser.
I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.