Kafka Security Best Practices Checklist for Securing Your Kafka Server

Apache Kafka is a popular choice of event streaming platform, and securing it is critical for data security – and often required by governing bodies.

In this post, we will have a look at a Kafka Security Best Practices Checklist that will help you secure your event streaming platform.

But before we move on to that, let us first have a look at a few definitions.

What is Kafka

Apache Kafka is an open source distributed event streaming platform developed by the Apache Software Foundation.

Kafka is a data store that has been optimized for ingesting and processing streaming data in real time. Consumers can subscribe to the platform to retrieve data from it, while producers from any number of systems and real-time applications can publish data to it.

Streaming data – data that is continuously generated by data sources – typically arrives from many sources simultaneously and in an uninterrupted flow. Kafka, therefore, needs to handle the constant influx of data and process it sequentially, incrementally, and promptly.

Advantages of Kafka

The main Apache Kafka pros are:

    • Processes streams of records in real time.
    • Distributes the records in their correct order to subscribers that are pulling from it.

The platform does this efficiently enough that the data is streamed in real time. And this makes it one of the most popular platforms of its kind.

Apache Kafka market share

Looking at Apache Kafka customers by industry, Computer Software and Information Technology are the largest segments that use Kafka.


What does an unsecured Kafka deployment mean

An unsecured Kafka deployment – we say “deployment” because, as we shall soon see, the platform’s architecture consists of multiple components – can lead to harm being done to data, hardware, and subscribers.

Hackers can sniff data packets, snoop on confidential information in them and even alter them to deliver fake information or malicious content.

The Kafka deployment itself could be brought down.

All we need to do is have a look at the Apache Kafka security vulnerabilities listed on the Common Vulnerabilities and Exposures (CVE) page. We soon realize from the regularly updated page that there are serious vulnerabilities to look out for – and the list continues to grow.

What components make up a Kafka deployment

The Kafka platform consists of eight main components, and they are:

Records

Kafka stores records and serves them to consumers. The records are grouped into topics and are retained even after consumer requests are complete.

These records, though, aren’t stored in a single file. They are instead broken up into the “partitions” of a topic, which are sequenced to allow for retrieval in the correct order.

Topics

Topics are record categories or containers that are assigned to data as it is published or stored. When data is written, it goes into a topic, and it is topics that consumers access when they subscribe to read from Kafka.

Logs

Kafka logs are a collection of data segments. Each log is the logical representation of a single topic partition.

A log holds every record appended to its partition, regardless of whether it has been processed or transmitted yet. (These commit logs shouldn’t be confused with Kafka’s application logs, which record errors and warnings and can be used for troubleshooting and debugging.)

More importantly, a log also acts as a buffer that decouples the data consumption rate from the data production rate – something that becomes very important when multiple subscribers are trying to access the same topic at different rates.

Broker

A broker lets consumers fetch messages by topic and partition. Brokers form a cluster by sharing information with each other directly – or indirectly via ZooKeeper, another Apache platform that integrates with Kafka to coordinate its distributed components.

Incidentally, a Kafka cluster can only have one broker acting as its controller at any given time.

Clusters

A Kafka cluster, meanwhile, is a system that consists of brokers, topics, and partitions. Its main purpose is to distribute workloads equally among partitions.

A Kafka cluster can have multiple brokers, which allows the load to be balanced across them. A single Kafka server is known as a Kafka broker. And a Kafka cluster is stateless, which means it needs ZooKeeper to maintain its cluster state.

Partitions

A partition is a subset of a topic – and topics, which are logical concepts, are in turn divided into multiple partitions. Thus, it can be said that a partition is the smallest storage unit that holds a subset of records owned by a topic.

A partition is a single log file where records are appended at the end. This helps with the splitting of new messages among nodes in a cluster (when writing), and the processing of old ones (when reading).
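
To make this concrete, here is how a topic with multiple partitions might be created using the kafka-topics.sh tool that ships with Kafka (the broker address, topic name, and counts below are placeholder values; older releases use --zookeeper instead of --bootstrap-server):

    kafka-topics.sh --create \
      --bootstrap-server localhost:9092 \
      --topic orders \
      --partitions 3 \
      --replication-factor 2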

Producer

A source application for a data stream that publishes or writes events is known as a producer. The Apache Kafka producer client is used to generate messages and publish them to one or more topics in a cluster.
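
For a quick illustration, Kafka ships with a console producer that reads lines from standard input and publishes each one as a record (the broker address and topic name are placeholders; older releases use --broker-list instead of --bootstrap-server):

    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic orders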

Consumer

A consumer is a client that reads or processes records from a cluster. A consumer group, on the other hand, is a set of consumers that cooperate to consume data from some topics.

Client applications that act as consumers subscribe to read and process published events – in the Java client, via the KafkaConsumer class – which allows them to receive messages from their topics of choice.
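
Its command-line counterpart, the console consumer, illustrates the same idea – it joins a consumer group and prints records from a topic (the broker address, topic, and group name are placeholders):

    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic orders --group my-group --from-beginning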

Kafka Security Best Practices Checklist

Secure Kafka Server Deployment

Ok; now that we have had a look at all the moving parts that make up a deployment, let us move on to the checklist of best practices for securing a Kafka server deployment:

1. Encryption

In Kafka, data is stored in plaintext by default, which makes it vulnerable right from the outset. But the risks increase even more when this data is being transported across networks.

The best way to secure data in motion is transport layer encryption – TLS (the successor to SSL) – which protects it as it travels between clients and brokers and between the brokers themselves. Note that TLS only covers data in motion; data at rest on the brokers’ disks needs to be encrypted separately, for example with filesystem or volume level encryption.
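
As an illustration, a minimal TLS listener setup in a broker’s server.properties might look like the following sketch (the hostname, paths, and passwords are placeholders for your own values):

    listeners=SSL://kafka-1.example.com:9093
    security.inter.broker.protocol=SSL
    ssl.keystore.location=/var/private/ssl/kafka.broker.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit
    ssl.truststore.location=/var/private/ssl/kafka.broker.truststore.jks
    ssl.truststore.password=changeit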

Some other measures to take include:

    • Generating certificates for brokers and having them signed by a certificate authority (CA).
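
One common way to do this is with the JDK’s keytool and openssl, roughly as follows (the aliases, filenames, and validity periods are placeholder choices):

    # Generate a key pair in the broker's keystore
    keytool -keystore kafka.broker.keystore.jks -alias broker -genkey -keyalg RSA -validity 365
    # Create a CA certificate and key
    openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
    # Export a signing request, sign it with the CA, then import the CA cert and the signed cert
    keytool -keystore kafka.broker.keystore.jks -alias broker -certreq -file cert-req
    openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-req -out cert-signed -days 365 -CAcreateserial
    keytool -keystore kafka.broker.keystore.jks -alias CARoot -import -file ca-cert
    keytool -keystore kafka.broker.keystore.jks -alias broker -import -file cert-signed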

The biggest concern for administrators here may be that the encryption and decryption process eats into resources and, thus, degrades performance. But with Kafka the overhead is negligible as long as the deployment is optimized.

2. Secret Protection

Implementing encryption and other security features in Kafka requires the configuration of secret values like passwords, keys, accounts and hostnames.

Out of the box, Kafka does not encrypt these values, and users often resort to storing them in cleartext configuration files and source control systems – which is a big mistake.

Commercial third-party solutions can be added to encrypt secrets within configuration files and to keep them out of log files.
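
Even without a commercial product, secrets can at least be moved out of the main configuration. Recent Kafka versions ship a FileConfigProvider that resolves values from a separate, tightly permissioned file – a sketch, with placeholder paths and key names (note that this externalizes the secret rather than encrypting it):

    # In server.properties
    config.providers=file
    config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
    ssl.keystore.password=${file:/etc/kafka/secrets.properties:keystore.password}

    # In /etc/kafka/secrets.properties (readable only by the Kafka user)
    keystore.password=changeit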

3. Firewalls

Brokers should be installed in private networks. They should sit behind port-based and web access firewalls, which are important for isolating both Kafka and ZooKeeper.

Administrators need to configure the port-based firewalls to limit access to specific port numbers while web access firewalls can be used to limit access to specific, limited groups of possible requests.
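
As a sketch, port-based rules on a Linux host could use iptables to allow only known subnets to reach the broker and ZooKeeper ports (the subnets and ports below are placeholders for your own network layout):

    # Allow the application subnet to reach the broker's TLS listener
    iptables -A INPUT -p tcp --dport 9093 -s 10.0.1.0/24 -j ACCEPT
    # Allow only the Kafka hosts to reach ZooKeeper
    iptables -A INPUT -p tcp --dport 2181 -s 10.0.2.0/24 -j ACCEPT
    # Drop all other traffic to those ports
    iptables -A INPUT -p tcp --dport 9093 -j DROP
    iptables -A INPUT -p tcp --dport 2181 -j DROP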

4. Authentication

All clients need to be authenticated. Applications that want to consume data must have client authentication enabled. On the Kafka side, administrators can define which users and applications are allowed to connect to the cluster.

The three main areas to focus on when implementing authentication are listed below (a configuration sketch follows the list):

    • Kafka brokers.
    • ZooKeeper servers.
    • HTTP-based services like ksqlDB servers, REST proxies, and control centers.
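
As one possible setup, the brokers can require SASL/SCRAM over TLS, with credentials created via the kafka-configs.sh tool (the usernames, passwords, and addresses are placeholders; newer releases can use --bootstrap-server instead of --zookeeper):

    # Broker server.properties
    listeners=SASL_SSL://kafka-1.example.com:9093
    security.inter.broker.protocol=SASL_SSL
    sasl.enabled.mechanisms=SCRAM-SHA-512
    sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512

    # Create a SCRAM credential for a user
    kafka-configs.sh --zookeeper localhost:2181 --alter \
      --add-config 'SCRAM-SHA-512=[password=alice-secret]' \
      --entity-type users --entity-name alice

    # Client properties
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-512
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
      username="alice" password="alice-secret";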

5. Access Control

When administrators grant access to users and applications, they need to set parameters for what data – or information – they are allowed to access in Kafka.

Access Control Lists (ACLs) are used to limit which clients and applications can read from, and/or write to, which topic.

Kafka has an out-of-the-box authorizer that can be configured to enforce ACLs. These native ACLs are stored in ZooKeeper.
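
Enabling the built-in authorizer comes down to a few lines in server.properties – a sketch (the super user name is a placeholder, and older releases use kafka.security.auth.SimpleAclAuthorizer instead):

    authorizer.class.name=kafka.security.authorizer.AclAuthorizer
    allow.everyone.if.no.acl.found=false
    super.users=User:admin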

6. Authorization

Even when a user or application has been authenticated and granted access, there still needs to be control over what actions they are authorized to perform and on which topic, for example.

Client authorization must be enabled before any attempts at reading or writing are made. Simple examples could be defining which applications can read from a topic or restricting write access to other topics to prevent data pollution or fraudulent activities.
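
For instance, granting one application read access and another write access to the same topic with the kafka-acls.sh tool might look like this (the principals, topic name, and ZooKeeper address are placeholders):

    # Allow the analytics app to read from the payments topic
    kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
      --add --allow-principal User:analytics --operation Read --topic payments
    # Allow the billing app to write to it
    kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
      --add --allow-principal User:billing --operation Write --topic payments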

7. Protect ZooKeeper

A running Apache ZooKeeper cluster is an integral part of a Kafka deployment. In fact, ZooKeeper must be up and running in a cluster before Kafka is even installed.

Once installed, ZooKeeper needs to be kept secure – it shouldn’t face the Internet, and security features like ACLs need to be configured to protect it.
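
One broker setting worth knowing here is zookeeper.set.acl, which makes the brokers create their ZooKeeper nodes with secure ACLs (it assumes SASL authentication between the brokers and ZooKeeper has already been configured):

    # In server.properties
    zookeeper.set.acl=true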

Also, the number of ZooKeeper nodes should be kept small: a single node is enough for the development environment, while three nodes are typical for the production cluster.

Other measures to be taken include:

    • Provide ZooKeeper with as much bandwidth as possible and the best hardware available.
    • Separate ZooKeeper and Kafka – use a server for each installation and store logs separately.
    • Dedicate the ZooKeeper installation to Kafka and Kafka only, and isolate the ZooKeeper process.

Kafka Security Best Practices Checklist for Securing Your Kafka Server

As we have just seen in the Kafka Security Best Practices Checklist, securing your Kafka server deployment can be a daunting task. But security is a continuous function, and this article provides a framework for you to begin securing your event streaming platform.

Liku Zelleke

Liku Zelleke is a technology blogger who has over two decades experience in the IT industry. He hasn’t looked back since the day, years ago, when he discovered he could combine that experience with his other passion: writing. Today, he writes on topics related to network configuration, optimization, and security for Cloud Infrastructure Services.
