Kafka Security Best Practices Checklist for Securing Your Kafka Server
Apache Kafka is a popular choice for an event streaming platform. Securing it is critical for data security and is often required by governing bodies.
In this post, we will take a look at the Kafka Security Best Practices Checklist, which will help you secure your event streaming platform.
But before we move on to that, let us first have a look at a few definitions.
What is Kafka
Kafka is a data store optimized for ingesting and processing streaming data in real time. Consumers can subscribe to the platform to retrieve data from it, while producers use it to publish data from any number of systems and push it to real-time applications.
Streaming data – data that is continuously generated by its sources – typically arrives simultaneously from many sources in an uninterrupted flow. Kafka therefore needs to handle this constant influx of data and process it sequentially, incrementally and promptly.
Advantages of Kafka
The main Apache Kafka pros are:
- Processes streams of records in real time.
- Scales horizontally across brokers to handle high throughput.
- Persists records durably, with replication for fault tolerance.
The platform does this efficiently enough that data is streamed in real time, which makes it one of the most popular platforms of its kind.
Apache Kafka market share
Below is the breakdown of Kafka's market share. Looking at Apache Kafka customers by industry, Computer Software and Information Technology are the largest segments using Kafka.
What does an unsecured Kafka deployment mean
An unsecured Kafka deployment – we say “deployment” because, as we shall soon see, the platform’s architecture consists of multiple components – can lead to harm being done to data, hardware, as well as subscribers.
Hackers can sniff data packets, snoop on confidential information in them and even alter them to deliver fake information or malicious content.
The Kafka deployment itself could be brought down.
All we need to do is look at the Apache Kafka security vulnerabilities listed on the Common Vulnerabilities and Exposures (CVE) page. The regularly updated page makes it clear that there are serious vulnerabilities to look out for – and the list continues to grow.
What components make up a Kafka deployment
Kafka stores streams of records. These records, though, aren't stored in a single file. They are instead broken up into the "partitions" of a topic, which are sequenced to allow retrieval in the correct order.
Topics are record categories or containers that are assigned to data as it is published or stored. When data is written, it goes into a topic, and it is topics that consumers subscribe to when they read from Kafka.
Kafka logs are a collection of data segments, each providing a logical representation of a single topic partition.
Logs also hold the data present across various points in the pipeline, regardless of whether it has been processed or transmitted yet. Separately, there are application logs recording errors and warnings, which can be used for troubleshooting and debugging.
More importantly, a log also acts as a buffer that decouples the data consumption rate from the data production rate – something that becomes very important when multiple subscribers try to access the same topic at different rates.
A partition is a subset of a topic – and topics, which are logical concepts, are in turn divided into multiple partitions. Thus, it can be said that a partition is the smallest storage unit that holds a subset of records owned by a topic.
A partition is a single log file where records are appended at the end. This helps with the splitting of new messages among nodes in a cluster (when writing), and the processing of old ones (when reading).
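To make the append-only idea concrete, here is a minimal in-memory sketch in Python – a toy model of a partition, not Kafka's actual storage engine (the topic and record names are made up for illustration):

```python
class Partition:
    """Toy model of a Kafka partition: an append-only log with offsets."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producers append at the end; a record's offset is its position."""
        self._records.append(record)
        return len(self._records) - 1  # offset of the newly written record

    def read_from(self, offset):
        """Consumers read sequentially starting at a given offset."""
        return self._records[offset:]


# A topic split into two partitions (hypothetical names for illustration).
topic = {0: Partition(), 1: Partition()}
topic[0].append("order-created")
topic[0].append("order-paid")
print(topic[0].read_from(0))  # records come back in append order
```

Because records are only ever appended and are addressed by offset, reads and writes both reduce to cheap sequential operations – the property the paragraph above relies on.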
A source application for a data stream that publishes or writes events is known as a producer. The Apache Kafka Producer is used to generate tokens or messages and publish them to one or more topics in a cluster.
A consumer is a client that reads or processes records from a cluster. A consumer group, on the other hand, is a set of consumers that cooperate to consume data from some topics.
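The division of labour inside a consumer group can be sketched as a toy round-robin assignment in Python (real Kafka uses pluggable assignment strategies managed by a group coordinator; this only illustrates that each partition is read by exactly one member of a group):

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment of partition ids to consumer names.

    Illustrates the invariant that, within one consumer group, every
    partition is consumed by exactly one group member.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Six partitions shared by a group of two consumers (hypothetical names).
print(assign_partitions([0, 1, 2, 3, 4, 5], ["consumer-a", "consumer-b"]))
```

Adding a consumer to the group spreads the same partitions over more members, which is how a group scales its read throughput.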
Kafka Security Best Practices Checklist
Secure Kafka Server Deployment
Now that we have looked at all the moving parts that make up a deployment, let us move on to the checklist of best practices for securing a Kafka server deployment:
1. Encryption
In Kafka, data is stored in plaintext by default, which makes it vulnerable from the onset. The risks increase even more when this data is transported across networks.
The best way to secure data on Kafka is with transport layer encryption – TLS/SSL – to protect it while in motion between clients and brokers. (Kafka has no native encryption at rest, so disk- or filesystem-level encryption is needed for data sitting on the brokers.)
Some other measures to take include:
- Generating certificates for the brokers and having them signed by a certificate authority (CA).
- Importing the CA certificate into the broker and client truststores.
- Enabling TLS for inter-broker traffic as well as client traffic.
The biggest concern for administrators here may be that encryption and decryption eat into resources and degrade performance. With Kafka, however, the overhead is negligible as long as the deployment is optimized.
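As a rough illustration, a broker-side TLS configuration in `server.properties` might look like the fragment below (the hostname, paths and passwords are placeholders, not recommendations):

```properties
# server.properties – expose a TLS listener and encrypt inter-broker traffic
listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
ssl.truststore.password=changeit
# require clients to present certificates too (mutual TLS)
ssl.client.auth=required
```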
2. Secret Protection
Implementing encryption and other security features in Kafka requires the configuration of secret values like passwords, keys, accounts and hostnames.
Out of the box, Kafka offers little protection for these values, and users often resort to storing them in cleartext configuration files and source control systems – which is a big mistake.
Commercial third-party solutions can be added to encrypt secrets within configuration files and to keep them out of log files.
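One mitigation Kafka does ship with (since version 2.0) is the `FileConfigProvider`, which lets the broker resolve secret values from a separate, tightly-permissioned file at load time so they stay out of the main configuration. Note that it externalizes the secrets but does not encrypt them; the path and key below are illustrative:

```properties
# server.properties – resolve secrets from an external file at load time
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
# the keystore password lives in /etc/kafka/secrets.properties under "keystore.password"
ssl.keystore.password=${file:/etc/kafka/secrets.properties:keystore.password}
```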
3. Firewalls
Administrators need to configure port-based firewalls to limit access to specific port numbers, while web application firewalls can be used to restrict traffic to specific, limited groups of possible requests.
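As a sketch using `ufw` (the subnet is hypothetical; adapt the rules to your own firewall and network layout), the broker and ZooKeeper ports can be restricted to a trusted client network:

```shell
# permit the Kafka listener (9092) and ZooKeeper (2181) only from a trusted subnet
ufw allow from 10.0.0.0/24 to any port 9092 proto tcp
ufw allow from 10.0.0.0/24 to any port 2181 proto tcp
# drop the same ports for everyone else
ufw deny 9092/tcp
ufw deny 2181/tcp
```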
4. Authentication
There are three main areas to focus on when implementing authentication:
- Kafka brokers.
- ZooKeeper servers.
- Kafka clients.
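For broker-side authentication, a `server.properties` fragment enabling SASL/SCRAM on top of TLS might look like the sketch below (the listener address is illustrative; user credentials must also be created with `kafka-configs.sh`, and a JAAS configuration supplied to the broker):

```properties
# server.properties – clients must authenticate via SASL/SCRAM over TLS
listeners=SASL_SSL://broker1.example.com:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
```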
5. Access Control
Kafka has an out-of-the-box feature that can be configured to implement access control lists (ACLs). These native ACLs are stored in ZooKeeper.
6. Client Authorization
Even when a user or application has been authenticated and granted access, there still needs to be control over which actions they are authorized to perform, and on which topics, for example.
Client authorization must be enabled before any attempts at reading or writing are made. Simple examples could be defining which applications can read from a topic or restricting write access to other topics to prevent data pollution or fraudulent activities.
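As an illustrative sketch (the principal and topic names are hypothetical, and the commands need a running, security-enabled cluster), native ACLs are managed with the `kafka-acls.sh` tool:

```shell
# allow the application principal "app1" to read from the "payments" topic only
kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:app1 --operation Read --topic payments

# review what is currently permitted on that topic
kafka-acls.sh --bootstrap-server localhost:9092 --list --topic payments
```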
7. Protect ZooKeeper
A running Apache ZooKeeper cluster is an integral part of a Kafka deployment. In fact, ZooKeeper must be up and running before the Kafka brokers are started.
Once installed, ZooKeeper needs to be kept secure – it shouldn’t face the Internet, and security features like ACLs need to be configured to protect it.
Also, keep the number of ZooKeeper nodes to a minimum: a single node for the development environment, and a three-node ensemble for the production cluster.
Other measures to be taken include:
- Provide ZooKeeper with as much bandwidth as possible and the best hardware available.
- Separate ZooKeeper and Kafka – use a server for each installation and store logs separately.
- Dedicate the ZooKeeper installation to Kafka and Kafka only, and isolate the ZooKeeper process.
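On the broker side, one concrete setting worth knowing (when SASL is in use between Kafka and ZooKeeper) is `zookeeper.set.acl`, which makes Kafka create its znodes with restrictive ACLs rather than world-writable ones:

```properties
# server.properties – protect Kafka's metadata in ZooKeeper with ACLs
zookeeper.set.acl=true
```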
As we have just seen, securing your Kafka server deployment can be a daunting task. But security is a continuous process, and the Kafka Security Best Practices Checklist above provides a framework for you to begin securing your event streaming platform.