How to Install Apache Cassandra Cluster on Ubuntu 22.04. This post introduces Apache Cassandra and how it works, then shows you how to install an Apache Cassandra cluster on Ubuntu 22.04.
First of all, Apache Cassandra is an increasingly popular open source choice for businesses that manage large amounts of unstructured data and require high availability and scalability for that data.
An Apache Cassandra cluster is a collection of nodes (servers) that work together to store and manage huge amounts of data. Each node in the cluster has its own memory, storage, and processing capacity. The nodes communicate with each other to ensure data availability and consistency.
Data in a Cassandra cluster is divided by partition key and distributed across the nodes. Because each node holds only a fraction of the data, the cluster can manage more data than a single node could store.
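To make the idea of partition-key distribution concrete, here is a toy sketch. It is not how Cassandra is implemented: cksum stands in for Cassandra's real partitioner, and "hash modulo node count" stands in for token-range ownership. The function name node_for_key and the sample keys are made up for illustration.

```shell
# Toy illustration only: map a partition key to one of N nodes.
# Assumptions: cksum (a CRC) replaces Cassandra's real hash function,
# and a simple modulo replaces token ranges.

node_for_key() {
  local key="$1" node_count="$2"
  local hash
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( hash % node_count ))
}

# The same key always lands on the same node; different keys spread out.
for key in user:1 user:2 user:3 user:4; do
  echo "$key -> node $(node_for_key "$key" 2)"
done
```

The point of the sketch is only that the placement is derived from the key itself, so any node can work out where a row lives without asking a central directory.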
Cassandra uses a distributed, peer-to-peer architecture in which data is replicated across several nodes in the cluster, so the data remains accessible even if one node fails or goes down. The replication factor, which sets how many copies of the data are kept, provides data durability and consistency and can be configured to meet the application’s needs.
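As a hedged example of where the replication factor is set, the sketch below prints a CQL statement that creates a keyspace with a given replication factor. The helper name keyspace_cql, the keyspace name demo and the choice of SimpleStrategy are illustrative assumptions, not part of this guide's setup.

```shell
# Hypothetical helper: print a CREATE KEYSPACE statement for a given
# keyspace name and replication factor.
keyspace_cql() {
  local name="$1" rf="$2"
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %s};\n" "$name" "$rf"
}

# Print the statement for a keyspace that keeps 2 copies of each row.
keyspace_cql demo 2

# On a running node, the statement could be passed to cqlsh:
#   cqlsh -e "$(keyspace_cql demo 2)"
```

With a replication factor of 2 on a two-node cluster, each node ends up holding a full copy of the keyspace's data.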
How an Apache Cassandra cluster works can be summarised as follows:
Apache Cassandra is a distributed NoSQL database system that scales horizontally: you expand it by adding more nodes to the cluster.
Each row in a table is identified by a partition key. Cassandra hashes the partition key to a token value, and the token determines which node stores the row. This is how data is distributed across the cluster’s nodes.
Cassandra stores copies of the data on multiple cluster nodes to provide high availability and fault tolerance. When a client sends a write request, the node that receives it acts as the coordinator. The coordinator then works with the other nodes to replicate the data to the configured number of nodes.
Users interact with the database through CQL (Cassandra Query Language), an SQL-like language that makes it easy to query and manage the data.
The consistency level for read and write requests is configurable: users choose how many replicas must reply to a request before it is declared successful.
Cassandra automatically redistributes data when nodes are added or removed, so that each node holds a roughly equal share of the data.
If a node fails, the remaining replicas continue to serve reads and writes for its data, so the data remains available. Cassandra has no primary replica to promote, because all replicas are peers.
Overall, this approach lets Apache Cassandra handle massive amounts of data with low latency and high throughput while maintaining high availability and fault tolerance.
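The consistency level mentioned in the summary above comes down to simple arithmetic. The sketch below assumes a single datacenter: QUORUM needs a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number written is greater than the replication factor. The function names quorum and overlaps are made up for this example.

```shell
# quorum RF: number of replicas that must respond at consistency level QUORUM.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# overlaps RF WRITE_REPLICAS READ_REPLICAS
# Succeeds when every read set must share at least one replica with
# every write set, i.e. WRITE_REPLICAS + READ_REPLICAS > RF.
overlaps() {
  [ $(( $2 + $3 )) -gt "$1" ]
}

rf=3
q=$(quorum "$rf")
echo "RF=$rf QUORUM=$q"

if overlaps "$rf" "$q" "$q"; then
  echo "QUORUM writes + QUORUM reads always overlap"
fi
if ! overlaps "$rf" 1 1; then
  echo "ONE writes + ONE reads may miss the latest write"
fi
```

Lower consistency levels answer faster and tolerate more failed nodes, while higher levels give stronger guarantees, which is why Cassandra leaves the choice to the application.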
Because Cassandra is highly scalable, you can add more hardware to accommodate more customers and more data as needed.
Always-on architecture
For business critical applications that cannot afford a failure, Cassandra is continuously available and has no single point of failure.
High speed linear performance
Moreover, Cassandra scales linearly: throughput increases as you add more nodes to the cluster, while response times stay short.
Fault Tolerant
Additionally, Cassandra tolerates failures. Assume that a cluster contains 4 nodes and that each node holds a copy of the same data. If one node stops working, the other three nodes can still serve requests.
We have now arrived at the main part of the article How to Install Apache Cassandra Cluster on Ubuntu 22.04.
You need to install the Apache Cassandra package on both nodes. The Cassandra package is not available in the default Ubuntu repository, so you need to add the Cassandra repository to APT.
First, add the Cassandra repo using the following command.
echo "deb http://www.apache.org/dist/cassandra/debian 40x main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
Next, download and add the GPG key using the following command.
wget -q -O - https://www.apache.org/dist/cassandra/KEYS | tee /etc/apt/trusted.gpg.d/cassandra.asc
Finally, update the repository and install Apache Cassandra with the following command.
apt update -y
apt install cassandra -y
After installing Apache Cassandra on both nodes, stop the Cassandra service and remove the cache directory on both nodes.
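The commands for this step are not shown above, so the following is a sketch under stated assumptions rather than the exact procedure. It assumes the Debian package defaults (data under /var/lib/cassandra, configuration in /etc/cassandra/cassandra.yaml) and the default layout of the cluster_name, seeds, listen_address and rpc_address settings. The function names, the cluster name DemoCluster and the 192.0.2.x addresses are placeholders. Clearing the data matters because the first start stores the default cluster name, and a node will not start under a different name with the old data in place.

```shell
# Run as root on each node. Paths assume the Debian package defaults.

# Stop the service and clear the data written during the first start.
reset_node() {
  systemctl stop cassandra
  rm -rf /var/lib/cassandra/*
}

# configure_node FILE CLUSTER_NAME SEEDS NODE_IP
# Rewrites the cluster-related settings in a cassandra.yaml file.
# Assumes each setting appears once, in the default file layout.
configure_node() {
  local conf="$1" cluster="$2" seeds="$3" ip="$4"
  sed \
    -e "s/^cluster_name:.*/cluster_name: '$cluster'/" \
    -e "s/^\( *- seeds:\).*/\1 \"$seeds\"/" \
    -e "s/^listen_address:.*/listen_address: $ip/" \
    -e "s/^rpc_address:.*/rpc_address: $ip/" \
    "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example with placeholder values (replace them with your own):
#   reset_node
#   configure_node /etc/cassandra/cassandra.yaml DemoCluster '192.0.2.11,192.0.2.12' 192.0.2.11
```

Both nodes would use the same cluster name and seed list, and each node would pass its own IP address as the last argument.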
At this point, the Apache Cassandra cluster is installed and configured. Now go to the first node and start the Cassandra service using the following command.
systemctl start cassandra
Now check the status of the Cassandra service using the following command.
systemctl status cassandra
If everything is fine, the service should be reported as active (running).
Also, start the Cassandra service on the second node.
systemctl start cassandra
At this point, the Cassandra cluster is up and running. Proceed to the next step.
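One common way to confirm that both nodes have joined the cluster is nodetool status, which lists each node with a two-letter state; UN means Up and Normal. The helper below is a small sketch (the name count_up_nodes is made up) that counts the UN lines in that output.

```shell
# count_up_nodes: read `nodetool status` output on stdin and print the
# number of nodes whose state is UN (Up / Normal).
count_up_nodes() {
  # grep -c exits non-zero when the count is 0, so keep the exit status clean.
  grep -c '^UN' || true
}

# On either node:
#   nodetool status
#   nodetool status | count_up_nodes
```

For the two-node cluster built in this guide, the count should be 2 once both services have finished starting, which can take a minute or so.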
How to Install Apache Cassandra Cluster on Ubuntu 22.04 Conclusion
In this post, we have explained how to configure a two-node Cassandra cluster on Ubuntu 22.04. You can add more nodes to the Cassandra cluster and scale your application as per your requirements.
The special features of Cassandra’s design include scalability, dependability and performance. It is built on a distributed system design that reflects the trade-offs described by the CAP theorem, favouring availability and partition tolerance while letting you tune consistency. Because of this design, Cassandra requires careful configuration and tuning. To utilise Cassandra effectively, it’s essential to understand its components.
I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.