How to Install Apache Cassandra Cluster on Ubuntu 22.04

This post introduces Apache Cassandra, explains how it works, and then shows you how to install an Apache Cassandra cluster on Ubuntu 22.04.

 

Apache Cassandra is an increasingly popular open source choice for businesses that manage large amounts of unstructured data and need high availability and scalability for that data.

What is the Apache Cassandra Cluster?

An Apache Cassandra cluster is a collection of nodes (servers) that work together to store and manage huge amounts of data. Each node has its own memory, storage, and processing capacity, and the nodes communicate with each other to keep data available and consistent.

Data in a Cassandra cluster is partitioned by a partition key and distributed across the nodes. Because each node holds only a fraction of the data, the cluster can manage more data than a single node could store.
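
To make the partitioning idea concrete, here is a toy sketch in shell. It is not how Cassandra computes tokens: Cassandra's default partitioner is Murmur3 and it assigns token ranges to nodes, whereas this sketch assumes a plain cksum hash and a modulo over the node count. The function name owner_of and the sample keys are made up for illustration.

```shell
# Toy illustration only. Assumption: cksum + modulo stands in for
# Cassandra's real partitioner (Murmur3 with token ranges).

# owner_of KEY NODE_COUNT -> prints the index of the node that owns KEY
owner_of() {
    local key=$1 nodes=$2 hash
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    echo $(( hash % nodes ))
}

# Hypothetical partition keys spread over a two-node cluster.
for key in user:1 user:2 user:3 user:4; do
    echo "$key -> node $(owner_of "$key" 2)"
done
```

The point of the sketch is that the mapping is deterministic: the same key always resolves to the same node, so any node can work out where a row lives without asking a central directory.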

Cassandra uses a distributed, peer-to-peer architecture and replicates data across several nodes in the cluster, so the data remains accessible even if one node fails or goes down. The number of copies is set by the replication factor, which provides data durability and can be configured to meet the application’s needs.
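
In practice the replication factor is set per keyspace. The sketch below only builds the CQL statement as text, so it can be read and checked before it is sent to a cluster. The helper name keyspace_cql, the keyspace name demo, and the datacenter name dc1 are assumptions; check the "Datacenter:" line of nodetool status for the name your cluster actually reports.

```shell
# keyspace_cql NAME DATACENTER RF -> prints a CREATE KEYSPACE statement.
# All three arguments are placeholders chosen for this example.
keyspace_cql() {
    printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" "$1" "$2" "$3"
}

# Keep two copies of every row in the datacenter assumed to be named dc1.
keyspace_cql demo dc1 2

# To apply it once the cluster is running (not executed here):
#   cqlsh node1-ip-address 9042 -e "$(keyspace_cql demo dc1 2)"
```

With two nodes and a replication factor of 2, every row is stored on both nodes, which is the most a two-node cluster can offer.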

How Does Cassandra Work?

The way an Apache Cassandra cluster works can be summarised as follows:

  • Apache Cassandra is a distributed NoSQL database system that scales horizontally: you add capacity by adding more nodes to the cluster.
  • Each row in a table is identified by a partition key. Cassandra hashes the partition key to a token, and the token determines which node stores the row. This is how data is distributed throughout the cluster’s nodes.
  • Cassandra stores multiple copies (replicas) of the data on different nodes to provide high availability and fault tolerance. When a write request is made, the node that receives it acts as the coordinator and forwards the write to the replica nodes, as many as the replication factor specifies.
  • Users can tune the consistency level for read and write requests, choosing how many replicas must reply to a request before it is declared successful.
  • When nodes are added or removed, Cassandra redistributes the data so that each node holds a roughly equal share.
  • Cassandra has no primary replica, because all nodes are peers. If a node fails, the remaining replicas continue to serve requests for its data, so the data remains available.

Overall, this design lets Apache Cassandra handle massive amounts of data with low latency and high throughput while maintaining high availability and fault tolerance.
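
The consistency level described above comes down to simple arithmetic. For the QUORUM level, a request succeeds once a majority of the replicas have answered, that is floor(RF / 2) + 1 for a replication factor RF. A minimal sketch, with quorum as a helper name invented for this example:

```shell
# quorum RF -> prints how many replicas must answer at consistency level QUORUM
quorum() {
    echo $(( $1 / 2 + 1 ))   # integer division gives floor(RF / 2)
}

for rf in 1 2 3 5; do
    echo "replication factor $rf -> quorum $(quorum "$rf")"
done
```

Note the case of a replication factor of 2: the quorum is also 2, so on a two-node cluster a QUORUM request needs both nodes to be up, while a request at level ONE still succeeds with a single node.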

Features of Apache Cassandra

Cassandra is highly scalable: you can add more hardware to serve more customers and store more data as needed.

Rigid construction

For business critical applications that cannot afford a failure, Cassandra is continuously available and has no single point of failure.

High speed linear performance

Moreover, Cassandra scales linearly. As you add more nodes to the cluster, throughput increases while response times stay short.

Fault Tolerant

Additionally, Cassandra tolerates node failures. Assume a cluster of 4 nodes in which every node holds a copy of the same data (a replication factor of 4). If one node stops working, the other three nodes can still serve requests.

We have now arrived at the main part of the article: how to install an Apache Cassandra cluster on Ubuntu 22.04.

How to Install Apache Cassandra Cluster on Ubuntu 22.04

In this section, we show you how to set up a two-node Apache Cassandra cluster on Ubuntu 22.04 servers.

Prerequisites

  • Two servers running Ubuntu 22.04.
  • A root user or a user with sudo privileges.

Install Java JDK

Apache Cassandra is a Java-based application, so you need to install the Java JDK on both servers. You can install it using the APT command shown below.

				
					apt install default-jdk -y
				
			

After the installation, verify the Java version using the following command.

				
					java --version
				
			

You should see the Java version in the following output.

				
					openjdk 11.0.17 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu222.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu222.04, mixed mode, sharing)
				
			

Next, install other required packages with the following command.

				
					apt install wget curl gnupg2 -y
				
			

Once you are done, please proceed to the next step.
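
Before moving on, the Java version check can also be scripted. The sketch below extracts the major version from the first line of the version output. The helper name java_major is made up for this example, and the assumption that Java 8 and 11 are the versions Cassandra 4.0 runs on should be checked against the Cassandra documentation for the release you install.

```shell
# java_major VERSION_LINE -> prints the major Java version found in the line
java_major() {
    local v
    # First token that looks like a version number, e.g. 11.0.17 or 1.8.0
    v=$(printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1)
    case $v in
        1.*) v=${v#1.}; echo "${v%%.*}" ;;   # legacy scheme: 1.8.0 -> 8
        *)   echo "${v%%.*}" ;;              # modern scheme: 11.0.17 -> 11
    esac
}

# Only runs where Java is installed. "java -version" writes to stderr.
if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java -version 2>&1 | head -n 1)")
    case $major in
        8|11) echo "Java $major found (assumed suitable for Cassandra 4.0)" ;;
        *)    echo "Java $major found: check the Cassandra documentation for compatibility" ;;
    esac
fi
```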

Configure UFW Firewall

If the UFW firewall is installed and running, you also need to configure it on both servers so that the nodes can reach each other.

On the first node, add the UFW firewall rules with the following command.

				
					ufw allow from node2-ip-address to node1-ip-address proto tcp port 7000,9042
				
			

Next, go to the second node and add the UFW firewall rules with the following command.

				
					ufw allow from node1-ip-address to node2-ip-address proto tcp port 7000,9042
				
			

Now verify the added UFW rules with the following command.

				
					ufw status
				
			

The output should list all the rules you added.
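
If the cluster grows beyond two nodes, each node needs one such rule per peer. The sketch below prints the commands instead of running them, so they can be reviewed first. It reuses the rule form from this article (port 7000 for traffic between nodes, port 9042 for CQL clients); the helper name ufw_rules and the 10.0.0.x addresses are placeholders.

```shell
# ufw_rules LOCAL_IP PEER_IP... -> prints one ufw command per peer node
ufw_rules() {
    local self=$1 peer
    shift
    for peer in "$@"; do
        echo "ufw allow from $peer to $self proto tcp port 7000,9042"
    done
}

# Rules for a node at 10.0.0.1 with two peers (example addresses).
ufw_rules 10.0.0.1 10.0.0.2 10.0.0.3
```

Once the printed commands look right, they can be run one by one, or piped to sh as root.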

Install Apache Cassandra on Both Nodes

Next, you need to install the Apache Cassandra package on both nodes. The Cassandra package is not available in the default Ubuntu repository, so you need to add the Cassandra repository to APT.

First, add the Cassandra repo using the following command.

				
					echo "deb http://www.apache.org/dist/cassandra/debian 40x main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
				
			

Next, download and add the GPG key using the following command.

				
					wget -q -O - https://www.apache.org/dist/cassandra/KEYS | tee /etc/apt/trusted.gpg.d/cassandra.asc
				
			

Finally, update the repository and install Apache Cassandra with the following command.

				
					apt update -y
apt install cassandra -y
				
			

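After installing Apache Cassandra on both nodes, stop the Cassandra service and remove the contents of the default data directory on both nodes. The package starts Cassandra with the default cluster name, and the data it has already saved would conflict with the cluster name you set in the next step.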

				
					systemctl stop cassandra
rm -rf /var/lib/cassandra/*
				
			

Once you are done, please proceed to the next step.

Configure Apache Cassandra Cluster

To configure the Cassandra cluster, you need to edit the main Cassandra configuration file on both nodes.

On the first node, edit the Cassandra configuration file.

				
					nano /etc/cassandra/cassandra.yaml
				
			

Modify the following lines:

				
					cluster_name: 'Cassandra Cluster'
- seeds: "node1-ip-address:7000"
listen_address: "node1-ip-address"
rpc_address: "node1-ip-address"
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
				
			

Save and close the file when you are done.

On the second node, edit the Cassandra configuration file.

				
					nano /etc/cassandra/cassandra.yaml
				
			

Modify the following lines.

				
					cluster_name: 'Cassandra Cluster'
- seeds: "node1-ip-address:7000"
listen_address: "node2-ip-address"
rpc_address: "node2-ip-address"
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false

				
			

Save and close the file after you finish.
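
The same edits can be applied without an editor, which helps when more nodes are added later. This is a sketch, assuming GNU sed (the default on Ubuntu) and a configuration file whose settings start at the beginning of the line. The helper name configure_node, the sample file, and the 10.0.0.x addresses are invented for the example; the demo runs against a temporary file, not /etc/cassandra/cassandra.yaml.

```shell
# configure_node FILE NODE_IP SEED_IP -> applies this article's settings to FILE
configure_node() {
    local file=$1 node_ip=$2 seed_ip=$3
    sed -i \
        -e "s/^cluster_name:.*/cluster_name: 'Cassandra Cluster'/" \
        -e "s/^\([[:space:]]*\)- seeds:.*/\1- seeds: \"$seed_ip:7000\"/" \
        -e "s/^listen_address:.*/listen_address: \"$node_ip\"/" \
        -e "s/^rpc_address:.*/rpc_address: \"$node_ip\"/" \
        -e "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
        -e "s/^auto_bootstrap:.*/auto_bootstrap: false/" \
        "$file"
    # Append auto_bootstrap if the file has no such line yet.
    grep -q '^auto_bootstrap:' "$file" || echo 'auto_bootstrap: false' >> "$file"
}

# Demo on a small hypothetical excerpt, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
cluster_name: 'Test Cluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1:7000"
listen_address: localhost
rpc_address: localhost
endpoint_snitch: SimpleSnitch
EOF

# Second node at 10.0.0.2, seed (first node) at 10.0.0.1.
configure_node "$sample" 10.0.0.2 10.0.0.1
cat "$sample"
```

On a real node, take a backup of /etc/cassandra/cassandra.yaml first and pass that path as the first argument, with the node's own address and the first node's address in place of the examples.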

At this point, Apache Cassandra is installed and configured on both nodes. Now go to the first node and start the Cassandra service using the following command.

				
					systemctl start cassandra
				
			

Now check the status of the Cassandra service using the following command.

				
					systemctl status cassandra
				
			

If everything is fine, the output shows the service as active (running).

Also, start the Cassandra service on the second node.

				
					systemctl start cassandra
				
			

At this point, the Cassandra cluster is up and running. Proceed to the next step.

Verify Cassandra Cluster

Now, go to the first node and verify the Cassandra cluster using the following command.

				
					nodetool status
				
			

You should see both nodes in the output, each with the status UN (Up/Normal).
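
In the nodetool status output, each node line starts with a two-letter code, and UN stands for Up and Normal. A small sketch that counts those lines can turn the visual check into a scripted one. The helper name count_up_normal is invented for this example.

```shell
# count_up_normal -> reads nodetool status output on stdin and prints
# how many node lines start with the code UN (Up/Normal)
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Only runs where nodetool is installed.
if command -v nodetool >/dev/null 2>&1; then
    up=$(nodetool status | count_up_normal)
    echo "Nodes up and normal: $up (2 expected for this cluster)"
fi
```

A node that is still joining, or that is down, shows a different code and is not counted, so a result below 2 means the cluster is not yet fully formed.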

By default, Cassandra listens for client connections on port 9042. You can verify it using the following command:

				
					ss -antpl 
				
			

The output shows the ports on which Cassandra is listening.
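
To check specific ports instead of reading the whole list, the ss output can be filtered. The sketch below assumes the column layout of ss -ltn, where the fourth column is the local address and port. The helper name is_listening is made up for this example.

```shell
# is_listening PORT -> reads "ss -ltn" output on stdin and succeeds
# if some socket is listening on PORT
is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Only runs where ss is installed. 7000 is the inter-node port, 9042 the CQL port.
if command -v ss >/dev/null 2>&1; then
    for port in 7000 9042; do
        if ss -ltn | is_listening "$port"; then
            echo "port $port: listening"
        else
            echo "port $port: not listening"
        fi
    done
fi
```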

Now, use the cqlsh command line utility to connect to the Cassandra shell.

				
					cqlsh node1-ip-address 9042
				
			

Once you are connected, you should see the cqlsh prompt.

Now, verify the cluster information using the following command.

				
					describe cluster
				
			

The output shows the cluster information, such as the cluster name, partitioner and snitch.

To exit from the Cassandra shell, run the following command.

				
					exit
				
			

Thank you for reading How to Install Apache Cassandra Cluster on Ubuntu 22.04. We shall conclude this article now.

How to Install Apache Cassandra Cluster on Ubuntu 22.04 Conclusion

In this post, we have explained how to configure a two-node Cassandra cluster on Ubuntu 22.04. You can now add more nodes to the Cassandra cluster and scale your application as per your requirements.

The special features of Cassandra’s design include scalability, dependability and performance. It is built on a distributed system design and, in terms of the CAP theorem, favours availability and partition tolerance while letting you tune consistency per request. Because of this design, Cassandra requires careful configuration and tuning. To utilise Cassandra effectively, it’s essential to understand its components.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
