How to Set Up a Cassandra Cluster on Ubuntu 20.04
Apache Cassandra is an open-source, non-relational (NoSQL) database that delivers high performance and continuous availability with no single point of failure.
What Is an Apache Cassandra Cluster?
An Apache Cassandra cluster is a collection of nodes that together form one database system. The cluster is the outermost shell and represents the entire Cassandra database.
The image below gives an overview of the Apache Cassandra cluster:
A cluster is the outermost shell and acts as the top-level storage unit of the database system. It contains many smaller storage units, described in the following paragraphs.
A node is the second layer of the database system. A node is a single system, computer, or storage unit that runs Cassandra, and each cluster may contain many nodes.
A keyspace is the next layer, and a cluster can hold many keyspaces. A keyspace contains the actual data and also defines how that data is replicated across nodes.
Column families (called tables in CQL) are the next layer. Each keyspace is divided into column families, and each column family has its own set of headings under which data is stored.
Rows are the next level. Each column family is divided into rows, and each row is a separate entry identified by its key.
A column is the innermost layer of the database system. Each row is divided into columns, and each column holds a value under a heading or title (the column name).
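To make the layers concrete, here is a minimal sketch that writes a small CQL script and runs it if cqlsh is available. The keyspace, table, and column names (demo, users, user_id, name, email), the file path, and the replication factor of 2 are all hypothetical choices for illustration, not values from this guide.

```shell
#!/usr/bin/env bash
# Write a small CQL script that walks the layers from keyspace down to column.
# Every name below (demo, users, user_id, ...) is hypothetical.
SCHEMA_FILE="${SCHEMA_FILE:-/tmp/demo_schema.cql}"

cat > "$SCHEMA_FILE" <<'EOF'
-- Keyspace: the container for tables; it also sets the replication settings.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

-- Column family (table): defines the column headings.
CREATE TABLE IF NOT EXISTS demo.users (
  user_id int PRIMARY KEY,
  name    text,
  email   text
);

-- Row: one entry in the table, with a value in each column.
INSERT INTO demo.users (user_id, name, email)
  VALUES (1, 'Alice', 'alice@example.com');
EOF

# Run the script only if cqlsh is on the PATH and a node is reachable.
if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -f "$SCHEMA_FILE"
fi
```

SimpleStrategy is used here only because it is the simplest option for a single data center test setup.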
Apache Cassandra is built on a distributed system architecture. In its simplest form, a Cassandra cluster can be installed on a single machine or in a Docker container, which works well for testing.
The image below illustrates the Apache Cassandra cluster topology:
A single Cassandra instance is called a node (explained at the beginning). Cassandra supports horizontal scalability, which is achieved by adding one or more nodes to the cluster. This type of scaling works well and provides a linear improvement if the resources are configured optimally.
An Apache Cassandra cluster is based on a peer-to-peer architecture in which each node is connected to the others. Every node performs all database-related operations and serves client requests without any master node. Any node can act as a bridge that establishes communication between two other systems or computers. The components used to make this work are:
Seeds: each node is configured with a list of seeds, which is simply a list of one or more other nodes. A seed is mainly used to bootstrap a node when it is about to join the cluster. A seed has no other special purpose, and it is not a single point of failure (because Cassandra is a fault-tolerant system).
Gossip: Gossip is the protocol that Cassandra uses for peer-to-peer communication. Through gossip, nodes exchange information about their own state and the state of the other nodes they know about. Gossip messages follow specific formats and carry version numbers.
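The seed list lives in cassandra.yaml under seed_provider. The following is a minimal sketch of how it could be set from a script, assuming GNU sed, the Debian package path /etc/cassandra/cassandra.yaml, and the stock single-line entry of the form - seeds: "127.0.0.1". The helper name set_seeds and the choice of 192.168.0.100 as the seed are assumptions for illustration.

```shell
#!/usr/bin/env bash
# set_seeds FILE SEEDS: rewrite the "- seeds:" entry of a cassandra.yaml file.
# Assumes GNU sed and the stock single-line format: - seeds: "127.0.0.1"
set_seeds() {
    local file="$1" seeds="$2"
    # Keep the original indentation and key, replace only the quoted value.
    sed -i "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" "$file"
}

# Example (hypothetical seed choice; run as root on every node, then restart):
# set_seeds /etc/cassandra/cassandra.yaml "192.168.0.100"
# systemctl restart cassandra
```

Every node should normally use the same seed list. Once the nodes are running, nodetool gossipinfo prints the gossip state each node currently holds about its peers.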
The cluster architecture is further divided into racks and data centers. A rack is a group of bare-metal servers that share resources such as network switches and power supply. A data center is a collection of racks.
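Each node can be told which data center and rack it belongs to through the cassandra-rackdc.properties file. The sketch below writes that file; the helper name write_rackdc and the values dc1 and rack1 are hypothetical. Note that these settings are read by GossipingPropertyFileSnitch, so they only take effect if endpoint_snitch in cassandra.yaml is set to that snitch rather than the default SimpleSnitch.

```shell
#!/usr/bin/env bash
# write_rackdc FILE DC RACK: write a cassandra-rackdc.properties file
# that names the data center and rack of the local node.
write_rackdc() {
    local file="$1" dc="$2" rack="$3"
    printf 'dc=%s\nrack=%s\n' "$dc" "$rack" > "$file"
}

# Example (hypothetical names; run as root on each node):
# write_rackdc /etc/cassandra/cassandra-rackdc.properties dc1 rack1
```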
In the next section of this guide, we install Cassandra on two Ubuntu 20.04 servers. The hosts and the IP addresses that Cassandra will listen on are listed below.
Hosts      IP Address
server1    192.168.0.100
server2    192.168.0.101
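The addresses above end up in cassandra.yaml as listen_address (node-to-node traffic) and rpc_address (client traffic). Here is a minimal sketch of setting them from a script, assuming GNU sed, the Debian package path /etc/cassandra/cassandra.yaml, and that both keys appear uncommented at the start of a line as in the stock file. The helper name set_yaml_key is hypothetical.

```shell
#!/usr/bin/env bash
# set_yaml_key FILE KEY VALUE: replace the value of a top-level, uncommented
# "key: value" line. Commented-out lines are left alone.
set_yaml_key() {
    local file="$1" key="$2" value="$3"
    sed -i "s/^${key}:.*/${key}: ${value}/" "$file"
}

# Example for server1 (run as root; use 192.168.0.101 on server2):
# set_yaml_key /etc/cassandra/cassandra.yaml listen_address 192.168.0.100
# set_yaml_key /etc/cassandra/cassandra.yaml rpc_address 192.168.0.100
```

Both servers also need the same cluster_name in cassandra.yaml, otherwise they will not join the same cluster.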
Install Java
Before starting, you will need to install Java 8 on both servers. Run the following command on both servers to install Java 8.
apt-get install openjdk-8-jdk -y
Once Java is installed, you can verify the installed version of Java with the following command:
java -version
You will get the Java version in the following output:
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
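If a server has more than one JDK installed, the default java may not be version 8. The following sketch checks the first line of the version output; the helper name java_is_8 is hypothetical, and the check assumes the "1.8." version string format shown above.

```shell
#!/usr/bin/env bash
# java_is_8 VERSION_LINE: succeed if the first line of `java -version`
# reports a 1.8.x release.
java_is_8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# `java -version` prints to stderr, so redirect it before reading.
if command -v java >/dev/null 2>&1; then
    if java_is_8 "$(java -version 2>&1 | head -n 1)"; then
        echo "Java 8 detected"
    else
        echo "Warning: the default java is not version 8" >&2
    fi
fi
```

If the warning appears, update-alternatives --config java lets you pick the Java 8 binary as the default.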
Next, add the Cassandra repository to APT using the following command:
sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 311x main" > /etc/apt/sources.list.d/cassandra.list'
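APT normally refuses to install packages from a repository whose signing key it does not know, so the repository key usually has to be imported before the next step. The sketch below shows one way to do that. The KEYS URL is an assumption based on the upstream Cassandra install instructions and should be confirmed before use; the helper name import_cassandra_keys is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the project's signing keys are published at this URL, as the
# upstream install instructions describe. Confirm it before relying on it.
KEYS_URL="https://downloads.apache.org/cassandra/KEYS"

# Download the keys and hand them to APT. apt-key is deprecated on newer
# Ubuntu releases but is still present on Ubuntu 20.04.
import_cassandra_keys() {
    curl -fsSL "$KEYS_URL" | apt-key add -
}

# Run as root on both servers (install curl first if it is missing):
# import_cassandra_keys
```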
Once the repository is added, update the package index and install Apache Cassandra with the following commands:
apt-get update -y
apt-get install cassandra -y
Once the installation is complete, check the status of the Cassandra service with the following command:
systemctl status cassandra
You will get the following output:
● cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/init.d/cassandra; generated)
Active: active (exited) since Fri 2022-02-18 03:11:41 UTC; 6s ago
Docs: man:systemd-sysv-generator(8)
Process: 7181 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)
Feb 18 03:11:41 ubuntu2004 systemd[1]: Starting LSB: distributed storage system for structured data...
Feb 18 03:11:41 ubuntu2004 systemd[1]: Started LSB: distributed storage system for structured data.
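The "active (exited)" state only shows that the init script finished, not that Cassandra is ready for clients, and the JVM can take a while to start. The sketch below polls the CQL native transport port, which is 9042 by default. It relies on bash's /dev/tcp feature; the helper name wait_for_port and the 60-second limit are assumptions.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT TRIES: try to open a TCP connection once per
# second, up to TRIES times. Succeeds as soon as one connection works.
wait_for_port() {
    local host="$1" port="$2" tries="$3" i
    for ((i = 0; i < tries; i++)); do
        # /dev/tcp is a bash feature; the subshell closes the socket on exit.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Example:
# wait_for_port 127.0.0.1 9042 60 && echo "Cassandra is accepting CQL clients"
```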
Cassandra provides a nodetool command-line utility for managing and monitoring the Cassandra cluster. It allows you to check the Cassandra cluster status.
Run the following command on the first server to check the Cassandra cluster status:
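The nodetool subcommand that reports cluster membership is status. The sketch below runs it and, as an optional extra, counts the nodes reported as UN (Up and Normal). The helper name count_up_normal is hypothetical.

```shell
#!/usr/bin/env bash
# count_up_normal: read `nodetool status` output on stdin and count the
# lines whose first field is UN (Up and Normal).
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Only call nodetool where Cassandra is actually installed.
if command -v nodetool >/dev/null 2>&1; then
    nodetool status
    echo "Nodes up and normal: $(nodetool status | count_up_normal)"
fi
```

In a healthy two-node cluster, both 192.168.0.100 and 192.168.0.101 should be listed with the UN state.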
How to Set Up a Cassandra Cluster on Ubuntu 20.04 (Step by Step) Conclusion
In this guide, we explained how to set up a two-node Apache Cassandra cluster on Ubuntu 20.04 servers. You can now add more servers to the Apache Cassandra cluster and scale your deployments.