How to Set Up a Cassandra Cluster on Ubuntu 20.04 (Step by Step)

Apache Cassandra is an open-source, non-relational (NoSQL) database that delivers high performance and continuous availability with no single point of failure. This guide explains how a Cassandra cluster is structured and how to set up a two-node cluster on Ubuntu 20.04.

What is the Apache Cassandra Cluster?

An Apache Cassandra cluster is a collection of nodes that together form one database system. The cluster is the outermost container and represents the entire Cassandra database.

The image below gives an overview of the Apache Cassandra cluster:

A cluster is the outermost layer and acts as the top-level storage unit of the database system. It contains all of the smaller storage units described below.

A node is the second layer. A node is a single system, computer, or storage unit that runs Cassandra, and a cluster may contain many nodes.

A keyspace is the next layer, and a cluster can hold many keyspaces. A keyspace contains the actual data and defines how that data is replicated across the nodes.

Column families (called tables in CQL) are the next layer. Each keyspace is divided into column families, which group related data under a common set of headings.

Rows are the next level. Each column family is divided into rows, and each row is a separate entry identified by its key.

The column is the innermost layer. Each row is made up of columns, and each column stores a value under a name (heading).

How Does the Apache Cassandra Cluster Work?

An Apache Cassandra cluster is based on a distributed system architecture. In its simplest form, Cassandra can be installed on a single machine or in a Docker container, which works well for testing.

The image below illustrates the Apache Cassandra cluster topology:

A single Cassandra instance is called a node (as explained above). Cassandra supports horizontal scalability, which is achieved by adding one or more nodes to the cluster. This type of scaling works well and provides a linear improvement if the resources are configured optimally.

An Apache Cassandra cluster is built on a peer-to-peer architecture in which every node is connected to the others. Each node can perform all database operations and serve client requests, so no master node is required. The nodes communicate directly with one another to keep the cluster state up to date. The components used for this process are:

Seeds: Each node is configured with a list of seeds, which is simply a list of one or more nodes. A seed is mainly used to bootstrap a node when it is about to join the cluster. A seed has no other special purpose, and it is not a single point of failure (because Cassandra is a fault-tolerant system).

Gossip: Gossip is the protocol Cassandra uses for peer-to-peer communication. Through gossip, nodes exchange information about their own state and the state of the other nodes they know about. Gossip messages follow a specific format and carry version numbers, so newer information replaces older information.

The cluster architecture is further divided into data centers and racks. A rack is a group of bare-metal servers that share resources such as network switches and power supplies, and a data center is a group of racks.

Setup Cassandra Cluster on Ubuntu 20.04

This section sets up a two-node Apache Cassandra cluster on Ubuntu 20.04. The table below lists the IP address that Cassandra will listen on for each server. Cassandra will be installed on both servers.

Host       IP Address
server1    192.168.0.100
server2    192.168.0.101

Install Java

Before starting, install Java 8 on both servers. Run the following command as root (or with sudo) on both servers:

				
					apt-get install openjdk-8-jdk -y
				
			

Once Java is installed, you can verify the installed version of Java with the following command:

				
					java -version
				
			

You will get the Java version in the following output:

				
					openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
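
If you script the installation, the same check can be automated. The following is a minimal sketch, not part of the original steps: java_major is a hypothetical helper name, and it assumes the first line of the java -version output has the form shown above. Note that java -version writes to standard error, so the output has to be redirected before it can be captured.

```shell
# java_major TEXT: print the major Java version found in the first line of
# `java -version` output. Handles the legacy "1.8.0_312" scheme and the
# newer "11.0.14" scheme.
java_major() {
    local version
    version=$(printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) version=${version#1.}; echo "${version%%.*}" ;;
        *)   echo "${version%%[.+-]*}" ;;
    esac
}

# On a real server: java_major "$(java -version 2>&1)"
java_major 'openjdk version "1.8.0_312"'   # prints 8
```

A result other than 8 suggests that a different Java release is the system default.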
				
			

Install Apache Cassandra on both Servers

First, install the required dependencies on both servers by running the following command:

				
					apt-get install gnupg2 wget curl unzip apt-transport-https -y
				
			

Once all the dependencies are installed, add the Cassandra GPG key with the following command:

				
					wget -q -O - https://www.apache.org/dist/cassandra/KEYS | apt-key add -
				
			

Next, add the Cassandra repository to APT using the following command:

				
					sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 311x main" > /etc/apt/sources.list.d/cassandra.list'
				
			

Once the repository is added, update the package index and install Apache Cassandra with the following commands:

				
					apt-get update -y
apt-get install cassandra -y
				
			

Once the installation is complete, check the status of the Cassandra service with the following command:

				
					systemctl status cassandra
				
			

You will get the following output:

				
					● cassandra.service - LSB: distributed storage system for structured data
     Loaded: loaded (/etc/init.d/cassandra; generated)
     Active: active (exited) since Fri 2022-02-18 03:11:41 UTC; 6s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 7181 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)

Feb 18 03:11:41 ubuntu2004 systemd[1]: Starting LSB: distributed storage system for structured data...
Feb 18 03:11:41 ubuntu2004 systemd[1]: Started LSB: distributed storage system for structured data.
				
			

Configure Apache Cassandra Cluster

Once Apache Cassandra is installed on both servers, edit the Cassandra configuration file on each one to set up the cluster.

On the first server, edit the Cassandra configuration file:

				
					nano /etc/cassandra/cassandra.yaml
				
			

Change the following lines so that the node uses its own IP address and lists itself as the seed:

				
					- seeds: "192.168.0.100"
listen_address: 192.168.0.100
rpc_address: 192.168.0.100
				
			

Save and close the file when you are finished.

On the second server, edit the Cassandra configuration file:

				
					nano /etc/cassandra/cassandra.yaml
				
			

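Change the following lines. The seeds entry still points to the first server, while listen_address and rpc_address use the second server's own IP address: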

				
					- seeds: "192.168.0.100"
listen_address: 192.168.0.101
rpc_address: 192.168.0.101
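
When several servers need the same change, the edit can also be scripted instead of made by hand in nano. The sketch below is one possible approach, assuming GNU sed (the default on Ubuntu) and a cassandra.yaml that still contains the stock "- seeds:", "listen_address:" and "rpc_address:" lines. set_cluster_addresses is a hypothetical helper name, not something shipped with Cassandra.

```shell
# set_cluster_addresses FILE SEED_IP NODE_IP
# Rewrites the seeds, listen_address and rpc_address lines in FILE,
# keeping a backup copy next to it as FILE.bak.
set_cluster_addresses() {
    local file=$1 seed=$2 node=$3
    cp "$file" "$file.bak"
    sed -i \
        -e "s/^\([[:space:]]*- seeds:\).*/\1 \"$seed\"/" \
        -e "s/^listen_address:.*/listen_address: $node/" \
        -e "s/^rpc_address:.*/rpc_address: $node/" \
        "$file"
}

# Example for the second server (run as root):
# set_cluster_addresses /etc/cassandra/cassandra.yaml 192.168.0.100 192.168.0.101
```

Commented-out lines in the file are left untouched because each pattern is anchored to the start of an active setting.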
				
			

Save and close the file, then restart the Cassandra service on both servers with the following command:

				
					systemctl restart cassandra
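
Cassandra can take a short while after a restart before it accepts connections. As a hedged sketch, the helper below (the names port_open and check_cluster_ports are made up for this example) uses bash's /dev/tcp redirection to test whether a TCP port is reachable. It assumes the default ports, 7000 for inter-node traffic and 9042 for CQL clients; adjust them if your cassandra.yaml sets different values.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the state of the inter-node (7000) and CQL (9042) ports on both servers.
check_cluster_ports() {
    local host port
    for host in 192.168.0.100 192.168.0.101; do
        for port in 7000 9042; do
            if port_open "$host" "$port"; then
                echo "$host:$port open"
            else
                echo "$host:$port closed"
            fi
        done
    done
}

# Run on either server once both have restarted:
# check_cluster_ports
```

If a port stays closed, a firewall rule or a wrong listen_address or rpc_address value is a likely cause.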
				
			

Verify Apache Cassandra Cluster Status

Cassandra provides a nodetool command-line utility for managing and monitoring the Cassandra cluster. It allows you to check the Cassandra cluster status.

Run the following command on the first server to check the Cassandra cluster status:

				
					nodetool status
				
			

You will get the following output:

				
					Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.0.100  128.91 KiB  256          100.0%            5cb85aad-68e9-4521-9464-a9b0899b2c76  rack1
UN  192.168.0.101  128.91 KiB  256          100.0%            5cb85aad-68e9-4521-9464-a9b0899b2c76  rack1
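
In the output above, UN means the node is Up and in the Normal state. For monitoring scripts, the number of such nodes can be counted from the nodetool status output. The following is a small sketch under that assumption; count_up_normal is a hypothetical helper name, and the sample rows are shortened with a placeholder instead of real host IDs.

```shell
# count_up_normal: read `nodetool status` output on stdin and print how many
# nodes are listed with the status/state code "UN" (Up/Normal).
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# On a real node: nodetool status | count_up_normal
count_up_normal <<'EOF'
--  Address        Load        Tokens  Owns (effective)  Host ID    Rack
UN  192.168.0.100  128.91 KiB  256     100.0%            <host-id>  rack1
UN  192.168.0.101  128.91 KiB  256     100.0%            <host-id>  rack1
EOF
```

For the two-node cluster in this guide the expected count is 2.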
				
			

To display detailed information about the local node (the one on which the command is run), use the following command:

				
					nodetool info
				
			

You will get the following output:

				
					ID                     : 5cb85aad-68e9-4521-9464-a9b0899b2c76
Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 128.91 KiB
Generation No          : 1645156406
Uptime (seconds)       : 95
Heap Memory (MB)       : 79.47 / 982.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 16, size 1.31 KiB, capacity 49 MiB, 73 hits, 99 requests, 0.737 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 24 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 1, size 64 KiB, capacity 213 MiB, 73 misses, 255 requests, 0.714 recent hit rate, 551.313 microseconds miss latency
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 256 tokens)
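
Single values can be pulled out of this report for use in scripts. The sketch below assumes the "name : value" layout shown above; info_field is a hypothetical helper name, and the sample lines are copied from the output in this guide.

```shell
# info_field NAME: read `nodetool info` output on stdin and print the value
# of the line whose label (the text before the first colon) equals NAME.
info_field() {
    awk -F':' -v key="$1" '
        { name = $1; sub(/[ \t]+$/, "", name) }
        name == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# On a real node: nodetool info | info_field "Gossip active"
info_field "Gossip active" <<'EOF'
Gossip active          : true
Thrift active          : false
Native Transport active: true
EOF
```

Checking that both "Gossip active" and "Native Transport active" report true is a quick way to confirm that a node is talking to its peers and accepting CQL clients.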

				
			

Connect to Apache Cassandra Cluster

In this section, we will create data on the first server and check whether it replicates to the second server.

First, use the cqlsh tool to connect to the Cassandra cluster:

				
					cqlsh 192.168.0.100 9042
				
			

Once you are connected, you will get the following shell:

				
					Connected to cluster01 at 192.168.0.100:9042.
[cqlsh 5.0.1 | Cassandra 3.11.12 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
				
			

To check the cluster information, run the following command:

				
					cqlsh> SHOW HOST
				
			

You will get the following output:

				
					Connected to cluster01 at 192.168.0.100:9042.
				
			

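Next, create a keyspace and two tables using the following queries. Note that the replication factor of 3 is higher than the number of nodes in this two-node cluster; Cassandra accepts the setting, but it can only place one replica on each node until more nodes are added.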

				
					CREATE KEYSPACE Library
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
				
			
				
					CREATE TABLE Library.book (       
ISBN text, 
copy int, 
title text,  
PRIMARY KEY (ISBN, copy)
);
				
			
				
					CREATE TABLE Library.patron (
ssn int PRIMARY KEY,
checkedOut set<text>  -- element type text is assumed; the original listing omits it
);
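
The replication_factor chosen for the keyspace also determines how many replicas must answer when a query uses the QUORUM consistency level, which Cassandra computes as floor(RF / 2) + 1. The arithmetic can be sketched as follows; the function names are made up for this example, and the numbers simply plug in the replication factor of 3 and the two nodes used in this guide.

```shell
# quorum RF: number of replicas that must respond at consistency level QUORUM.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# stored_replicas RF NODES: copies that can actually be placed, since each
# node holds at most one replica of a given row.
stored_replicas() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

echo "quorum for RF=3: $(quorum 3)"
echo "replicas stored on 2 nodes: $(stored_replicas 3 2)"
```

With these values, both nodes must be up for a QUORUM read or write to succeed. cqlsh uses consistency level ONE by default, so the queries in this guide need only one responding replica.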

				
			

Next, insert some data into the book table using the following queries:

				
					INSERT INTO  Library.book (ISBN, copy, title) VALUES('1234',1, 'Gujarati');
INSERT INTO  Library.book (ISBN, copy, title) VALUES('1234',2, 'Hindi');
INSERT INTO  Library.book (ISBN, copy, title) VALUES('1234',3, 'English');
				
			

Now, go to the second server, connect to Cassandra with cqlsh (this time using 192.168.0.101 as the address), and run the following query to verify the replicated data:

				
					select * from Library.book;
				
			

If everything is fine, you will get the following output:

				
					 isbn | copy | title
------+------+----------
 1234 |    1 | Gujarati
 1234 |    2 |    Hindi
 1234 |    3 |  English
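
The comparison between the two servers can also be done in one step from a single machine. The following is a rough sketch, assuming cqlsh is installed locally and both nodes accept CQL connections on port 9042; same_on_all_nodes and the CQLSH override variable are hypothetical names introduced only for this example. It relies on the -e option of cqlsh, which executes one statement and exits.

```shell
# same_on_all_nodes QUERY HOST...
# Runs QUERY through cqlsh against every HOST and succeeds only if all hosts
# return identical output. Set CQLSH to use a different client command.
same_on_all_nodes() {
    local query=$1 host out first="" seen=0
    shift
    for host in "$@"; do
        out=$("${CQLSH:-cqlsh}" "$host" 9042 -e "$query") || return 2
        if [ "$seen" -eq 0 ]; then
            first=$out
            seen=1
        elif [ "$out" != "$first" ]; then
            echo "result from $host differs" >&2
            return 1
        fi
    done
}

# Usage from either server:
# same_on_all_nodes 'SELECT * FROM Library.book;' 192.168.0.100 192.168.0.101
```

Treat matching output as a quick sanity check rather than proof of where each replica is stored, because the node you connect to may fetch rows from another replica.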
				
			

How to Set Up a Cassandra Cluster on Ubuntu 20.04 (Step by Step): Conclusion

In this guide, we explained how to set up a two-node Apache Cassandra cluster on Ubuntu 20.04. You can now add more servers to the cluster and scale your deployment.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
