How to Install Apache Cassandra on Ubuntu 20.04 (Tutorial)

Internet services deal with huge amounts of data, most of which is distributed across many different servers so that users can always reach it. Relational Database Management Systems (RDBMS) have long been important, but they proved poorly suited to services at this scale. This led to a class of Database Management Systems (DBMS) designed to manage distributed volumes of data: NoSQL.

One popular NoSQL database management system is “Apache Cassandra”. In this post, you will learn a few fundamental concepts: an introduction to Apache Cassandra, its main features, and the installation steps on Ubuntu 20.04.

What is Apache Cassandra?

Apache Cassandra is a free, open source, distributed, wide-column datastore and one of the best-known NoSQL database management systems. It is designed to handle very large data sets by spreading them across many commodity servers. Typical use cases include telecom and data streaming applications, where large volumes of data must be stored and served reliably. It is a scalable database system with a SQL-like query language (CQL), developed under the Apache Software Foundation.

Apache Cassandra history

  • Cassandra was first developed at Facebook, Inc.
  • Its design combines ideas from Google Bigtable and Amazon Dynamo.
  • It was created to power Facebook's “inbox search” feature.
  • Facebook open sourced Cassandra in July 2008.
  • It was accepted into the Apache Incubator in March 2009.
  • Cassandra has been a top-level Apache project since February 2010.

The main purpose of using Apache Cassandra is that it enables organizations to process large data sets in a reliable and scalable way. This is the main reason why large social media companies such as Facebook, Instagram and Twitter have used it.

Apache Cassandra data model

The data model in Apache Cassandra can be described as follows:

  • Data in Apache Cassandra is stored as rows that are organized into tables.
  • In Cassandra, tables were historically known as column families.
  • Each row is identified by a primary key value.
  • The primary key value distinguishes one row from another.
  • You can therefore retrieve all of the data, or only some of it, based on the primary key.
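
As a rough illustration of the points above, the sketch below writes a small CQL schema to a file. The keyspace demo, the table users, and its columns are hypothetical names chosen for this example. The file can only be executed once Cassandra is installed and running, so the cqlsh call is left as a comment.

```shell
#!/bin/sh
# Hypothetical keyspace, table, and column names, chosen only for illustration.
schema_file="${TMPDIR:-/tmp}/demo_schema.cql"

cat > "$schema_file" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Rows live in a table; user_id is the primary key that identifies each row.
CREATE TABLE IF NOT EXISTS demo.users (
  user_id uuid,
  name    text,
  email   text,
  PRIMARY KEY (user_id)
);

-- Read a single row by its primary key.
SELECT name, email FROM demo.users
  WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
EOF

echo "Wrote $schema_file"
# On a node with Cassandra running, the file could be executed with:
#   cqlsh -f "$schema_file"
```

The SELECT statement shows the lookup pattern described in the list: a row is fetched by supplying its primary key value.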

Features of Apache Cassandra

Here are a few of the top features of Apache Cassandra that make it such a popular and reliable database management system.

Proven: Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel and over 1500 more companies that have large, active data sets.

Fault tolerant: Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

Performant: Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.

Decentralized: There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.

Scalable: Some of the largest production deployments include Apple, with over 75,000 nodes storing over 10 PB of data, Netflix, Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).

Durable: Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.

Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Fully controlled system: Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features such as hinted handoff and read repair.

Professionally supported: Cassandra support contracts and services are available from third parties.

Install Java OpenJDK

Cassandra 3.11, the version installed in this guide, requires Java 8, so install it first. OpenJDK 8 is available in the default Ubuntu 20.04 repository. The commands in this guide assume a root shell; otherwise prefix them with sudo. Install OpenJDK 8 by running the following command:

				
					apt-get install openjdk-8-jdk -y
				
			

Once Java is installed, you can verify the installed version of Java with the following command:

				
					java -version
				
			

You will get the Java version in the following output:

				
					openjdk version "1.8.0_312"
					OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
					OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
				
			

At this point, Java is installed on your system. You can now proceed to the next step.
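
If you want to script this check rather than read the output by eye, a minimal sketch follows. The helper name is_java8 is hypothetical, and the check simply looks for the 1.8 version string shown in the output above.

```shell
#!/bin/sh
# is_java8 is a hypothetical helper: it succeeds when the given
# `java -version` text reports a 1.8.x (Java 8) runtime.
is_java8() {
    printf '%s\n' "$1" | grep -q 'version "1\.8\.'
}

# `java -version` prints to stderr, so redirect it into the capture.
# If java is missing, the capture holds the shell's error message instead.
version_text=$(java -version 2>&1 || true)

if is_java8 "$version_text"; then
    echo "Java 8 detected"
else
    echo "Java 8 not detected; Cassandra 3.11 may not start"
fi
```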

Install Apache Cassandra on Ubuntu

Installing Apache Cassandra on Ubuntu 20.04 requires a few dependencies. Install them by running the following command:

				
					apt-get install gnupg2 wget curl unzip apt-transport-https -y
				
			

Once all the dependencies are installed, add the Cassandra GPG key with the following command:

				
					wget -q -O - https://www.apache.org/dist/cassandra/KEYS | apt-key add -
				
			

Next, add the Cassandra repository to APT using the following command:

				
					sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 311x main" > /etc/apt/sources.list.d/cassandra.list'
				
			

Once the repository is added, update the package index and install Apache Cassandra with the following commands:

				
					apt-get update -y
					apt-get install cassandra -y
				
			

Once Apache Cassandra is installed, you can proceed to the next step.

Verify Apache Cassandra Installation

After installation, the Cassandra service starts automatically. You can check its status using the following command:

				
					systemctl status cassandra
				
			

The following output appears:

				
					● cassandra.service - LSB: distributed storage system for structured data
					     Loaded: loaded (/etc/init.d/cassandra; generated)
					     Active: active (running) since Fri 2022-02-18 02:52:00 UTC; 25s ago
					       Docs: man:systemd-sysv-generator(8)
					      Tasks: 53 (limit: 2348)
					     Memory: 1.2G
					     CGroup: /system.slice/cassandra.service
					             └─6499 /usr/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnO>

					Feb 18 02:52:00 ubuntu2004 systemd[1]: Starting LSB: distributed storage system for structured data...
					Feb 18 02:52:00 ubuntu2004 systemd[1]: Started LSB: distributed storage system for structured data.

				
			

We can also use the nodetool command-line tool to check the Apache Cassandra status:

				
					nodetool status
				
			

You will get the following output:

				
					Datacenter: datacenter1
					=======================
					Status=Up/Down
					|/ State=Normal/Leaving/Joining/Moving
					--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
					UN  127.0.0.1  70.86 KiB  256          100.0%            5cb85aad-68e9-4521-9464-a9b0899b2c76  rack1

				
			

At this point, Cassandra is installed and running. You can now proceed to the next step.
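
The UN at the start of the node line means Up and Normal. As a sketch of how this check could be automated, the hypothetical helper all_nodes_up below reads nodetool status text and succeeds only when every node line starts with UN. It assumes the output layout shown above.

```shell
#!/bin/sh
# all_nodes_up is a hypothetical helper: it reads `nodetool status` text on
# stdin and succeeds only if at least one node line is present and every
# node line starts with UN (Up/Normal).
all_nodes_up() {
    awk '
        # Node lines start with a status letter (U/D) and a state letter.
        /^[UD][NLJM] / { nodes++; if ($1 != "UN") bad++ }
        END { if (nodes > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Usage on a Cassandra node (commented out so the sketch runs anywhere):
#   nodetool status | all_nodes_up && echo "all nodes are Up/Normal"
```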

Rename Apache Cassandra Cluster

Apache Cassandra provides the cqlsh command-line utility for interacting with Cassandra via CQL (Cassandra Query Language). Run the following command to connect to Cassandra:

				
					cqlsh
				
			

Once you are connected, you should see the following output:

				
					Connected to Test Cluster at 127.0.0.1:9042.
					[cqlsh 5.0.1 | Cassandra 3.11.12 | CQL spec 3.4.4 | Native protocol v4]
					Use HELP for help.
					cqlsh> 

				
			

By default, the Cassandra cluster name is set to “Test Cluster”. You can change it with the cqlsh utility.

First, log in to Cassandra with the following command, unless the cqlsh session from the previous step is still open:

				
					cqlsh
				
			

Once you are logged in, change the Cassandra cluster name to “New Cluster” using the following command:

				
					UPDATE system.local SET cluster_name = 'New Cluster' WHERE KEY = 'local';
				
			

Next, run the following command to exit from the Cassandra shell:

				
					exit
				
			

Next, you will also need to edit the Cassandra configuration file and define your new cluster name.

				
					nano /etc/cassandra/cassandra.yaml
				
			

Find the cluster_name line and change it to the following:

				
					cluster_name: 'New Cluster'
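
If you prefer not to edit the file by hand, the same change could be scripted with sed. This is only a sketch: the helper name set_cluster_name is hypothetical, and it assumes the file contains a single line beginning with cluster_name: at the start of the line.

```shell
#!/bin/sh
# set_cluster_name is a hypothetical helper: it rewrites the cluster_name
# line of the given YAML file in place, keeping a .bak copy of the original.
# It assumes the new name contains no "|", "&", backslash, or single-quote
# characters, which would need escaping in the sed expression below.
set_cluster_name() {
    yaml_file=$1
    new_name=$2
    sed -i.bak "s|^cluster_name:.*|cluster_name: '$new_name'|" "$yaml_file"
}

# Usage on a Cassandra node (commented out so the sketch runs anywhere):
#   set_cluster_name /etc/cassandra/cassandra.yaml 'New Cluster'
```

The .bak copy makes it easy to restore the original configuration if the edit goes wrong.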
				
			

Save and close the file, then flush the system keyspace to disk with the following command:

				
					nodetool flush system
				
			

Next, restart the Cassandra service to apply the changes:

				
					systemctl restart cassandra
				
			

How to Install Apache Cassandra on Ubuntu 20.04 Conclusion

Congratulations! You have successfully installed Apache Cassandra on Ubuntu 20.04. To learn how to use it, see the official Apache Cassandra documentation.

If you need a database management system that stays reliable during frequent scaling and is easy to set up and maintain, Apache Cassandra is a good choice. It is most beneficial if your business or organization is growing rapidly and needs to store large, fast-growing data sets.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
