How to Install Apache Cassandra on Ubuntu 20.04
Services on the Internet deal with huge amounts of data, most of which is stored across many different servers to keep it available to users. Relational Database Management Systems (RDBMS) have long been important, but they proved poorly suited to services like these. This led to a class of Database Management System (DBMS) designed to manage such distributed volumes of data: NoSQL.
One popular NoSQL database management system is Apache Cassandra. In this post, you will learn a few fundamental concepts: an introduction to Apache Cassandra, its main features, and the installation steps on Ubuntu 20.04.
What is Apache Cassandra?
Apache Cassandra is a free, open source, distributed, wide-column datastore and a NoSQL database management system. It is designed to handle very large data sets within organizations. It runs on clusters of commodity servers and is used in areas such as telecom and data streaming applications, where large data sets must be handled reliably. It is a scalable, user-friendly database system developed under the Apache Software Foundation.
Apache Cassandra combines ideas from Google Bigtable and Amazon Dynamo.
It was created at Facebook to power the “inbox search” feature.
Facebook open sourced it in July 2008.
It was accepted into the Apache Incubator in March 2009.
Cassandra has been a top-level Apache project since February 2010.
The main purpose of Apache Cassandra is to let organizations process large volumes of data in a reliable and scalable way. This is the main reason why big social media companies like Facebook, Instagram and Twitter use this database management system.
Apache Cassandra data model
The data model in Apache Cassandra can be described as follows:
Data in Apache Cassandra is stored as a set of rows that are organized into tables.
In Apache Cassandra, tables are also known as column families.
Each row is identified by a primary key value.
Rows are distinguished from one another by their primary key values.
It is therefore possible to retrieve all of the data, or only part of it, based on the primary key.
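The points above can be sketched with a few CQL statements. This is a minimal illustration, assuming a running single-node installation with cqlsh available; the keyspace demo, the table users, its columns, and the helper name demo_cql are all hypothetical and exist only for this example.

```shell
# Print CQL for a hypothetical keyspace and table (all names are illustrative).
demo_cql() {
cat <<'CQL'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.users (
  user_id int PRIMARY KEY,
  name text,
  email text
);
INSERT INTO demo.users (user_id, name, email) VALUES (1, 'Alice', 'alice@example.com');
SELECT * FROM demo.users WHERE user_id = 1;
SELECT name FROM demo.users WHERE user_id = 1;
CQL
}

# Usage, once Cassandra is installed and running:
#   cqlsh -e "$(demo_cql)"
```

The first SELECT returns the whole row for the given primary key, while the second returns only one of its columns.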
Here we have listed a few of the top features of Apache Cassandra that make it such a popular and reliable database management system.
Highly proven NoSQL database management system: Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have huge and active data sets.
Fault tolerant: Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
Performant: Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
Decentralized: There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
Scalable: Some of the largest production deployments include Apple, with over 75,000 nodes storing over 10 PB of data, Netflix, Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
Durable: Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.
Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
Fully controlled system: Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features such as hinted handoff and read repair.
Professionally supported: Cassandra support contracts and services are available from third parties.
Before we start, we need to install Java version 8 so that Apache Cassandra works seamlessly. OpenJDK 8 is available in the default Ubuntu 20.04 repository. You can install it by running the following command:
apt-get install openjdk-8-jdk -y
Once Java is installed, you can verify the installed version of Java with the following command:
java -version
You will get the Java version in the following output:
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
At this point, Java is installed on your system. You can now proceed to the next step.
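If you would rather have a script confirm the version than read the output by eye, something like the following could work. It is a small sketch, assuming the first line of the output has the form shown above; the helper names java_version_string and is_java8 are made up for this example.

```shell
# Extract the quoted version string from the first line of `java -version` output.
java_version_string() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeed only when the version string belongs to Java 8 (1.8.x).
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (java -version writes to stderr, hence the 2>&1):
#   is_java8 "$(java -version 2>&1 | java_version_string)" && echo "Java 8 found"
```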
When installing Apache Cassandra on Ubuntu 20.04, we first need to install some dependencies on the system before installing the Cassandra package itself.
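As a rough sketch of this step, the commands below follow the Debian packaging instructions from the Apache Cassandra project. The repository URL, the 40x release series (Cassandra 4.0, which runs on Java 8), the curl and gnupg dependencies, and both function names are assumptions for illustration, not details given in this guide, so check them against the official documentation before use.

```shell
# Sketch only: the URL, the "40x" series and the package names are assumptions.

# Print the APT source line for a given Cassandra release series (e.g. 40x).
cassandra_repo_line() {
    printf 'deb https://debian.cassandra.apache.org %s main\n' "$1"
}

# Install steps, wrapped in a function so nothing runs until it is called as root.
install_cassandra() {
    series="${1:-40x}"
    # Tools assumed to be needed for fetching and importing the signing keys.
    apt-get install curl gnupg -y
    # Register the repository and import the project's signing keys.
    cassandra_repo_line "$series" > /etc/apt/sources.list.d/cassandra.sources.list
    curl -fsSL https://downloads.apache.org/cassandra/KEYS | apt-key add -
    # Refresh the package index and install Cassandra.
    apt-get update
    apt-get install cassandra -y
}

# Usage, as root:
#   install_cassandra 40x
```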
After Apache Cassandra is installed, its service starts automatically. We can check the status of Apache Cassandra using the following command:
systemctl status cassandra
The following output appears:
● cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/init.d/cassandra; generated)
Active: active (running) since Fri 2022-02-18 02:52:00 UTC; 25s ago
Docs: man:systemd-sysv-generator(8)
Tasks: 53 (limit: 2348)
Memory: 1.2G
CGroup: /system.slice/cassandra.service
└─6499 /usr/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnO>
Feb 18 02:52:00 ubuntu2004 systemd[1]: Starting LSB: distributed storage system for structured data...
Feb 18 02:52:00 ubuntu2004 systemd[1]: Started LSB: distributed storage system for structured data.
We can also use the nodetool command-line tool to check the Apache Cassandra status:
nodetool status
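In the output of nodetool status, each node is listed on a line that begins with a two-letter code, where UN means the node is Up and in the Normal state. A script can act on that code. The sketch below counts such nodes; the helper name count_up_normal is hypothetical.

```shell
# Count the nodes reported as UN (Up/Normal) in `nodetool status` output on stdin.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage, once Cassandra is running:
#   nodetool status | count_up_normal
```

On a freshly installed single-node setup you would expect this to print 1 once the node has finished starting.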
How to Install Apache Cassandra on Ubuntu 20.04 Conclusion
Congratulations! You have successfully installed Apache Cassandra on Ubuntu 20.04. Now, go to the official Apache Cassandra documentation page and learn how to use Apache Cassandra.
If you need a database management system that offers excellent reliability even during frequent scaling, along with ease of setup and maintenance, Apache Cassandra is the right choice. It is most beneficial if your business or organization is growing rapidly or you need to work with transactional data.
I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.