How to Install Apache Cassandra on CentOS 8 Tutorial (Step by Step)

In this article, we introduce Apache Cassandra, its advantages and main features, and then walk through installing it on CentOS 8. Let's get started.

What is Cassandra

Apache Cassandra is an open source distributed database developed by the Apache Software Foundation that lets users store and manage large volumes of data across multiple data centers. It is a wide-column NoSQL database with a peer-to-peer architecture. It is fault tolerant and highly scalable, and it offers tunable consistency, so each read or write can trade consistency against latency and availability. Written in Java, it is a popular NoSQL database with a rich feature set.

Cassandra was designed to help companies handle big data workloads across multiple servers and data centers without downtime. It provides high availability and lets you deploy multi-node clusters with no single point of failure. Every node in a Cassandra cluster is independent, interconnected with the others, and plays the same role.

Any node can accept read and write requests, regardless of where the requested data is stored in the cluster. As a result, if a node goes down because of a technical issue, the other nodes that hold replicas of its data can continue to serve read and write requests. Facebook, Rackspace, eBay, Twitter, Cisco, Adobe, and Netflix are a few of the high-profile companies that use Apache Cassandra.

Apache Cassandra Features

Cassandra is one of the most popular distributed databases because of its technical features. Here are some of the features that make it an attractive option for enterprises:

Fast Writes: Cassandra runs on inexpensive commodity hardware and can handle large data volumes. Each write is appended to a commit log and stored in an in-memory memtable, which is later flushed to disk as an immutable SSTable, so writes are very fast. Cassandra can store hundreds of terabytes of data without a significant loss of read efficiency.

Fault Tolerant: Data is replicated to several nodes according to each keyspace's replication factor, and every node plays the same role. If a node goes down, the other nodes holding replicas of its data take over its requests, so Cassandra is fault tolerant. You can also add nodes to the cluster as needed, which reduces the chance that a failure affects performance.

High Scalability: You can add nodes to a Cassandra cluster at any time as demand grows. Cassandra scales horizontally (by adding more machines) rather than vertically (by upgrading a single machine). With Cassandra, you can scale across many geographical sites and add more data or users as needed.

Supports a Wide Range of Data Structures: Cassandra can store structured, semi-structured, and unstructured data. Its flexible data model, which includes collections and user-defined types, lets the schema evolve as requirements change.

Quick Response Time: Cassandra scales linearly: adding nodes to the cluster increases throughput roughly in proportion to the number of nodes. You can add nodes without much operational complexity, which lets you increase throughput while maintaining quick response times.

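Transaction Support: ACID stands for Atomicity, Consistency, Isolation, and Durability. Unlike relational databases, Cassandra does not provide full multi-row ACID transactions. It offers atomic and isolated writes within a single partition, durability through its commit log, tunable consistency, and lightweight transactions (conditional writes using IF clauses) for cases that need linearizable updates.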

Easy Data Distribution: Cassandra distributes data across the cluster automatically. Data distribution and replication work together: each row's partition key is hashed to decide which nodes own it, and copies are replicated to other nodes, which can be in different data centers. This makes it quick and simple to spread data across commodity servers and data centers.

High Reliability: All the nodes in the cluster are interconnected, so Cassandra has no single point of failure, and the loss of one node does not bring down the cluster. It was designed to tolerate node failures, a vital property for mission-critical applications.

Follow the post below to learn how to install Apache Cassandra on CentOS 8.


How to Install Apache Cassandra on CentOS 8

Install Java

Apache Cassandra is written in Java, and the 3.11 release series installed in this guide supports only Java 8. So you will need to install Java 8 on your server. You can install it by running the following command:

				
					dnf install java-1.8.0-openjdk-devel -y
				
			

Once Java is installed, verify the Java version with the following command:

				
					java -version
				
			

You will get the Java version in the following output:

				
					openjdk version "1.8.0_322"
OpenJDK Runtime Environment (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM (build 25.322-b06, mixed mode)
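If you automate server setup, this version check can be scripted. The following is a minimal sketch, not part of the official install procedure: it assumes Java 8 reports its version as `1.8.0_...` (as in the output above), and the helper name `is_java8` is hypothetical.

```shell
# Sketch: succeed only if the given `java -version` output reports Java 8.
# Assumption: Java 8 reports its version as "1.8.0_NNN", while Java 9 and
# later report "9", "11", "17", and so on.
# is_java8 is a hypothetical helper name, not an existing tool.
is_java8() {
    # Look only at the first line, e.g.: openjdk version "1.8.0_322"
    local first_line
    first_line=$(printf '%s\n' "$1" | head -n 1)
    case "$first_line" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   if is_java8 "$(java -version 2>&1)"; then echo "Java 8 found"; else echo "Java 8 not active"; fi
```

Matching on the first line keeps the check independent of the vendor-specific build lines that follow it.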
				
			

Install Apache Cassandra

Apache Cassandra is not included in the default CentOS 8 repositories, so you will need to add the Apache Cassandra repository to your system. Create the repo file with the following command:

				
					nano /etc/yum.repos.d/cassandra.repo
				
			

Add the following lines:

				
					[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
				
			

Save and close the file, then install Apache Cassandra with the following command:

				
					dnf install cassandra -y
				
			

Once Apache Cassandra is installed, you can proceed to the next step.
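If you prefer to script the repository setup instead of editing the file in nano, the same repo definition can be written with a heredoc before running `dnf install`. This is a minimal sketch; the function name `write_cassandra_repo` is hypothetical, and it writes exactly the lines used in this guide.

```shell
# Sketch: write the Cassandra 3.11 repo definition from this guide without
# an editor. write_cassandra_repo is a hypothetical helper name.
# The default path matches this guide; pass another path to try it safely.
write_cassandra_repo() {
    local repo_file=${1:-/etc/yum.repos.d/cassandra.repo}
    # Quoted 'EOF' prevents the shell from expanding anything in the body.
    cat > "$repo_file" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
EOF
}

# Usage (as root), followed by the install step from this guide:
#   write_cassandra_repo && dnf install cassandra -y
```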

Create a Systemd Service File for Apache Cassandra

Next, create a systemd service file to manage the Apache Cassandra service. You can create it with the following command:

				
					nano /etc/systemd/system/cassandra.service
				
			

Add the following lines:

				
					[Unit]
Description=Apache Cassandra
After=network.target

[Service]
PIDFile=/var/run/cassandra/cassandra.pid
User=cassandra
Group=cassandra
ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
Restart=always

[Install]
WantedBy=multi-user.target

				
			

Save and close the file, then reload the systemd daemon with the following command:

				
					systemctl daemon-reload
				
			

Next, start the Cassandra service and enable it to start at system reboot:

				
					systemctl start cassandra
systemctl enable cassandra
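Cassandra can take some time to finish starting after `systemctl start` returns, and `nodetool` may fail until the node is ready. A small polling loop can wait for the node to report `UN` (Up/Normal). This is a sketch under stated assumptions: the helper name `wait_for_cassandra` and its default timeout and interval are illustrative, and the check passes as soon as any node line shows `UN`, which on the single-node setup in this guide is the local node.

```shell
# Sketch: poll `nodetool status` until a node reports UN (Up/Normal),
# giving up after a timeout. wait_for_cassandra is a hypothetical helper;
# the default timeout and interval are arbitrary example values.
wait_for_cassandra() {
    local max_wait=${1:-120}   # total seconds to wait
    local interval=${2:-5}     # seconds between checks
    local waited=0
    while [ "$waited" -lt "$max_wait" ]; do
        # Node lines begin with a two-letter state code such as "UN".
        # Errors are hidden because nodetool fails while the node starts.
        if nodetool status 2>/dev/null | grep -q '^UN '; then
            echo "Cassandra is up"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Cassandra did not report UN within ${max_wait}s" >&2
    return 1
}

# Usage:
#   systemctl start cassandra && wait_for_cassandra 180 5
```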
				
			

You can now check the status of Apache Cassandra with the following command:

				
					systemctl status cassandra
				
			

You will get the following output:

				
					● cassandra.service - Apache Cassandra
   Loaded: loaded (/etc/systemd/system/cassandra.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-02-24 14:08:04 UTC; 4s ago
 Main PID: 5629 (java)
    Tasks: 27 (limit: 11412)
   Memory: 1.0G
   CGroup: /system.slice/cassandra.service
           └─5629 /usr/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOut>

Feb 24 14:08:07 centos8 cassandra[5629]: INFO  [main] 2022-02-24 14:08:07,045 CassandraDaemon.java:505 - Classpath: /etc/cassandra/conf:/usr/>
Feb 24 14:08:07 centos8 cassandra[5629]: INFO  [main] 2022-02-24 14:08:07,045 CassandraDaemon.java:507 - JVM Arguments: [-Xloggc:/var/log/cas>
Feb 24 14:08:07 centos8 cassandra[5629]: WARN  [main] 2022-02-24 14:08:07,149 NativeLibrary.java:189 - Unable to lock JVM memory (ENOMEM). Th>
Feb 24 14:08:07 centos8 cassandra[5629]: WARN  [main] 2022-02-24 14:08:07,150 StartupChecks.java:136 - jemalloc shared library could not be p>
Feb 24 14:08:07 centos8 cassandra[5629]: WARN  [main] 2022-02-24 14:08:07,150 StartupChecks.java:169 - JMX is not enabled to receive remote c>
Feb 24 14:08:07 centos8 cassandra[5629]: INFO  [main] 2022-02-24 14:08:07,151 SigarLibrary.java:44 - Initializing SIGAR library
Feb 24 14:08:07 centos8 cassandra[5629]: WARN  [main] 2022-02-24 14:08:07,170 SigarLibrary.java:174 - Cassandra server running in degraded mo>
Feb 24 14:08:07 centos8 cassandra[5629]: WARN  [main] 2022-02-24 14:08:07,171 StartupChecks.java:311 - Maximum number of memory map areas per>
Feb 24 14:08:07 centos8 cassandra[5629]: INFO  [main] 2022-02-24 14:08:07,263 QueryProcessor.java:121 - Initialized prepared statement caches>
Feb 24 14:08:07 centos8 cassandra[5629]: INFO  [main] 2022-02-24 14:08:07,898 ColumnFamilyStore.java:432 - Initializing system.IndexInfo

				
			

You can also check the Cassandra status using the nodetool utility:

				
					nodetool status
				
			

If everything is fine, you will get the following output:

				
					Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  70.88 KiB  256          100.0%            ba8c91a0-bf8f-4ae1-973e-e6d178ca624a  rack1
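On a cluster with more nodes, it can help to summarize this output instead of reading each line. The sketch below counts node lines by their two-letter status code (for example `UN` for Up/Normal or `DN` for Down/Normal) and reads the output on standard input. The function name `summarize_nodetool_status` is hypothetical.

```shell
# Sketch: count `nodetool status` node lines per state code and in total.
# summarize_nodetool_status is a hypothetical helper that reads stdin.
# The order of the per-state lines is not guaranteed by awk.
summarize_nodetool_status() {
    awk '
        # Node lines start with U or D, then N, L, J or M, then a space.
        /^[UD][NLJM] / { count[$1]++; total++ }
        END {
            for (state in count) printf "%s %d\n", state, count[state]
            printf "total %d\n", total
        }'
}

# Usage:
#   nodetool status | summarize_nodetool_status
```

Header lines such as `Datacenter:` and `--  Address` do not match the pattern, so only node lines are counted.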

				
			

Change Default Cluster Name

First, you will need to install Python 2, because the cqlsh utility shipped with Cassandra 3.11 requires Python 2.7. The second command makes the unversioned python command point to python2. Run the following commands:

				
					dnf install python2
alternatives --set python /usr/bin/python2
				
			

Next, connect to Cassandra using the cqlsh utility:

				
					cqlsh
				
			

Once you are connected, you will get the following shell:

				
					Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.12 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

				
			

As you can see, the default cluster name is set to Test Cluster. To change the default cluster name, run the following command:

				
					cqlsh> UPDATE system.local SET cluster_name = 'New Cluster' WHERE KEY = 'local';
				
			

Next, exit from the Cassandra shell with the following command:

				
					cqlsh> exit
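The same update can also be run without an interactive session, using cqlsh's `-e` (`--execute`) option. This is a minimal sketch, assuming cqlsh can connect to the local node as in the session above; the helper name `set_cluster_name_in_system_local` is hypothetical, and it doubles any single quotes in the name so the CQL string literal stays valid.

```shell
# Sketch: run the system.local UPDATE from this guide non-interactively.
# set_cluster_name_in_system_local is a hypothetical helper name.
set_cluster_name_in_system_local() {
    local new_name=$1
    local escaped
    # CQL escapes a single quote inside a string literal by doubling it.
    escaped=$(printf '%s\n' "$new_name" | sed "s/'/''/g")
    cqlsh -e "UPDATE system.local SET cluster_name = '$escaped' WHERE key = 'local';"
}

# Usage:
#   set_cluster_name_in_system_local 'New Cluster'
```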
				
			

Next, you will also need to define the new cluster name in the Cassandra configuration file. You can edit it with the following command:

				
					nano /etc/cassandra/default.conf/cassandra.yaml
				
			

Change the cluster name as shown below:

				
					cluster_name: 'New Cluster'
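As an alternative to editing the file in nano, this change can be scripted with GNU sed, the sed shipped with CentOS. The sketch below assumes the file contains a single top-level `cluster_name:` line, as the default cassandra.yaml does, and that the new name contains no `/`, `&`, backslash, or single-quote characters, which would need escaping. The helper name `set_yaml_cluster_name` is hypothetical.

```shell
# Sketch: rewrite the top-level cluster_name line in cassandra.yaml.
# set_yaml_cluster_name is a hypothetical helper; requires GNU sed (-i).
# Assumes the new name has no "/", "&", "\" or "'" characters.
set_yaml_cluster_name() {
    local yaml_file=$1
    local new_name=$2
    # Keep a backup copy before editing in place.
    cp "$yaml_file" "$yaml_file.bak" || return 1
    # Anchor at column 0 so only the top-level key is changed.
    sed -i "s/^cluster_name:.*/cluster_name: '$new_name'/" "$yaml_file"
}

# Usage (as root):
#   set_yaml_cluster_name /etc/cassandra/default.conf/cassandra.yaml 'New Cluster'
```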
				
			

Save and close the file, then run the following command to flush the system keyspace's in-memory data (memtables) to disk, so the updated cluster name is persisted:

				
					nodetool flush system
				
			

Next, restart the Apache Cassandra service to apply the changes:

				
					systemctl restart cassandra
				
			

Now, verify the new cluster name with the following command:

				
					cqlsh
				
			

You should see the new cluster name in the following output:

				
					Connected to New Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.12 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
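To check the name from a script instead of reading the cqlsh banner, you can query `nodetool describecluster`, which prints the cluster name on a `Name:` line under `Cluster Information:`. The sketch below assumes that line format; the helper name `cluster_name_from_describecluster` is hypothetical.

```shell
# Sketch: extract the cluster name from `nodetool describecluster` output
# read on stdin. Assumes a line of the form "<tab>Name: New Cluster".
# cluster_name_from_describecluster is a hypothetical helper name.
cluster_name_from_describecluster() {
    awk '
        /^[ \t]*Name:/ {
            # Strip the label and surrounding whitespace, keep the value.
            sub(/^[ \t]*Name:[ \t]*/, "")
            print
            exit
        }'
}

# Usage:
#   nodetool describecluster | cluster_name_from_describecluster
```

After the restart above, this should print `New Cluster`.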
				
			

Great, you have learned how to install Apache Cassandra on CentOS 8!

How to Install Apache Cassandra on CentOS 8 Conclusion

Cassandra is a NoSQL database that can store large amounts of data and distribute it across many nodes and data centers. Companies that need to process a lot of data quickly and reliably can benefit from it.

 

In the above guide, we explained what Apache Cassandra is, covered its main features, and showed how to install Apache Cassandra on CentOS 8. We also explained how to change the default cluster name. For more information, visit the Cassandra documentation.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
