How to Set Up a Cassandra Cluster on Azure/AWS/GCP
To set up an Apache Cassandra cluster on any of the major cloud platforms, either as a single node or with multiple seed nodes, the recommended approach is to deploy a Cassandra server using an image from one of the cloud marketplaces below:

- Setup Cassandra Cluster on Azure – Deploy Cassandra on Ubuntu Server 20.04
- Setup Cassandra Cluster on AWS – Deploy Cassandra on Ubuntu Server 20.04
- Setup Cassandra Cluster on GCP – Deploy Cassandra on Ubuntu Server 20.04
Getting Started

Once your Cassandra server has been deployed, the following links explain how to connect to a Linux VM:
- How to connect to a Linux VM on Azure
- How to connect to a Linux VM on AWS
- How to connect to a Linux VM on GCP
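
In practice, connecting usually comes down to an SSH session using the key pair or credentials you chose at deployment. A hypothetical example (the key file, username, and IP address are placeholders and vary by platform and image):

ssh -i ~/.ssh/my-key.pem ubuntu@203.0.113.10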

Once connected and logged in, you're ready to start configuring your Apache Cassandra cluster.

Setting Up the Cassandra Cluster

You may want to change the Cassandra configuration settings depending on your requirements. The default configuration is sufficient if you intend to run Cassandra on a single node. If you are running Cassandra in a cluster, you can customize the main settings in the cassandra.yaml file.

To edit the cassandra.yaml file, run the appropriate command for your distribution.

On Ubuntu / Debian, run the following:
sudo nano /etc/cassandra/cassandra.yaml

On CentOS, first check whether the Cassandra service has started:
sudo systemctl status cassandra

If it is active (running), Cassandra is ready and running. If it has not started yet, run the following command to start the Cassandra service:
sudo systemctl start cassandra

Now we're ready to edit the cassandra.yaml file. Run the following command to open it:
sudo nano /etc/cassandra/default.conf/cassandra.yaml
The key settings to edit are as follows (an example snippet follows the list):
- cluster_name: The name describing your cluster; it can be anything you choose, but all members of a cluster must use the same name. Note: before changing the name in cassandra.yaml, first update the cluster name using the cqlsh tool, otherwise you will get connection errors. The steps are further down.
- num_tokens: The number of virtual nodes within a Cassandra instance. It is used to partition the data and spread it throughout the cluster. A good starting value is 256.
- seeds: The IP addresses of the cluster's seed servers. Seed nodes are used as known places to obtain cluster information (such as the list of nodes in the cluster). All active nodes hold this information, to avoid a single point of failure; seeds are simply known locations that can be relied on to have it while other machines come and go. It is recommended to have 3 seed nodes per datacenter.
- listen_address: The IP address that Cassandra listens on for internal (Cassandra to Cassandra) communication. The software will try to guess your machine's IP address if you leave it blank, but it is best to specify it yourself. This value is specific to each node.
- rpc_address: The IP address that Cassandra listens on for client communication, such as the CQL protocol. This value also differs on each node.
- endpoint_snitch: The 'snitch' used by Cassandra. A snitch tells Cassandra which datacenter and rack a node belongs to within a cluster. Several snitch types are available; refer to the official documentation for more information on this topic.
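
As a rough illustration, here is how these settings might look on a single node. This is only a sketch with placeholder values (the cluster name, IP addresses, and snitch choice are assumptions; substitute your own):

# cassandra.yaml (excerpt) – placeholder values
cluster_name: 'MyCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
endpoint_snitch: GossipingPropertyFileSnitch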
Renaming Your Cassandra Cluster

One of the first configuration changes is to change the default cluster name. By default, the Cassandra cluster name is 'Test Cluster'. We need to use the built-in Cassandra tool cqlsh to update the cluster name in the system tables.

Run the following command to check the status of your current cluster:
nodetool status

The output lists each node's status, IP address, load, and rack; on a fresh deployment you should see a single node on rack1, similar to the sample below.
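
The output will look roughly like the following (the address, load, token count, and host ID will differ in your environment; UN means Up and Normal):

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID   Rack
UN  10.0.0.1   103.2 KiB  256     100.0%            ...       rack1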
We now need to connect to our cluster in order to update the cluster name. This must be done on every node. Instructions on adding more nodes are further down.

Run the following commands to connect to the cluster and update the cluster name:
cqlsh 127.0.0.1
Replace [new_cluster_name] with your new cluster name:
UPDATE system.local SET cluster_name = '[new_cluster_name]' WHERE KEY = 'local';
Exit the CQL shell with the command exit.

Run the following commands to flush the system keyspace to disk and restart the Cassandra service:
nodetool flush system
sudo systemctl restart cassandra
We now need to edit the cassandra.yaml file.

On Ubuntu/Debian > sudo nano /etc/cassandra/cassandra.yaml
On CentOS > sudo nano /etc/cassandra/default.conf/cassandra.yaml

Find the line that reads cluster_name (the default is Test Cluster), change the value to your new cluster name, then save and exit the file.

Now that you have updated cassandra.yaml on each node with your new cluster name, and also updated the name in the system tables, restart the Cassandra service once more:
sudo systemctl restart cassandra
Log into the cluster with cqlsh and verify that the new cluster name is visible.
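
The cqlsh banner displays the cluster name when you connect, and you can also query it directly from the system tables:

cqlsh 127.0.0.1
SELECT cluster_name FROM system.local;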
Add More Cassandra Nodes to the Cluster

Using the same marketplace image you used to deploy Cassandra, deploy additional servers to use as extra nodes. Once they are deployed, open cassandra.yaml on each node and add the IP addresses of the nodes. Also set the new cluster name on each node.

Open the configuration file and, under the seed_provider section, find the seeds entry. Add the IP address of each seed node (for a small cluster this can be every node), separating the entries with a comma after every address, as in the sketch below.
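
For instance, on a hypothetical second node with address 10.0.0.2 joining a cluster seeded by 10.0.0.1, the relevant cassandra.yaml values might be (the seeds line lives inside the seed_provider section, as in the earlier snippet):

cluster_name: 'MyCluster'
seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.2
rpc_address: 10.0.0.2

After restarting Cassandra on the new node, run nodetool status again; each node that has joined should appear with status UN.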
Cassandra Documentation / Configuration

To modify Cassandra's default settings, check out the configuration files found in the /etc/cassandra directory. Data is stored under /var/lib/cassandra. Start-up options can be tweaked in the /etc/default/cassandra file.
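
As a quick sanity check, you can list these locations (paths as named above for Ubuntu/Debian; on CentOS the configuration lives under /etc/cassandra/default.conf):

ls /etc/cassandra
sudo ls /var/lib/cassandra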

Command to start the Cassandra service:
sudo systemctl start cassandra

Command to restart the Cassandra service:
sudo systemctl restart cassandra

Command to stop the Cassandra service:
sudo systemctl stop cassandra

Command to verify the Cassandra service status:
sudo systemctl -l status cassandra

Official Apache Cassandra documentation can be found at https://cassandra.apache.org/doc/
Cassandra Firewall Ports

The following ports must be open to allow bi-directional communication between nodes. Configure the firewall running on each node in your Cassandra cluster accordingly. Without these ports open, a node will act as a standalone database server and will not join the Cassandra cluster. An example of opening the ports with ufw follows the list.

- TCP 7000 – Cassandra inter-node cluster communication.
- TCP 7001 – Cassandra SSL inter-node cluster communication.
- TCP 7199 – Cassandra JMX monitoring port.
- TCP 9042 – Cassandra client port.
- TCP 9160 – Cassandra client port (Thrift).
- TCP 9142 – Default for native_transport_port_ssl, useful when both encrypted and unencrypted connections are required.
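
As a minimal sketch using ufw on Ubuntu, assuming your nodes sit in a hypothetical 10.0.0.0/24 subnet (adjust the source range to your environment, and drop any ports you do not use, such as Thrift on 9160):

sudo ufw allow proto tcp from 10.0.0.0/24 to any port 7000,7001,7199
sudo ufw allow proto tcp from 10.0.0.0/24 to any port 9042,9142,9160
sudo ufw reload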

If you are using cloud security groups and need to change or add ports, refer to the following guides:

To set up AWS firewall rules, refer to – AWS Security Groups
To set up Azure firewall rules, refer to – Azure Network Security Groups
To set up Google GCP firewall rules, refer to – Creating GCP Firewalls