How to Set Up a Cassandra Cluster on Azure/AWS/GCP

To set up an Apache Cassandra cluster on any of these cloud platforms, whether as a single node or with multiple seed nodes, the recommended approach is to deploy a Cassandra server from the image in one of the cloud marketplaces below:

Getting Started

 

Once your Cassandra server has been deployed, the following links explain how to connect to a Linux VM:

 

 

Once connected and logged in, you're ready to start configuring your Apache Cassandra cluster.

Setting Up a Cassandra Cluster

 

You may want to change the Cassandra configuration settings depending on your requirements. The default configuration is sufficient if you intend to use Cassandra on a single node. If you are running Cassandra as a cluster, customize the main settings in the cassandra.yaml file.

 

To edit the cassandra.yaml file, run the command for your distribution.

On Ubuntu / Debian:

sudo nano /etc/cassandra/cassandra.yaml

On CentOS:

sudo nano /etc/cassandra/default.conf/cassandra.yaml
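If you script these edits, a small helper can pick whichever of the two paths exists on the current machine. This is a minimal sketch that only knows the two locations listed above; the function name find_cassandra_yaml and its optional directory-prefix argument (used for testing against a scratch directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Print the path of cassandra.yaml, checking the Ubuntu/Debian location first
# and then the CentOS location. Returns 1 if neither file exists.
# $1 (optional, hypothetical) is a directory prefix, e.g. a scratch copy of /.
find_cassandra_yaml() {
  local root="${1:-}"
  local candidate
  for candidate in \
    "$root/etc/cassandra/cassandra.yaml" \
    "$root/etc/cassandra/default.conf/cassandra.yaml"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For example, with the function defined in your shell, sudo nano "$(find_cassandra_yaml)" opens the right file on either distribution.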

The key settings to edit are:

 

  • cluster_name: A name of your choosing that identifies the cluster. All members of a cluster must use the same name. Note: before changing the name here, first update the cluster name with the cqlsh tool; otherwise the node will fail to start and you will get connection errors. The steps are described under Renaming Your Cassandra Cluster below.
  • num_tokens: The number of virtual nodes (vnodes) within this Cassandra instance. Tokens are used to partition the data and spread it throughout the cluster. A reasonable starting point is your version's default: 256 up to Cassandra 3.x, and 16 from Cassandra 4.0.
  • seeds: The IP addresses of the cluster's seed servers. Seed nodes are known, reliable locations that other nodes contact to obtain cluster information (such as the list of nodes in the cluster), while other machines come and go. Once nodes have joined, all active nodes hold this information, so the seeds do not become a single point of failure. It is recommended to have 3 seed nodes per datacenter.
  • listen_address: The IP address Cassandra listens on for internal (node-to-node) communication. If you leave it blank, Cassandra will try to guess the machine's IP address, but it is best to specify it yourself. This value is specific to each node.
  • rpc_address: The IP address Cassandra listens on for client communication, such as CQL connections. This value also differs on each node.
  • endpoint_snitch: The snitch Cassandra uses. A snitch tells Cassandra which datacenter and rack a node belongs to within the cluster. Several snitch types are available; refer to the official documentation for details.
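Because listen_address and rpc_address differ on every node, it can help to set these values with a script rather than by hand. The following is a minimal sketch, assuming each key already appears uncommented at the start of a line in cassandra.yaml and that the new value contains no '|', '&' or backslash characters; set_yaml_value is a hypothetical helper, not a Cassandra tool. It does not handle the seeds entry, which is nested under seed_provider.

```shell
#!/usr/bin/env bash
# Minimal sketch: replace the value of a top-level key in cassandra.yaml,
# e.g. cluster_name, num_tokens, listen_address, rpc_address or endpoint_snitch.
# Assumes the key is present uncommented at column 0 ("key: value") and that
# the value has no '|', '&' or '\' characters (special in the sed expression).
set_yaml_value() {
  local file="$1" key="$2" value="$3"
  # Fail if the key is missing or only present commented out.
  grep -q "^${key}:" "$file" || return 1
  local tmp
  tmp="$(mktemp)" || return 1
  # Rewrite only lines that start with "key:", leaving nested keys untouched,
  # then copy back over the original so its owner and mode are kept.
  sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, from a root shell: set_yaml_value /etc/cassandra/cassandra.yaml listen_address 10.0.0.11, where 10.0.0.11 stands in for the node's own private IP address.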

Renaming Your Cassandra Cluster

 

One of the first configuration changes to make is the cluster name. By default, the Cassandra cluster name is ‘Test Cluster’.

We need to use cqlsh, the CQL shell built into Cassandra, to update the cluster name stored in the system tables.

 

Run the following command to check the status of your current cluster:

nodetool status

The output lists each node with its status, IP address and rack. In this example there is 1 node, on rack1.
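When you have several nodes, it is easier to summarise this output than to read it line by line. In nodetool status, each node line starts with a two-letter code: U or D for up/down, then N, L, J or M for normal, leaving, joining or moving. The sketch below (summarize_nodetool_status is a hypothetical name) counts the listed nodes and how many are UN, assuming that layout.

```shell
#!/usr/bin/env bash
# Summarise `nodetool status` output read from stdin: how many nodes are
# listed and how many of them are UN (Up, Normal). Header lines such as
# "Datacenter:", "Status=Up/Down" and "--  Address" do not match the codes.
summarize_nodetool_status() {
  awk '
    $1 ~ /^[UD][NLJM]$/ { total++ }
    $1 == "UN"          { up_normal++ }
    END { printf "up_normal=%d total=%d\n", up_normal, total }
  '
}
```

Run it as nodetool status | summarize_nodetool_status; once every node has joined and is healthy, both counts should match.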

 

We now need to connect to our cluster in order to update the cluster name. We will need to do this on every node. Instructions for adding more nodes are further down.

 

Run the following commands to connect to the cluster and update the cluster name:

cqlsh 127.0.0.1

Replace [new_cluster_name] with your new cluster name:

UPDATE system.local SET cluster_name = '[new_cluster_name]' WHERE KEY = 'local';

Exit the CQL shell with the exit command.
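If you would rather not use an interactive session, for example when repeating the update on every node, cqlsh can run a single statement with its -e option. The sketch below only builds the statement, doubling any single quotes in the name as CQL requires; cluster_rename_cql is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Build the CQL statement that stores a new cluster name in system.local.
# Single quotes inside the name are doubled, which is how CQL escapes them.
cluster_rename_cql() {
  local name="$1"
  local q="'"
  local escaped
  escaped=${name//$q/$q$q}
  printf "UPDATE system.local SET cluster_name = '%s' WHERE key = 'local';\n" "$escaped"
}
```

For example: cqlsh 127.0.0.1 -e "$(cluster_rename_cql 'Production Cluster')", where Production Cluster is a placeholder name.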

 

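Run the following command to flush the updated system keyspace to disk:

nodetool flush system

Do not restart Cassandra at this point: until cassandra.yaml is updated as well, the stored cluster name no longer matches the configured one, and the node will refuse to start.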

We now need to edit the cassandra.yaml file.

 

On Ubuntu / Debian:

sudo nano /etc/cassandra/cassandra.yaml

On CentOS:

sudo nano /etc/cassandra/default.conf/cassandra.yaml

 

Find the cluster_name line (the default value is Test Cluster), change it to the same name you set with cqlsh, then save and exit the file.
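Before restarting, it is worth confirming the value actually saved in the file, since a mismatch here will stop the node from starting. This minimal sketch (yaml_cluster_name is a hypothetical name) prints the cluster_name value without its surrounding quotes, assuming a single uncommented cluster_name line.

```shell
#!/usr/bin/env bash
# Print the cluster_name value from a cassandra.yaml file, stripping one pair
# of surrounding single or double quotes. Commented-out lines are ignored
# because the match is anchored at the start of the line.
yaml_cluster_name() {
  sed -n "s/^cluster_name:[[:space:]]*//p" "$1" \
    | head -n 1 \
    | sed -e "s/^['\"]//" -e "s/['\"][[:space:]]*\$//"
}
```

For example: yaml_cluster_name /etc/cassandra/cassandra.yaml should print the new name exactly as you entered it in cqlsh.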


 

Once cassandra.yaml on each node has the new cluster name and the system tables hold the same name, flush the system keyspace and restart the Cassandra service with the following commands:

nodetool flush system
sudo systemctl restart cassandra

Log into the cluster with cqlsh and verify that the new cluster name is shown, for example with SELECT cluster_name FROM system.local;
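To check each node from a script instead, nodetool describecluster reports the cluster name on a Name: line. The sketch below assumes that output layout; describecluster_name and cluster_name_matches are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Read `nodetool describecluster` output on stdin and print the cluster name
# from the first "Name:" line, ignoring leading whitespace.
describecluster_name() {
  sed -n 's/^[[:space:]]*Name:[[:space:]]*//p' | head -n 1
}

# Succeed only if the describecluster output on stdin reports the expected name.
cluster_name_matches() {
  local expected="$1" actual
  actual="$(describecluster_name)"
  [ "$actual" = "$expected" ]
}
```

For example: nodetool describecluster | cluster_name_matches 'Production Cluster' && echo "rename OK", with your own cluster name in place of the placeholder.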

 

Adding More Cassandra Nodes to the Cluster

 

Using the same marketplace image you used to deploy your first Cassandra server, deploy the additional servers you want to use as nodes. Once they are deployed, open cassandra.yaml on each node, update the seeds list and the node's own listen_address and rpc_address, and set the new cluster name on each node.

 

Open the configuration file and find the seeds entry under the seed_provider section.

 


 

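Add the IP addresses of your seed nodes inside the quotes, separated by commas, and use the same list on every node. In a small cluster this can be every node, but as noted above, about 3 seed nodes per datacenter is the recommendation.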
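Editing the seeds list by hand on every node is error-prone, so here is a minimal sketch that rewrites it in place. It assumes exactly one uncommented - seeds: line, keeps that line's existing indentation, and replaces its whole value; set_seeds is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: rewrite the "- seeds:" entry under seed_provider in
# cassandra.yaml with a comma-separated list of seed IPs, keeping the line's
# indentation. Commented lines (e.g. "# Ex: ...") are not matched.
set_seeds() {
  local file="$1"
  shift
  [ "$#" -gt 0 ] || return 1                          # need at least one address
  grep -Eq '^[[:space:]]*- seeds:' "$file" || return 1
  local list
  list="$(IFS=','; printf '%s' "$*")"                 # join addresses with commas
  local tmp
  tmp="$(mktemp)" || return 1
  sed -E "s|^([[:space:]]*- seeds:).*|\\1 \"${list}\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, from a root shell: set_seeds /etc/cassandra/cassandra.yaml 10.0.0.11 10.0.0.12 10.0.0.13, with placeholder addresses for your seed nodes. Running the same command on every node keeps the lists identical.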

Cassandra Documentation / Configuration

 

To modify Cassandra's default settings, look at the configuration files in the /etc/cassandra directory. Data is stored in the /var/lib/cassandra directory. Start-up options can be adjusted in the /etc/default/cassandra file.

 

Command to start the Cassandra service:

sudo systemctl start cassandra

Command to restart the Cassandra service:

sudo systemctl restart cassandra

Command to stop the Cassandra service:

sudo systemctl stop cassandra

Command to check the Cassandra service status:

sudo systemctl -l status cassandra
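After a start or restart, Cassandra can take a while before nodetool or cqlsh are able to connect. A small retry loop saves re-running commands by hand. This is a generic sketch (retry_until is a hypothetical name), and it assumes the command you pass exits with a non-zero status while the node is still starting.

```shell
#!/usr/bin/env bash
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example: sudo systemctl restart cassandra && retry_until 30 5 nodetool status tries up to 30 times, 5 seconds apart; both numbers are arbitrary and can be tuned to how long your nodes take to start.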

 

The official Apache Cassandra documentation can be found at:

https://cassandra.apache.org/doc/latest/

Cassandra Firewall Ports

 

The following TCP ports must be open to allow bi-directional communication between nodes and with clients. Configure the firewall on each node in your Cassandra cluster accordingly. If these ports are blocked, each node will act as a standalone database server and will not join the Cassandra cluster.

 

  • TCP 7000 – Cassandra inter-node cluster communication.
  • TCP 7001 – Cassandra SSL inter-node cluster communication.
  • TCP 7199 – Cassandra JMX monitoring port.
  • TCP 9042 – Cassandra client port.
  • TCP 9160 – Cassandra client port (Thrift). Only needed before Cassandra 4.0, which removed Thrift.
  • TCP 9142 – Default for native_transport_port_ssl, useful when both encrypted and unencrypted connections are required.
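Once the rules are in place, you can check reachability from another node. This sketch uses bash's /dev/tcp redirection with the coreutils timeout command, so it assumes bash was built with /dev/tcp support, as it is on most Linux distributions; check_ports is a hypothetical name. A "closed or filtered" result can mean either a firewall rule or that nothing is listening on that port.

```shell
#!/usr/bin/env bash
# Report whether each TCP port on a host accepts connections, giving each
# attempt at most 3 seconds. Usage: check_ports HOST PORT [PORT...]
check_ports() {
  local host="$1"
  shift
  local port
  for port in "$@"; do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    if timeout 3 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null; then
      printf '%s:%s open\n' "$host" "$port"
    else
      printf '%s:%s closed or filtered\n' "$host" "$port"
    fi
  done
}
```

For example: check_ports 10.0.0.12 7000 9042, where 10.0.0.12 is a placeholder for another node's address. Keep in mind that the JMX port (7199) typically accepts only local connections in a default install, so it may report closed from another host even when the firewall allows it.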

 

If you use cloud security groups and need to add or change port rules, refer to the following guides:

 

To set up AWS firewall rules, refer to: AWS Security Groups

To set up Azure firewall rules, refer to: Azure Network Security Groups

To set up Google GCP firewall rules, refer to: Creating GCP Firewalls

Disclaimer: Apache Cassandra is a registered trademark of the Apache Software Foundation and is licensed under the Apache License 2.0. No warranty of any kind, express or implied, is included with this software. Use it at your own risk; responsibility for damages (if any) to anyone resulting from the use of this software rests entirely with the user. The author is not responsible for any damage that its use could cause.
Andrew Fitzgerald

Cloud Solution Architect. Helping customers transform their business to the cloud. 20 years' experience working in complex infrastructure environments and a Microsoft Certified Solutions Expert on everything cloud.
