What is MongoDB Sharding: Step by Step Tutorial with Example

Businesses today collect massive volumes of data, usually about their products, services, and finances. They use this data to make strategic investments and data-driven decisions about their future performance. Have you ever wondered how data storage methods have evolved over the past decade? If not, continue reading to find out.

Earlier, this data was collected and stored in conventional relational databases. Newer technologies have changed how data is stored. Today, the volume of data collected by companies is so high that traditional relational databases are often difficult to scale horizontally. As a result, many organizations are shifting towards NoSQL database clusters for their data storage requirements. One of the most well-known NoSQL databases used by many multinationals today is MongoDB. How does it keep up with the demands of data growth? Through a process known as sharding.

Are you curious to understand what MongoDB Sharding is, how it works and how you can implement it right away within your company? If yes, then you have landed in the right place. In this article, we will be sharing the nitty-gritty of MongoDB Sharding and how you can implement it for your MongoDB server.

So, without wasting any more time, let us understand what MongoDB Sharding is.

What is MongoDB Sharding?

Many companies use this NoSQL database to meet the storage and computing demands of high-volume data, and sharding is the way MongoDB handles such volumes. Let us break this down further. Sharding is the process of distributing data across multiple hosts. In MongoDB sharding, the data is split into smaller sets and distributed among several MongoDB instances so that no single MongoDB server has to carry the full CPU and storage load.
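
The idea of splitting data by a key and spreading it over several hosts can be sketched in a few lines of JavaScript. This is a toy illustration and not MongoDB's actual placement algorithm (MongoDB assigns ranges of shard key values to shards). The function names, the shard names, and the sample documents are all made up for this example.

```javascript
// Toy sketch only: MongoDB assigns ranges of shard key values (chunks) to
// shards. The simple hash-and-modulo below is an assumption for illustration.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // wrap to an unsigned 32-bit value
  }
  return h;
}

// Pick one shard for a key value; the same key always maps to the same shard.
function pickShard(key, shardNames) {
  return shardNames[toyHash(key) % shardNames.length];
}

// Group documents by the shard their key maps to.
function distribute(docs, keyField, shardNames) {
  const placement = {};
  for (const name of shardNames) placement[name] = [];
  for (const doc of docs) {
    placement[pickShard(doc[keyField], shardNames)].push(doc);
  }
  return placement;
}

// Hypothetical shard names and documents.
const shardNames = ["shardA", "shardB", "shardC"];
const docs = [
  { name: "alice" }, { name: "bob" }, { name: "carol" },
  { name: "dave" }, { name: "erin" }, { name: "frank" },
];

const placement = distribute(docs, "name", shardNames);
for (const [shard, held] of Object.entries(placement)) {
  console.log(shard, held.map((d) => d.name));
}
```

Each document lands on exactly one shard, and a lookup for a given name only needs to visit the shard that name maps to.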

Components of a MongoDB Sharded Cluster

MongoDB Sharding works by creating a cluster of MongoDB instances, typically spread over at least 3 servers. In simple terms, a MongoDB sharded cluster has 3 main components:

  1. The Shard
  2. The Query Router
  3. The Config Servers

Let’s understand these components one by one.

The Shard

The shard is the most basic unit of a sharded cluster and stores a subset of a large dataset. Shards can be deployed as replica sets to improve availability and redundancy. Taken together, the shards hold the complete dataset. For instance, a 2TB dataset can be split across four shards, each containing 500GB of the original dataset.
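
The arithmetic behind that example can be written down as a small sketch. It assumes a perfectly even split, which a real cluster only approximates, since the actual distribution depends on the shard key and the balancer. The function names are made up for this illustration.

```javascript
// Size of each shard's subset when a dataset is split evenly (an assumption).
function perShardSizeGB(totalGB, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return totalGB / shardCount;
}

// Fewest shards needed so that no shard holds more than maxPerShardGB.
function shardsNeeded(totalGB, maxPerShardGB) {
  return Math.ceil(totalGB / maxPerShardGB);
}

const totalGB = 2000; // 2TB, taking 1TB = 1000GB as the example above does
console.log(perShardSizeGB(totalGB, 4)); // 500
console.log(shardsNeeded(3000, 500));    // 6
```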

The Query Router

The query router, known as mongos, provides a stable interface between the application and the sharded cluster. It is responsible for routing client requests and queries to the right shard.

The Config Server

Config servers store the configuration settings and metadata of the MongoDB sharded cluster. The metadata records which subset of data is stored on which shard, and the query router uses this information to direct queries to the appropriate shard. In production, the config servers are usually deployed as a replica set of 3 members.
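
How the query router might use the config servers' metadata can be sketched as follows. The chunk table, shard names, and function name are hypothetical and heavily simplified. The sketch only shows the general idea: a query that includes the shard key can be sent to one shard, while a query without it has to be sent to every shard.

```javascript
// Hypothetical chunk metadata, similar in spirit to what config servers keep:
// each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "p", shard: "shardB" },
  { min: "p", max: null, shard: "shardC" }, // null max means no upper bound
];

// Return the list of shards a query has to be sent to.
function targetShards(query, shardKey, chunkTable) {
  const value = query[shardKey];
  if (value === undefined) {
    // No shard key in the query: every shard has to be asked.
    return [...new Set(chunkTable.map((c) => c.shard))];
  }
  const chunk = chunkTable.find(
    (c) => value >= c.min && (c.max === null || value < c.max)
  );
  return chunk ? [chunk.shard] : [];
}

console.log(targetShards({ name: "mongo" }, "name", chunks)); // one shard
console.log(targetShards({ apps: "Redis" }, "name", chunks)); // all shards
```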

Now that you are aware of the 3 components of MongoDB Shard Cluster, let us look at how using this NoSQL database can help your organization handle high volumes of data.

Benefits of MongoDB Sharding

  • In MongoDB Sharding, read and write capacity scales with the number of server nodes in the cluster, so cluster performance can be boosted by adding more nodes.
  • A sharded cluster continues to operate even when one or more shards are unavailable. Data on the unavailable shards cannot be reached, but clients can still access all the other shards in the cluster without downtime.
  • Without sharding, handling large volumes of data requires vertical scaling of individual servers, which can run into hardware limitations and prohibitive costs. MongoDB Sharding uses horizontal scaling instead, so the workload is distributed, and additional servers can be added to the cluster when needed.
  • In a traditional setup without MongoDB Sharding, a single master (primary) node handles all write operations, while the slave (secondary) nodes serve reads and maintain backups. In MongoDB Sharding, each shard can be its own replica set, so write operations are spread across the shards instead of landing on one node.
  • Each shard in a MongoDB sharded cluster holds a subset of the complete dataset. Simply adding shards increases the storage capacity without the need for complex hardware restructuring.

How to Set Up MongoDB Sharding

At this point, you understand the concepts of sharding. Now, we will explain how to set up a MongoDB Sharding Cluster on three Ubuntu 20.04 servers.

 

We will use the following setup for MongoDB Sharding:

Config Server

  • IP Address: 45.58.44.155

Query Router (mongos)

  • IP Address: 45.58.42.28

Shard

  • IP Address: 216.98.8.110

Install MongoDB on all Servers

First, you will need to install the MongoDB community server package on all three servers. Perform the steps below on each server.

Start by running the following command to install all required dependencies:

				
					apt-get install gnupg2 wget tree -y
				
			

Next, import the MongoDB GPG key using the command below:

				
					apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
				
			

Sample output:

				
					Executing: /tmp/apt-key-gpghome.Husn1BzoYD/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
gpg: key 68818C72E52529D4: public key "MongoDB 4.0 Release Signing Key " imported
gpg: Total number processed: 1
gpg:               imported: 1
				
			

Next, add the MongoDB repository to the APT sources list. Note that the repository line below uses the bionic (Ubuntu 18.04) path for MongoDB 4.0:

				
					echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | tee /etc/apt/sources.list.d/mongodb-org.list
				
			

Next, update the repository and install the MongoDB server package:

				
					apt-get update -y
apt-get install mongodb-org -y
				
			

Once the MongoDB server package is installed, verify the MongoDB version using the following command:

				
					mongo --version
				
			

Sample output:

				
					MongoDB shell version v4.0.26
git version: f12d07945bd82ff9b6726aa74b84ea4e94b06171
OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1804
    distarch: x86_64
    target_arch: x86_64
				
			

Next, stop the MongoDB service on all servers using the following command:

				
					systemctl stop mongod
				
			

Configure Config Server

In this section, we will configure the config server to be a replica set. First, create a directory structure using the following command:

				
					mkdir -p  /mongodb-config/data/configdb
mkdir -p  /mongodb-config/data/logs
touch /mongodb-config/data/logs/configsvr.log
				
			

You can now verify your directory structure using the following command:

				
					tree /mongodb-config
				
			

Sample output:

				
					/mongodb-config
└── data
    ├── configdb
    └── logs
        └── configsvr.log
				
			

Next, create a new configuration file for Config Server using your favorite editor:

				
					nano /etc/mongodConfig.conf
				
			

Define your storage path, logging path, port, server IP, cluster role, and replica set as shown below:

				
					storage:
  dbPath: /mongodb-config/data/configdb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /mongodb-config/data/logs/configsvr.log

net:
  port: 27019
  bindIp: 45.58.44.155

sharding:
  clusterRole: configsvr

replication:
  replSetName: ConfigReplSet

				
			

Save and close the file once you are finished. Next, start the config server using the following command:

				
					mongod --config /etc/mongodConfig.conf &
				
			

Next, check the config server listening port with the following command:

				
					ss -antpl | grep 27019
				
			

Sample output:

				
					LISTEN    0         128           45.58.44.155:27019            0.0.0.0:*        users:(("mongod",pid=3435,fd=11))  
				
			

Next, connect to the config server using the following command:

				
					mongo 45.58.44.155:27019
				
			

Once you are connected to mongo shell, initiate the config server using the following command:

				
					rs.initiate()
				
			

Sample output:

				
					{
	"info2" : "no configuration specified. Using a default configuration for the set",
	"me" : "45.58.44.155:27019",
	"ok" : 1,
	"$gleStats" : {
		"lastOpTime" : Timestamp(1627985165, 1),
		"electionId" : ObjectId("000000000000000000000000")
	},
	"lastCommittedOpTime" : Timestamp(0, 0)
}
				
			

Next, check the status of the server with the following command:

				
					rs.status()
				
			

Sample output:

				
					{
	"set" : "ConfigReplSet",
	"date" : ISODate("2021-08-03T10:06:24.083Z"),
	"myState" : 1,
	"term" : NumberLong(1),
	"syncingTo" : "",
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"configsvr" : true,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"optimes" : {
		"lastCommittedOpTime" : {
			"ts" : Timestamp(1627985175, 1),
			"t" : NumberLong(1)
		},
		"readConcernMajorityOpTime" : {
			"ts" : Timestamp(1627985175, 1),
			"t" : NumberLong(1)
		},
		"appliedOpTime" : {
			"ts" : Timestamp(1627985175, 1),
			"t" : NumberLong(1)
		},
		"durableOpTime" : {
			"ts" : Timestamp(1627985175, 1),
			"t" : NumberLong(1)
		}
	},
	"lastStableCheckpointTimestamp" : Timestamp(1627985165, 27),
	"electionCandidateMetrics" : {
		"lastElectionReason" : "electionTimeout",
		"lastElectionDate" : ISODate("2021-08-03T10:06:05.122Z"),
		"electionTerm" : NumberLong(1),
		"lastCommittedOpTimeAtElection" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"lastSeenOpTimeAtElection" : {
			"ts" : Timestamp(1627985165, 1),
			"t" : NumberLong(-1)
		},
		"numVotesNeeded" : 1,
		"priorityAtElection" : 1,
		"electionTimeoutMillis" : NumberLong(10000),
		"newTermStartDate" : ISODate("2021-08-03T10:06:05.125Z"),
		"wMajorityWriteAvailabilityDate" : ISODate("2021-08-03T10:06:05.506Z")
	},
	"members" : [
		{
			"_id" : 0,
			"name" : "45.58.44.155:27019",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 252,
			"optime" : {
				"ts" : Timestamp(1627985175, 1),
				"t" : NumberLong(1)
			},
			"optimeDate" : ISODate("2021-08-03T10:06:15Z"),
			"syncingTo" : "",
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "could not find member to sync from",
			"electionTime" : Timestamp(1627985165, 2),
			"electionDate" : ISODate("2021-08-03T10:06:05Z"),
			"configVersion" : 1,
			"self" : true,
			"lastHeartbeatMessage" : ""
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(1627985175, 1),
	"$gleStats" : {
		"lastOpTime" : Timestamp(1627985165, 1),
		"electionId" : ObjectId("7fffffff0000000000000001")
	},
	"lastCommittedOpTime" : Timestamp(1627985175, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1627985175, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
				
			
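
The steps above initiate the config server replica set with a single member and the default configuration. As a sketch of what a larger set might look like, the helper below builds the kind of configuration document that rs.initiate() accepts. The helper name and the two extra hosts (192.0.2.11 and 192.0.2.12) are placeholders and not part of this tutorial's setup.

```javascript
// Build a replica set configuration document of the shape rs.initiate() accepts.
function buildReplSetConfig(setName, hosts, isConfigServer) {
  const config = {
    _id: setName, // must match replication.replSetName in the config file
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  if (isConfigServer) config.configsvr = true;
  return config;
}

// The first host is the config server from this tutorial; the other two are
// placeholder addresses used only for illustration.
const cfg = buildReplSetConfig(
  "ConfigReplSet",
  ["45.58.44.155:27019", "192.0.2.11:27019", "192.0.2.12:27019"],
  true
);
console.log(JSON.stringify(cfg, null, 2));
```

In the mongo shell, a document of this shape would be passed as rs.initiate(cfg) instead of calling rs.initiate() with no arguments. Each extra member would need its own running mongod instance started with the same replSetName.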

Configure Query Router

Next, log in to the Query Router server and create a directory structure using the following command:

				
					mkdir -p  /mongodb-config/data/logs
touch /mongodb-config/data/logs/queryrouter.log
				
			

Next, create a new configuration file for Query Router with the following command:

				
					nano /etc/mongoRouter.conf
				
			

Define your log file, IP address, port, and configDB as shown below:

				
					systemLog:
  destination: file
  logAppend: true
  path: /mongodb-config/data/logs/queryrouter.log

net:
  port: 27017
  bindIp: 45.58.42.28

sharding:
  configDB: ConfigReplSet/45.58.44.155:27019
				
			

Save and close the file, then start the Query Router with the following command:

				
					mongos --config /etc/mongoRouter.conf &
				
			

At this point, your MongoDB Query Router is started and listening on port 27017. You can check it with the following command:

				
					ss -antpl | grep 27017
				
			

Sample output:

				
					LISTEN    0         128            45.58.42.28:27017            0.0.0.0:*        users:(("mongos",pid=2768,fd=10))
				
			

Now, connect to the Query Router using the following command:

				
					mongo 45.58.42.28:27017
				
			

Once you are connected, you should get the following output:

				
					MongoDB shell version v4.0.26
connecting to: mongodb://45.58.42.28:27017/test?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("a31d5e22-9c10-4ac6-a5b1-67d0c78aaa77") }
MongoDB server version: 4.0.26
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
Server has startup warnings: 
2021-08-03T10:09:52.580+0000 I CONTROL  [main] 
2021-08-03T10:09:52.580+0000 I CONTROL  [main] ** WARNING: Access control is not enabled for the database.
2021-08-03T10:09:52.580+0000 I CONTROL  [main] **          Read and write access to data and configuration is unrestricted.
2021-08-03T10:09:52.580+0000 I CONTROL  [main] ** WARNING: You are running this process as the root user, which is not recommended.
2021-08-03T10:09:52.580+0000 I CONTROL  [main] 

				
			
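
The configDB value in the router configuration above follows the pattern replica set name, a slash, then a comma-separated list of host:port members. As a small sketch, the hypothetical helpers below build and parse strings of that form. The helper names are made up for this example.

```javascript
// Build "<replSetName>/<host:port>,<host:port>,..." from its parts.
function buildReplSetAddress(setName, hosts) {
  if (!setName || hosts.length === 0) {
    throw new Error("a replica set name and at least one host are required");
  }
  return `${setName}/${hosts.join(",")}`;
}

// Split such a string back into the set name and the member list.
function parseReplSetAddress(address) {
  const slash = address.indexOf("/");
  if (slash <= 0 || slash === address.length - 1) {
    throw new Error(`not in <replSetName>/<hosts> form: ${address}`);
  }
  return {
    setName: address.slice(0, slash),
    hosts: address.slice(slash + 1).split(","),
  };
}

const configDB = buildReplSetAddress("ConfigReplSet", ["45.58.44.155:27019"]);
console.log(configDB); // ConfigReplSet/45.58.44.155:27019
console.log(parseReplSetAddress(configDB));
```

With more config servers in the replica set, the additional members would simply be appended to the host list.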

Configure Shard

Next, log in to the Shard server and create a directory structure with the following command:

				
					mkdir -p  /mongodb-config/data/sharddb/
mkdir -p  /mongodb-config/data/logs
touch /mongodb-config/data/logs/shard.log
				
			

Next, create a new configuration file for Shard with the following command:

				
					nano /etc/mongodShard.conf
				
			

Define storage path, log file, port, server IP, cluster role, and replica set as shown below:

				
					storage:
  dbPath: /mongodb-config/data/sharddb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /mongodb-config/data/logs/shard.log

net:
  port: 27018
  bindIp: 216.98.8.110

sharding:
  clusterRole: shardsvr

replication:
  replSetName: ShardReplSet
				
			

Save and close the file, then start the Shard server with the following command:

				
					mongod --config /etc/mongodShard.conf &
				
			

At this point, the Shard server is started and listening on port 27018. You can check it with the following command:

				
					ss -antpl | grep 27018
				
			

Sample output:

				
					LISTEN    0         128           216.98.8.110:27018            0.0.0.0:*        users:(("mongod",pid=3067,fd=11))       
				
			

Next, connect to the Shard with the following command:

				
					mongo 216.98.8.110:27018
				
			

Once you are connected, initiate the Shard server with the following command:

				
					rs.initiate()
				
			

Sample output:

				
					{
	"info2" : "no configuration specified. Using a default configuration for the set",
	"me" : "216.98.8.110:27018",
	"ok" : 1
}
				
			

Next, verify the status of the server with the following command:

				
					rs.status()
				
			

Sample output:

				
					{
	"set" : "ShardReplSet",
	"date" : ISODate("2021-08-03T10:13:18.857Z"),
	"myState" : 1,
	"term" : NumberLong(1),
	"syncingTo" : "",
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
.....

				
			

Add the Shard to the Cluster

Next, you will need to add your Shard server to the cluster. First, log in to the Query Router server. Then, connect to the mongo shell with the following command:

				
					mongo 45.58.42.28:27017
				
			

Once you are connected, add your Shard server with the following command:

				
					sh.addShard( "ShardReplSet/216.98.8.110:27018")
				
			

Sample output:

				
					{
	"shardAdded" : "ShardReplSet",
	"ok" : 1,
	"operationTime" : Timestamp(1627985686, 3),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1627985686, 3),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
				
			

Enable Sharding for a Database

Next, you will need to create a new database and enable sharding for it. On the Query Router server, connect to the mongo shell and run the following command to switch to a new database named peoples (MongoDB creates the database once data is first stored in it):

				
					use peoples
				
			

Next, enable the sharding on the peoples database with the following command:

				
					sh.enableSharding("peoples")
				
			

Sample output:

				
					{
	"ok" : 1,
	"operationTime" : Timestamp(1627985732, 5),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1627985732, 5),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
				
			

Next, check the sharding status using the following command:

				
					sh.status()
				
			

Sample output:

				
					--- Sharding Status --- 
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("6109150dc1925cb250d31779")
  }
  shards:
        {  "_id" : "ShardReplSet",  "host" : "ShardReplSet/216.98.8.110:27018",  "state" : 1 }
  active mongoses:
        "4.0.26" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "peoples",  "primary" : "ShardReplSet",  "partitioned" : true,  "version" : {  "uuid" : UUID("5fb48b7f-3bf4-4c64-99a1-624c81979e55"),  "lastMod" : 1 } }
				
			

Next, you will need to shard a collection in the database. Let’s shard a new collection named collection in the peoples database, using the name field as the shard key:

				
					sh.shardCollection("peoples.collection", {"name":1})
				
			

Sample output:

				
					{
	"collectionsharded" : "peoples.collection",
	"collectionUUID" : UUID("959b892b-c930-4b8a-a323-d1a0436f7ec0"),
	"ok" : 1,
	"operationTime" : Timestamp(1627985777, 14),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1627985777, 14),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
				
			
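
The two arguments passed to sh.shardCollection() above are the namespace, in the form database.collection, and the shard key document. The sketch below uses a hypothetical helper to assemble them. It also shows the hashed form of the key document, which MongoDB supports as an alternative to the ranged key used in this tutorial.

```javascript
// Build the two arguments of sh.shardCollection(): the "<database>.<collection>"
// namespace and the shard key document. The helper itself is hypothetical.
function shardCollectionArgs(database, collection, keyField, strategy = "ranged") {
  if (strategy !== "ranged" && strategy !== "hashed") {
    throw new Error(`unknown strategy: ${strategy}`);
  }
  return {
    namespace: `${database}.${collection}`,
    // 1 requests ranged sharding on the field; "hashed" requests hashed sharding.
    key: { [keyField]: strategy === "hashed" ? "hashed" : 1 },
  };
}

const ranged = shardCollectionArgs("peoples", "collection", "name");
console.log(ranged.namespace, JSON.stringify(ranged.key)); // peoples.collection {"name":1}

const hashed = shardCollectionArgs("peoples", "collection", "name", "hashed");
console.log(JSON.stringify(hashed.key)); // {"name":"hashed"}
```

In the mongo shell, these values would be used as sh.shardCollection(ranged.namespace, ranged.key).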

Next, insert a document into the collection with the following command:

				
					db.collection.save({
    "name": "Application List",
    "apps": ["Apache", "MariaDB", "Redis", "PHP"],
})
				
			

Verify Sharding

Next, you will need to verify if the sharding is working as intended. First, log in to the Shard server and connect to the mongo shell with the following command:

				
					mongo 216.98.8.110:27018
				
			

Once you are connected, run the following command to check the databases available on the replica set:

				
					show dbs
				
			

Sample output:

				
					admin    0.000GB
config   0.000GB
local    0.000GB
peoples  0.000GB
				
			

Next, switch to the peoples database with the following command:

				
					use peoples
				
			

Next, check your collections and documents in the replica set using the following command:

				
					db.collection.find()
				
			

Sample output:

				
					{ "_id" : ObjectId("610917b9cced2acebffc5521"), "name" : "Application List", "apps" : [ "Apache", "MariaDB", "Redis", "PHP" ] }
				
			

Conclusion

That’s it for now. You have successfully deployed a three-node MongoDB sharded cluster on Ubuntu 20.04. You can now manage large data sets by distributing the workload across many servers.

Hitesh Jethva

I am a fan of open source technology and have more than 10 years of experience working with Linux and Open Source technologies. I am one of the Linux technical writers for Cloud Infrastructure Services.
