How To Use Sharding in MongoDB and How it Works (Tutorial)

How to use sharding in MongoDB and how it works (Tutorial). In the world of big data and high-performance databases, MongoDB sharding stands out as a crucial technique for scaling and managing large datasets. 

This article goes over the concept of MongoDB sharding, exploring how it works and its significance in distributed data storage.

Whether you’re a developer or a database administrator, this article equips you with the knowledge and know-how to effectively utilize sharding and optimize your MongoDB clusters for enhanced performance and scalability.

What is MongoDB Sharding?

Many organizations use NoSQL databases like MongoDB to handle the storage and computational requirements associated with storing and querying large volumes of data.

MongoDB sharding is a technique that provides scalability and performance improvements by allowing parallel processing of data across multiple shards.

In this process, large datasets are divided into smaller subsets and distributed across multiple MongoDB instances. Sharding also supports automatic data rebalancing, ensuring that data is evenly distributed across shards.

The purpose of this partitioning is to prevent excessive CPU usage on the MongoDB server that could occur when querying large datasets.

Components of Sharding

The sharded cluster has three main components.

  • Shard: The simplest shared cluster component, the shard, is used to hold a subset of the large dataset that has to be partitioned.
  • Config Servers: The config servers store metadata about the cluster’s configuration.
  • Query Routers: The Query Routers are responsible for directing user queries to the correct shard.

Benefits of MongoDB Sharding

The significance of MongoDB sharding are attributed to the following factors:

  • Usually, in MongoDB, the master nodes handles write operations while the slave nodes handle read operations and backups. However, sharding distributes queries evenly across all nodes due to the use of replica sets.
  • Expand the storage capacity of a sharded cluster by simply adding additional shards. It does not need complex hardware restructuring.
  • If one or more Shards go offline, the remaining active Shards continue to function, ensuring uninterrupted access to the data.

Follow this post about How To Use Sharding in MongoDB and How it Works.

How to Use Sharding in MongoDB and How it Works (Tutorial)

Prerequisites

You’ll need the following to implement this tutorial:

  • More than 3 distinct servers. Each of them must have a firewall set up using UFW and a regular, non-root user with sudo rights. This tutorial is performed on Ubuntu servers, but it works the same on other operating systems.
  • MongoDB server installed on each of the machines. Follow our article How to Install MongoDB on Ubuntu for instructions for each server to set this up.
  • Make sure that each server has the IP addresses of all other servers added as trusted IP addresses for remote access between all of the servers.

MongoDB Config Server Setup

There are three main components in a shared cluster. Config Server is one of them. For sharding, you configure one of these MongoDB servers into a replica set and equip it with the necessary functionality so that it may act as a configuration server for a sharded cluster.

To accomplish that, open the mongod.conf file in the nano editor with the following command.

				
					sudo nano /etc/mongod.conf
				
			

At the bottom of the file, locate the configuration section with the lines that begin with #replication and #sharding. Remove the pound symbol # from these lines to uncomment them.

Next, define the replSetName under replication to identify the MongoDB instance as a replica set. Use the name config since you are configuring this MongoDB instance as a replica set that acts as a configuration server.

Also, set the clusterRole to notify MongoDB that this server is going to be a configuration server for the sharded cluster.

				
					. . .
replication:
  replSetName: "config"

sharding:
  clusterRole: configsvr
. . .
				
			

Your mongod.conf file looks like this.

Save and close the file by pressing CTRL+O and CTRL+X, respectively. And restart the mongodb service for the changes to take effect.

				
					sudo systemctl restart mongod
				
			

Now, the replication is enabled on the server, but you need to initiate it through the MongoDB shell. Open the MongoDB shell with the following command.

				
					sudo mongosh
				
			

And run the following command to start replication.

				
					rs.initiate()
				
			

You see the output returns “ok”: 1 in its output, this means the replica set started successfully.

				
					{
        "info2" : "no configuration specified. Using a default configuration for the set",
        . . .
        "ok" : 1,
        . . .
}
				
			

Run the following command in the MongoDB shell to verify the replica set configuration.

				
					rs.status()
				
			

The output shows a lot of information, but you should look out for the following keys and values to confirm the replica is working properly.

				
					{
        . . .
        "set" : "config",
        . . .
        "configsvr" : true,
        "ok" : 1,
        . . .
}
				
			

Configure Shard Servers into Replica Sets

In this step, you convert MongoDB servers into replica sets and configure them to serve as shard servers. You need to run the following process for each server you plan to set up as a shard server.

Open the mongod.conf file with the following command. 

				
					sudo nano /etc/mongod.conf
				
			

Uncomment the replication and sharding section. Next, set the distinctive replSetName under the replication section for each shard instance and configure the clusterRole to shardsvr.

				
					. . .
replication:
  replSetName: "your shard name"

sharding:
  clusterRole: shardsvr
. . .
				
			

Which looks like the following:

Save, exit the file, and restart the mongod service.

				
					sudo systemctl restart mongod
				
			

Now that you have enabled replication, start the MongoDB shell with the following command.

				
					mongosh
				
			

And initiate replication with the help of the following command.

				
					rs.initiate()
				
			

The output will show “ok” : 1 if the replication is successful.

				
					{
        "info2" : "no configuration specified. Using a default configuration for the set",
        . . .
        "ok" : 1,
        . . .
}
				
			

Run the following command to verify the replica set configurations.

				
					rs.status()
				
			

Again, look for “ok” : 1 in the output for successful replication.

				
					{
        . . .
        "set" : "shard1",
        . . .
        "ok" : 1,
        . . .
}
				
			

The next step after setting up the Shard servers is configuring the query router server and connecting to each server individually.

Configure the Query Router MongoDB Server and Add Shards to the Cluster

The configuration server and multiple shards, the replica sets you’ve put up so far, are now operational but are not yet connected to the sharded cluster. 

You need one more server, a Mongos query router, to connect these components as parts of a sharded cluster. This MongoDB query router server executes mongos, manages and communicates with the shard servers and the config server within your shared cluster.

While the mongos query router daemon is a component of the regular MongoDB installation, it requires manual initiation to be operational. Because this server will not act as a database in the cluster, you must stop and disable the MongoDB database service.

				
					sudo systemctl stop mongod
sudo systemctl disable mongod
				
			

Now run the following command to connect to the config server using its specific replica name and IP.

				
					sudo mongos --configdb config/mongo_config_ip:27017
				
			

You see something like the following in the output. This verifies that the query router is connected to the config server.

Open a new window on the mongos router server and open the MongoDB shell using the following command.

				
					mongosh
				
			

And run the following command for each shard server using their respective names and IPs.

				
					sh.addShard("shard1/mongo_shard1_ip:27017")
				
			

Output:

				
					{
        "shardAdded" : "shard1",
        "ok" : 1,
        "operationTime" : Timestamp(1636301581, 6),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1636301581, 6),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
				
			

You can further verify the added shards by checking the status using the following command.:

				
					sh.status()
				
			

Enabling Sharding and Partition Data

Now, you freely use the mongos query router to store data just like an unsharded database. However, without enabling sharding, data only stores on the primary shard, missing out on the benefits of sharding.

To fully utilize the cluster, you need to enable sharding for a database. Sharded collections require sharding enabled. This guide uses a set of documents representing highly populated cities to understand data partitioning in MongoDB. This document set belongs to the collection called cities and the database name is populations.

First of all, enable sharding for the database with the following command. Use your database name, if you are using your database.

				
					sh.enableSharding("populations")
				
			

Output:

				
					{
        . . .
        "ok" : 1,
        . . .
}
				
			

Next, partition your collections, for example, by cities. Use two different methods to achieve sharding. This article is using hashed sharding in this particular example. You can refer to the official MongoDB documentation for the ranged sharding.

Run the following command to partition your collection using hashed sharding.

				
					sh.shardCollection("populations.cities", { "country": "hashed" })
				
			

Output:

				
					{
        "collectionsharded" : "populations.cities",
        "collectionUUID" : UUID("03823afb-923b-4cd0-8923-75540f33f07d"),
        "ok" : 1,
        . . .
}
				
			

Switch to your database using the following command.

				
					use populations
				
			

Add and insert some documents using the following command.

				
					db.cities.insertMany([
    {"name": "Seoul", "country": "South Korea", "continent": "Asia", "population": 25.674 },
    {"name": "Mumbai", "country": "India", "continent": "Asia", "population": 19.980 },
    {"name": "Lagos", "country": "Nigeria", "continent": "Africa", "population": 13.463 },
    {"name": "Beijing", "country": "China", "continent": "Asia", "population": 19.618 },
    {"name": "Shanghai", "country": "China", "continent": "Asia", "population": 25.582 },
    {"name": "Osaka", "country": "Japan", "continent": "Asia", "population": 19.281 },
    {"name": "Cairo", "country": "Egypt", "continent": "Africa", "population": 20.076 },
    {"name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400 },
    {"name": "Karachi", "country": "Pakistan", "continent": "Asia", "population": 15.400 },
    {"name": "Dhaka", "country": "Bangladesh", "continent": "Asia", "population": 19.578 },
    {"name": "Rio de Janeiro", "country": "Brazil", "continent": "South America", "population": 13.293 },
    {"name": "São Paulo", "country": "Brazil", "continent": "South America", "population": 21.650 },
    {"name": "Mexico City", "country": "Mexico", "continent": "North America", "population": 21.581 },
    {"name": "Delhi", "country": "India", "continent": "Asia", "population": 28.514 },
    {"name": "Buenos Aires", "country": "Argentina", "continent": "South America", "population": 14.967 },
    {"name": "Kolkata", "country": "India", "continent": "Asia", "population": 14.681 },
    {"name": "New York", "country": "United States", "continent": "North America", "population": 18.819 },
    {"name": "Manila", "country": "Philippines", "continent": "Asia", "population": 13.482 },
    {"name": "Chongqing", "country": "China", "continent": "Asia", "population": 14.838 },
    {"name": "Istanbul", "country": "Turkey", "continent": "Europe", "population": 14.751 }
])
				
			

Output:

				
					{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("61880330754a281b83525a9b"),
                ObjectId("61880330754a281b83525a9c"),
                ObjectId("61880330754a281b83525a9d"),
. . .
       
        ]
}
				
			

Next, verify the data partition and get information on detailed data distribution across shards with the following command.

				
					db.cities.getShardDistribution()
				
			

Output:

				
					Shard shard2 at shard2/mongo_shard2_ip:27017
 data : 943B docs : 9 chunks : 2
 estimated data per chunk : 471B
 estimated docs per chunk : 4

Shard shard1 at shard1/mongo_shard1_ip:27017
 data : 1KiB docs : 11 chunks : 2
 estimated data per chunk : 567B
 estimated docs per chunk : 5

Totals
 data : 2KiB docs : 20 chunks : 4
 Shard shard2 contains 45.4% data, 45% docs in cluster, avg obj size on shard : 104B
 Shard shard1 contains 54.59% data, 55% docs in cluster, avg obj size on shard : 103B
				
			

Also further analyse the shard usage using the explain() method in the queries. The exact syntax is as follows:

				
					db.collection_name.find({key: Value}).explain()
				
			

The explain feature on the query helps determine, if a query spans one or multiple shards.

That’s it about How To Use Sharding in MongoDB and How it Works.

How to Use Sharding in MongoDB and How it Works (Tutorial) Conclusion

Enhance MongoDB performance and handle expanding data needs by understanding and utilizing sharding.

Explore our other content on MongoDB to learn more.

Avatar for Sobia Arshad
Sobia Arshad

Information Security professional with 4+ years of experience. I am interested in learning about new technologies and loves working with all kinds of infrastructures.

0 0 votes
Article Rating
Subscribe
Notify of
0 Comments
Most Voted
Newest Oldest
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x