MongoDB sharded cluster in Docker compose

Replica sets, sharding and router in a handful of containers

Introduction

In this article, I'll share my experience deploying a production-ready MongoDB database using Docker Compose. I was tasked with setting up a sharded cluster for an academic project, and following my usual approach, I looked into streamlining the process with containers. I was surprised to find a lack of documentation on the matter; while some blogs do cover the topic, they generally stop at the creation of replica sets.

Hence, we will be walking through the different steps and components that make up a MongoDB sharded cluster, stopping along the way for some container optimization, and finally arriving at our production-ready database.

Recipe for a sharded cluster

Sharded clusters are MongoDB deployments that enable sharding, a horizontal scaling method that divides data and operations across multiple servers. Where vertical scaling, which involves increasing the capacity of one server, can quickly reach its limits and become costly, horizontal scaling is almost limitless, especially when working with containers. Running multiple servers also guarantees higher availability, as the data is replicated across them.

These clusters have three main working parts that function in tandem: Config servers, Shards, and Routers.

  • Config servers are replica sets that store metadata and configuration settings for the cluster;

  • Shards are replica sets that contain a subset or a copy of the data;

  • Routers are an interface between the clients and the sharded data.

In this article, we will be deploying three config servers and three shards in replica sets, which is the recommended approach by MongoDB, and one mongos router. For more information about replica sets and replication, please check out the official documentation. Now, let's get into the deployment itself!
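Throughout the deployment we will lean on MongoDB's conventional default ports, summarized here as shell variables for reference (these are the defaults implied by each daemon's flags, not something you need to configure):

```shell
# Conventional default ports, relied on by the compose files and scripts below
MONGOS_PORT=27017   # router (mongos)
SHARD_PORT=27018    # shard members (mongod --shardsvr)
CONFIG_PORT=27019   # config servers (mongod --configsvr)
```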

Dockerizing it

Replica sets

The first step to creating our sharded cluster is initializing the two replica sets we will need, respectively for the config servers and the shards. Let's create a docker-compose.yml file and our first container:

services:
  mongo.cfg1:
    container_name: mongo.cfg1
    image: mongo
    networks:
      - public
    restart: always
    volumes:
      - mongo.cfg.vol1:/data/db
      - /path/to/keyfile:/data/keyfile
    entrypoint: [ "/usr/bin/mongod", "--configsvr", "--bind_ip_all", "--replSet", "cfgrs", "--keyFile", "/data/keyfile"]

volumes:
  mongo.cfg.vol1:
    name: "mongo.cfg.vol1"

networks:
  public:
    name: "public"

This service, mongo.cfg1, is our first configuration server and will serve as a general template for all our replica sets.

  • mongod is the main daemon to handle server tasks;

  • --configsvr indicates to MongoDB we wish to use it as a config server;

  • --replSet allows us to specify the name of the replica set that will be used for the config servers. We will be using "cfgrs".

We also mount the data in a volume to keep it between restarts. In that same volumes section, you might notice a bind mount for a keyfile; keep it in mind for now, we will elaborate on it a bit later.
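Once the stack is up (we'll get to that in a moment), a quick way to sanity-check this container is to ping its mongod through mongosh; the ping command works even before authentication is set up. The service name and port come from the compose file above:

```shell
# Ping the config server's mongod from inside the container
# (27019 is the default --configsvr port); a healthy instance answers { ok: 1 }
docker compose exec mongo.cfg1 mongosh --port 27019 --quiet --eval 'db.runCommand({ ping: 1 })'
```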

To that, we will then add our first shard:

  mongo.s1:
    container_name: mongo.s1
    image: mongo
    networks:
      - public
    restart: always
    volumes:
      - mongo.s.vol1:/data/db
      - /path/to/keyfile:/data/keyfile
    entrypoint: [ "/usr/bin/mongod", "--shardsvr", "--bind_ip_all", "--replSet", "srs", "--keyFile", "/data/keyfile"]

While very similar, this service uses --shardsvr instead to configure it as a shard, and a different replica set name: "srs". Now we can add as many of these as needed; we will be creating three of each here, giving us this docker-compose.yml structure:

services:
  mongo.cfg1:
    ...
  mongo.cfg2:
    ...
  mongo.cfg3:
    ...

  mongo.s1:
    ...
  mongo.s2:
    ...
  mongo.s3:
    ...

volumes:
  mongo.cfg.vol1:
    name: "mongo.cfg.vol1"

  mongo.cfg.vol2:
    name: "mongo.cfg.vol2"

  mongo.cfg.vol3:
    name: "mongo.cfg.vol3"

  mongo.s.vol1:
    name: "mongo.s.vol1"

  mongo.s.vol2:
    name: "mongo.s.vol2"

  mongo.s.vol3:
    name: "mongo.s.vol3"

networks:
  public:
    name: "public"

The full docker-compose.yml file is available on GitHub Gists.

Router

While we could use a replica set directly, like a normal database with replication, it becomes much more useful once set up behind a router (mongos) that distributes queries across the shards for better performance.

To do so, we will create one last MongoDB container, albeit with a much simpler separate configuration file:

services:
  mongo.main:
    container_name: mongo.main
    image: mongo
    networks:
      - public
    ports:
      - 27017:27017
    restart: always
    volumes:
       - /path/to/keyfile:/data/keyfile
    entrypoint: [ "/usr/bin/mongos", "--configdb", "cfgrs/mongo.cfg1:27019,mongo.cfg2:27019,mongo.cfg3:27019", "--bind_ip_all", "--keyFile", "/data/keyfile"]

networks:
  public:
    external: true

Bringing it together

Phew! That was a lot of configuration. Fortunately, all we need to do now is start our containers and tie our replica sets together with our router.

We will start by creating the keyfile we mentioned earlier. To generate a keyfile, use the command openssl rand -base64 741 and save the results into a file of your choosing, updating the path in the docker-compose.yml files to reflect its location.
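In practice, generating the keyfile also involves locking down its permissions, since mongod refuses keyfiles with permissive modes. A sketch, with the container-side ownership step left as a commented assumption:

```shell
# Generate the keyfile: 741 random bytes, base64-encoded
# (988 characters, within mongod's 1,024-character keyfile limit)
openssl rand -base64 741 > mongo.keyfile

# mongod rejects keyfiles whose permissions are too open
chmod 400 mongo.keyfile

# The official mongo image runs as a non-root mongodb user; if the container
# cannot read the bind-mounted file, chown it to that user's uid
# (999 is an assumption about the image, verify for your version):
# sudo chown 999:999 mongo.keyfile
```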

Then, all we need is a script:

#!/bin/bash

cd /path/to/replicaset

docker compose up -d

sleep 20

docker compose exec mongo.cfg1 mongosh --port 27019 --eval 'rs.initiate({"_id": "cfgrs","configsvr": true,"members": [{"_id": 0,"host": "mongo.cfg1:27019"},{"_id": 1,"host": "mongo.cfg2:27019"},{"_id": 2,"host": "mongo.cfg3:27019"}]})'

docker compose exec mongo.s1 mongosh --port 27018 --eval 'rs.initiate({"_id": "srs","members": [{"_id": 0,"host": "mongo.s1:27018"},{"_id": 1,"host": "mongo.s2:27018"},{"_id": 2,"host": "mongo.s3:27018"}]})'

cd /path/to/router

docker compose up -d

sleep 20

docker compose exec mongo.main mongosh --eval 'sh.addShard("srs/mongo.s1:27018")'

docker compose exec -T mongo.main mongosh <<\EOF
use admin;
db.createUser({user: "Admin", pwd: "admin", roles: [ {role: "userAdminAnyDatabase", db: "admin"} ]});
EOF

Let's break this wall of code down.

  1. The script navigates to the folder containing the configuration for the replica sets, and starts the shard and config containers;

  2. With the containers now running, two rs.initiate commands are run, one to create each replica set (the shards and the config servers);

  3. It then navigates to the folder with the router configuration and similarly starts the container;

  4. The shard replica set is then added, associating the router with our shards and completing the sharded cluster;

  5. Finally, a basic admin user is created. Don't forget to change its password once you connect for the first time!
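After the script finishes, you can verify that the admin user works from outside the containers. This one-liner assumes you run it on the Docker host (hence localhost) and uses the script's placeholder credentials, which you should change:

```shell
# Authenticate as the user created by the script and check the connection
mongosh "mongodb://Admin:admin@localhost:27017/admin" --eval 'db.runCommand({ connectionStatus: 1 })'
```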

After running the script and checking for any errors, you're done! Run docker ps and take a look at all the containers you created.

You can now connect to your MongoDB server from anywhere with any client you like, provided the server you're using has its MongoDB port (27017) open.
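Note that data is not distributed until sharding is enabled per database and a shard key is chosen per collection. A minimal sketch with placeholder names (appdb, events, userId are all assumptions, not from the setup above); these commands require cluster administration privileges, which the script's Admin user, being a user administrator, can grant itself:

```shell
# Run from the router's compose directory; sh.status() summarizes the cluster.
# sh.enableSharding / sh.shardCollection need e.g. the clusterAdmin role;
# "appdb", "events", and "userId" are placeholder names
docker compose exec mongo.main mongosh -u Admin -p admin --authenticationDatabase admin --quiet --eval '
  sh.status();
  sh.enableSharding("appdb");
  sh.shardCollection("appdb.events", { userId: "hashed" });
'
```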

What we learned

If you've followed this tutorial to its end, or simply stuck around out of curiosity, thank you for reading!

While we glossed over many of the more complex concepts around replica sets and sharding in MongoDB, you should now have a good grasp of the basics of a sharded cluster and its essential components, as well as the additional steps and configuration required to make it work in a Docker environment.

Happy coding, and keep on containerizing!
