Following my blog post Evaluating the Python TPCC MongoDB Benchmark, I wanted to evaluate how MongoDB performs under a workload with a bigger dataset. This time I will load a 1000 Warehouses dataset, which in raw format should equal about 100GB of data.

For the comparison, I will use the same hardware and the same MongoDB versions as in the blog post mentioned above. To reiterate:

Hardware Specs

For the client and server, I will use identical bare metal servers, connected via a 10Gb network.

The node specification:

MongoDB Topology

For MongoDB I used:

  • Single node instance without a cache size limit. As the bare metal server has 180GB of RAM, MongoDB should allocate 90GB of memory for the WiredTiger cache, and the rest will be used for the OS cache. This should produce a more CPU-bound workload.
  • Single node instance with a limited cache size. I will set a 25GB limit for the WiredTiger cache, and to limit the OS cache I will cap the memory available to the mongod instance at 50GB, as described in Using Cgroups to Limit MySQL and MongoDB memory usage (see the config sketch after this list).
  • Replica set setup with 3 nodes and the limited cache described above.
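For reference, here is a minimal sketch of this setup. The cacheSizeGB option is the standard mongod.conf setting; the systemd drop-in is just one way to apply the 50GB memory cap (the file path and the MemoryMax directive, which requires cgroup v2, are my assumptions — the referenced post uses the cgroup tools directly):

```
# mongod.conf excerpt: cap the WiredTiger cache at 25GB
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 25
```

```
# /etc/systemd/system/mongod.service.d/memory.conf (illustrative path):
# cap total memory available to the mongod unit at 50GB via cgroups
[Service]
MemoryMax=50G
```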

MongoDB Versions:

Loading Data

I will load the data using the PyPy version of Python with 100 clients, timing the load:
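The invocation looks roughly like this (a sketch based on the py-tpcc driver; exact flags can differ between forks of the benchmark, and the config file name is illustrative):

```
# Load 1000 warehouses with 100 loader clients, skipping the transaction
# phase (--no-execute), and time the load with the shell built-in
time pypy ./tpcc.py --no-execute --warehouses 1000 --clients 100 \
    --config mongodb.config mongodb
```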

The results:

4.0

4.2

4.4

To Highlight:

4.2 loaded data a little faster than 4.0, while 4.4 performed extremely poorly, being about 20 times slower than 4.2. I hope this is a Release Candidate bug which will be fixed for the release.

The size of the MongoDB data directory is 165GB, so there is a noticeable overhead compared to the raw 100GB of data.

Benchmark Results

Results With an Unlimited Cache

The results are in NEW ORDER transactions per minute; more is better.

[Chart: MongoDB Python Benchmarks]

[Chart: MongoDB Version Benchmarks]

Results With a Limited Cache

In this case, I allocate only 25GB for the WiredTiger cache and 50GB for the mongod process in total.

The results are in NEW ORDER transactions per minute; more is better.

[Chart: Results With a Limited Cache MongoDB]

Results with 3 Nodes ReplicaSet and Limited Cache

In this case, I only compare 4.0 and 4.2: from the previous results, there is something going on with 4.4, and I want to wait until the GA release to measure it in a replica set setup.

I used ‘write_concern’: 1 for this benchmark.
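In the Python driver, this corresponds to setting w=1 on the connection: writes are acknowledged by the primary only, without waiting for replication to the secondaries. A minimal PyMongo sketch (the URI and database name are illustrative):

```
from pymongo import MongoClient

# w=1: the primary acknowledges the write; secondaries apply it asynchronously
client = MongoClient("mongodb://primary.example.com:27017/?replicaSet=rs0", w=1)
db = client["tpcc"]  # hypothetical database name
```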

The results are in NEW ORDER transactions per minute, and more is better.

[Chart: Results with 3 Nodes ReplicaSet and Limited Cache]

Now we can compare how much overhead the replica set adds:

With ‘write_concern’: 1 there really should not be much overhead from the replica set, which is confirmed for version 4.0. However, 4.2 shows a noticeable difference, which is a point for further investigation.
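One way to investigate is to monitor replication lag on the secondaries during the run, for example with the mongo shell helper (renamed to rs.printSecondaryReplicationInfo() in 4.4):

```
// For each secondary, show how far it is behind the primary's oplog
rs.printSlaveReplicationInfo()
```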

Conclusion

What is obvious from the collective results is that the 4.2 version took a noticeable performance hit, sometimes showing as much as a 2x throughput decline compared to 4.0.

Version 4.4, in its current RC state, showed long load times and variable performance results under highly concurrent load. I want to wait for the GA release for a final evaluation.

3 Comments
Asya

You don’t provide the contents of the mconfig file – without that it’s hard to know what might be happening. Can you provide that info?

Henrik Ingo

Vadim,

Your last results are a great clue.

“Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds”

https://docs.mongodb.com/manual/replication/

If you monitor for replication lag in your 4.0 test, you can confirm or deny my hypothesis.
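A quick way to check the current flow control setting in the 4.2 test would be something like this (a sketch; getParameter is the standard admin command):

```
// Read the flow control target lag (MongoDB 4.2+, default 10 seconds)
db.adminCommand({ getParameter: 1, flowControlTargetLagSeconds: 1 })
```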