Through the Looking Glass: Analyzing the interplay between memory, disk, and read performance.

Kevin Arhelger

#Technical

Introduction

Understanding the relationships between various internal caches and disk performance, and how those relationships affect database and application performance, can be challenging. We’ve used the YCSB benchmark, varying the working set (number of documents used for the test) and disk performance, to better show how these relate. While reviewing the results, we’ll cover some MongoDB internals to improve understanding of common database usage patterns.

Key Takeaways

  1. Knowing disk baseline performance is important for understanding overall database performance.
  2. High disk await and utilization are indicative of a disk bottleneck.
  3. WiredTiger IO is random.
  4. A query targeting a single replica set is single threaded and sequential.
  5. Disk performance and working set size are closely related.

Summary

The primary contributors to overall system performance are how the working set relates to both the storage engine cache size (the memory dedicated for storing data) and disk performance (which provides a physical limit to how quickly data can be accessed).

Using YCSB, we explore the interactions between disk performance and cache size, demonstrating how these two factors can affect performance. While YCSB was used for this testing, synthetic benchmarks are not representative of production workloads, and latency and throughput numbers obtained with these methods do not map to production performance. We utilized MongoDB 3.4.10, YCSB 0.14, and the MongoDB 3.6.0 driver for these tests. YCSB was configured with 16 threads and the "uniform" read-only workload.
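For reference, a run of this kind might be invoked as sketched below. This is an approximation only: the workload file, operation count, and connection string are assumptions, not the exact job definition used for these tests. The `recordcount` property is what we varied between tests (2, 3, 4, and 5 million documents) to change the working set size.

```bash
./bin/ycsb run mongodb -s \
  -P workloads/workloadc \
  -threads 16 \
  -p requestdistribution=uniform \
  -p recordcount=2000000 \
  -p operationcount=10000000 \
  -p mongodb.url="mongodb://localhost:27017/ycsb"
```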

We show that fitting your working set inside memory provides optimal application performance. As with any database, exceeding this limit negatively affects latency and overall throughput.

Understanding Disk Metrics

There are four important metrics when considering disk performance:

  1. Disk throughput, or the number of requests per second multiplied by the request size. This is usually measured in megabytes per second. Random read and write performance in the 4KB range is the most representative of standard database workloads. Note that many cloud providers limit disk throughput or bandwidth.
  2. Disk latency. On Linux this is represented by `await`, the time in milliseconds from when an application issues a read or write until the data is written or returned to the application. For SSDs, latencies are typically under 3ms; for HDDs, typically above 7ms. High latencies indicate the disks are having trouble keeping up with the given workload.
  3. Disk IOPS (Input/Output Operations Per Second). `iostat` reports this metric as `tps`. A given cloud provider may guarantee a certain number of IOPS for a given drive. Should you reach this threshold, any further accesses will be queued, resulting in a disk bottleneck. A high-end PCIe-attached NVMe device could offer 1,500,000 IOPS, while a typical hard disk may only support 150 IOPS.
  4. Disk utilization, reported by `util` in `iostat`. This is the percentage of time the device was busy servicing IO requests. While this number can be misleading for devices that service requests in parallel, it is a good indicator of overall disk health. A sample `iostat` invocation is shown after this list.
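All four metrics can be observed with `iostat`; a minimal invocation follows. Exact column names vary slightly between sysstat versions.

```bash
# Extended per-device statistics, refreshed every second, in megabytes
iostat -xm 1
# Columns of interest (names vary slightly by sysstat version):
#   r/s, w/s                    - read/write requests per second (IOPS)
#   rMB/s, wMB/s                - throughput
#   await / r_await / w_await   - average request latency in milliseconds
#   %util                       - device utilization
```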

Testing Disk Performance

While cloud providers may provide an IOPS threshold for a given volume and disk, and disk manufacturers publish expected performance numbers, the actual results on your system may vary. If the observed disk performance is in question, performing an IO test can be very helpful.

We generally test with fio, the Flexible IO Tester. We performed tests against 10GB of data, using the psync ioengine and with reads ranging between 4KB and 32KB. While the default fio settings are not representative of the WiredTiger workload, we have found this configuration to be a good approximation of WiredTiger disk utilization.
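The exact fio job is not listed here; a comparable random read test might look like the sketch below. The test file path, runtime, and the use of direct IO are assumptions, not the precise job we ran.

```bash
fio --name=wt-approx-randread \
    --filename=/data/fio-testfile \
    --size=10G \
    --ioengine=psync \
    --rw=randread \
    --bsrange=4k-32k \
    --direct=1 \
    --runtime=120 --time_based \
    --group_reporting
```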

All tests were repeated under three disk scenarios:

Scenario 1

Default disk settings provided by an AWS c5 instance's 100GB io1 volume, provisioned at 5000 IOPS.

  • 1144 IOPS / 5025 physical reads per second / 99.85% utilization

Scenario 2

Limiting the disk to 600 IOPS and introducing 7ms of latency. This should mirror the performance of a typical RAID10 SAN with hard drives.

  • 134 IOPS / 150 physical reads per second / 95.72% utilization

Scenario 3

Further limiting the disk to 150 IOPS with 7ms latency. This should model a commodity spinning hard drive.

  • 34 IOPS / 150 physical reads per second / 98.2% utilization
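The article leaves unstated how these limits were imposed. For readers who want to reproduce something similar, one possible approach on Linux, sketched under the assumption of a cgroup v2 system with the io controller enabled, combines the device-mapper `delay` target with a cgroup IOPS cap. The device names, cgroup path, and major:minor numbers below are placeholders.

```bash
# Create a device that adds ~7ms of latency to every IO on /dev/xvdf
# (placeholder device; the last field of the table is the delay in ms)
SECTORS=$(blockdev --getsz /dev/xvdf)
dmsetup create delayed --table "0 ${SECTORS} delay /dev/xvdf 0 7"
# The filesystem holding the MongoDB dbPath would then be mounted from
# /dev/mapper/delayed instead of the raw device.

# Cap reads at 150 IOPS for processes in a cgroup (253:0 is a placeholder
# major:minor for the delayed device), then move mongod into that cgroup
mkdir -p /sys/fs/cgroup/mongod
echo "253:0 riops=150" > /sys/fs/cgroup/mongod/io.max
echo "$(pidof mongod)" > /sys/fs/cgroup/mongod/cgroup.procs
```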

How is a query serviced from disk?

The WiredTiger Storage Engine performs its own caching. By default, the WiredTiger cache is sized at 50% of (system memory minus 1GB), leaving adequate space for other system processes, the filesystem cache, and internal MongoDB operations that consume additional memory, such as building indexes, performing in-memory sorts, deduplicating results, text scoring, connection handling, and aggregations. To prevent performance degradation from a totally full cache, WiredTiger automatically begins evicting data from the cache when utilization grows above 80%. For our tests, this means the effective cache size is (7634MB - 1024MB) * 0.5 * 0.8, or 2644MB.
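The configured and currently used cache sizes can be confirmed from `db.serverStatus()`; the cache can also be sized explicitly with the `storage.wiredTiger.engineConfig.cacheSizeGB` setting. A minimal check from the command line, assuming a local mongod and the 3.4-era `mongo` shell:

```bash
mongo --quiet --eval '
  var c = db.serverStatus().wiredTiger.cache;
  print("maximum bytes configured:     " + c["maximum bytes configured"]);
  print("bytes currently in the cache: " + c["bytes currently in the cache"]);
'
```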

All queries are serviced from the WiredTiger cache. This means a query will cause indexes and documents to be read from disk through the filesystem cache into the WiredTiger cache before returning results. If the requested data is already in the cache, this step is skipped.

WiredTiger stores documents with the snappy compression algorithm by default. Any data read from the filesystem cache is first decompressed before being stored in the WiredTiger cache. Indexes utilize prefix compression by default and are compressed both on disk and inside the WiredTiger cache.

The filesystem cache is an operating system construct that keeps frequently accessed files in memory to facilitate faster accesses. Linux is very aggressive in caching files and will attempt to consume all free memory with the filesystem cache. If applications need additional memory, pages are evicted from the filesystem cache to make room.

Here is an animated graphic showing the disk accesses to the YCSB collection resulting from 100 YCSB read operations. Each operation is an individual find, providing the _id to retrieve a single document.

The upper left-hand corner represents the first byte in the WiredTiger collection file. Disk locations increase toward the right-hand side and wrap around. Each row represents a 3.5MB segment of the WiredTiger collection file. The accesses are ordered by time and represented by the frames of the animation. Accesses are shown as red and green boxes to highlight the current disk access.

3.5 MB vs 4KB

Here we see the data file for our collection read into memory. Because the data is stored in B+ trees, we may need to find the disk location of our document (the smaller accesses) by visiting one or more locations on disk before our document is found and read (the wider accesses).

This demonstrates the typical access pattern of a MongoDB query: documents are unlikely to be close to each other on disk. It also shows that documents are highly unlikely to be in consecutive disk locations, even when inserted one after another.

The WiredTiger storage engine is designed to “read completely”: it will issue a read for all of the data it needs at once. This leads to our recommendation to limit the disk read ahead for WiredTiger deployments to zero, as subsequent accesses are unlikely to take advantage of the additional data retrieved through read ahead.
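Read-ahead can be checked and changed with `blockdev`. A minimal sketch (the device name is a placeholder, and the setting would need to be persisted, for example via a udev rule, to survive a reboot):

```bash
# Show the current readahead setting, in 512-byte sectors
blockdev --getra /dev/xvdf

# Set readahead to zero for this device (placeholder name)
blockdev --setra 0 /dev/xvdf
```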

Working Set Fits in Cache

For our first set of tests, we set the record count to 2 million, resulting in a total size for both data and indexes of 2.43 GB or 92% of cache.
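The combined data and index size can be compared against the effective cache size using `db.stats()`. A quick sketch, assuming the YCSB data lives in a database named `ycsb`:

```bash
mongo ycsb --quiet --eval '
  var s = db.stats(1024 * 1024);   // report sizes in MB
  print("dataSize (MB):  " + s.dataSize);
  print("indexSize (MB): " + s.indexSize);
'
```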

Here we see strong scenario 1 performance of 76,113 requests per second. Checking the cache statistics, we observe a WiredTiger cache hit rate of 100%, with no accesses and zero bytes read into the filesystem cache, meaning no additional IO is required throughout this test.
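The WiredTiger cache hit rate can be approximated from two `serverStatus` counters; a sketch follows, with the ratio below being an approximation rather than an official formula:

```bash
mongo --quiet --eval '
  var c = db.serverStatus().wiredTiger.cache;
  var readInto  = c["pages read into cache"];
  var requested = c["pages requested from the cache"];
  // Misses roughly correspond to pages that had to be read into the cache
  print("approx. cache hit rate: " +
        (100 * (1 - readInto / requested)).toFixed(2) + "%");
'
```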

Unsurprisingly, in scenarios 2 and 3, changing the disk performance (adding 7ms of latency and limiting IOPS to either 600 or 150) affected throughput minimally (69,579.5 and 70,252 operations per second, respectively).

Our 99th percentile response latencies for all three tests are between 0.40 and 0.44 ms.

Working Set Larger than WiredTiger Cache, but Still Fits in Filesystem Cache

Modern operating systems cache frequently accessed files to improve read performance. Because the file is already in memory, accessing cached files does not result in physical reads. The cached statistics displayed by the Linux `free` command detail the size of the filesystem cache.
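For example, a quick way to see the current filesystem cache size:

```bash
# Show memory usage in megabytes; the "buff/cache" column (labelled
# "cached" on older versions of free) reports the filesystem cache size
free -m
```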

When we increase our record count from 2 million to 3 million, we increase the total size of data and indexes to 3.66GB, 38% greater than what can be serviced solely from the WiredTiger cache.

The metrics show that we are reading an average of 548 mbps into the WiredTiger cache, but we observe a 99.9% hit rate when checking the filesystem cache metrics.

For this test we begin to see a reduction in performance, at 66,720 operations per second, an 8% reduction compared to our previous test, which was serviced solely from the WiredTiger cache.

As expected, reduced disk performance does not significantly affect our overall throughput in this case (64,484 and 64,229 operations per second, respectively). In cases where the documents are more compressible, or the CPU is a limiting factor, the penalty of reading from the filesystem cache would be more pronounced.

We note a 54% increase in observed 99th percentile latency, which rises to 0.53-0.55 ms.

Working Set Slightly Larger Than WiredTiger and FileSystem Cache

We have established that the WiredTiger and filesystem caches work together to provide data to service our queries. However, when we grow our record count from 3 million to 4 million, we can no longer rely solely on these caches to service queries. Our data size grows to 4.8GB, or 82% larger than our WiredTiger cache.

Here, we read into the WiredTiger cache at a rate of 257.4 mbps. Our filesystem cache hit rate drops to 93-96%, meaning 4-7% of our reads result in physical reads from disk.

Varying the available IOPS and disk latency has a huge impact on performance for this test.

The 99th percentile response latencies increase further: 19ms for Scenario 1, 171ms for Scenario 2, and 770ms for Scenario 3, an increase of 43x, 389x, and 1751x over the in-cache case.

We see 75% lower performance when MongoDB is provided the full 5000 IOPS compared to our earlier test, which fit fully in cache. Scenarios 2 and 3 achieved 5,139.5 and 737.95 operations per second respectively, further demonstrating the IO bottleneck.

Working Set Much Larger Than WiredTiger and FileSystem Cache

Moving up to 5 million records, we grow our data and index size to 6.09GB, larger than our combined WiredTiger and filesystem caches. We see our throughput dip below our available IOPS. In this case we are still servicing 81% of WiredTiger reads from the filesystem cache, but the reads that miss it and go to disk are saturating our IO. We see 71, 8.3, and 1.9 mbps read into the filesystem cache for this test.

The 99th percentile response latencies increase further: 22ms for Scenario 1, 199ms for Scenario 2, and 810ms for Scenario 3, an increase of 52x, 454x, and 1841x over the in-cache response latencies. Here, changing the disk IOPS significantly affects our throughput.

Summary

Through this series of tests we demonstrate two major points.

  1. If the working set fits in cache, disk performance does not greatly affect application performance.
  2. When the working set exceeds available memory, disk performance quickly becomes the limiting factor for throughput.

Understanding how MongoDB utilizes both memory and disks is an important part of both sizing a deployment and understanding its performance. The WiredTiger storage engine attempts to use the hardware to the fullest extent, but memory and disk are two critical pieces of infrastructure contributing to the overall performance characteristics of your workload.