Performance Evaluation of MongoDB I/O Access Patterns

In the era of Big Data and Clouds, distributed databases such as NoSQL databases are taking their place among the most widely used storage systems. Benchmarking can be used to evaluate NoSQL databases. However, most benchmarks, such as YCSB, focus on high-level metrics like throughput when evaluating Cloud systems, including NoSQL databases. As a result, low-level metrics, which indicate how efficiently a database performs its operations and interacts with the operating system, cannot be evaluated. For example, internal behaviors such as how data is accessed on disk are treated as black boxes because tools to analyze them are lacking. We focus on MongoDB and study its I/O system, as MongoDB has a good reputation and ranks at the top of document-based NoSQL databases. Its flexible data model and its integrated tools make it a favorable choice for many kinds of applications. We designed generic tracing tools to study the performance of MongoDB’s I/O system and its behavior inside the Linux I/O stack. In this talk, we will show, through experiments, the efficiency of our method, which can uncover the hidden reasons behind performance issues. Our performance results show that MongoDB suffers from reduced throughput when performing heavy operations, such as secondary indexing, on a clustered MongoDB. The main cause is noisy and, at worst, shapeless I/O access patterns. MongoDB issues its I/O requests according to the sequentiality of the data records in its index table, not according to their order on the storage medium, where the data may be laid out differently. We give some insights and an ad hoc solution to overcome this performance issue.
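
To illustrate the mismatch between index order and on-disk order, the following is a minimal sketch and not the tracing tool presented in the talk. It assumes an I/O trace is available as a list of (offset, length) pairs in the order the requests were issued; the function names, the 4096-byte record size, and the two sample layouts are hypothetical and chosen only for illustration. The sketch computes the fraction of requests that start exactly where the previous one ended, a simple measure of how sequential an access pattern is.

```python
# Hypothetical illustration: how sequential is a trace of I/O requests?
# A request counts as sequential if it starts where the previous one ended.

RECORD_SIZE = 4096  # assumed fixed record size in bytes, for illustration only


def sequential_fraction(requests):
    """Return the fraction of requests that directly follow their predecessor.

    `requests` is a list of (offset, length) tuples in issue order.
    A trace with fewer than two requests is treated as fully sequential.
    """
    if len(requests) < 2:
        return 1.0
    sequential = 0
    for (prev_off, prev_len), (off, _) in zip(requests, requests[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(requests) - 1)


def trace_for(index_order, disk_layout):
    """Build the request trace produced by reading records in index order."""
    return [(disk_layout[key], RECORD_SIZE) for key in index_order]


# Records as they appear in the (hypothetical) index table.
index_order = ["a", "b", "c", "d"]

# Case 1: on-disk order matches index order.
layout_aligned = {"a": 0, "b": 4096, "c": 8192, "d": 12288}

# Case 2: the same records were allocated on disk in a different order.
layout_scattered = {"a": 8192, "b": 0, "c": 12288, "d": 4096}

if __name__ == "__main__":
    print(sequential_fraction(trace_for(index_order, layout_aligned)))
    print(sequential_fraction(trace_for(index_order, layout_scattered)))
```

In both cases the records are visited in the same index order, yet only the aligned layout yields a sequential pattern at the storage level; the scattered layout turns the same scan into random accesses.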