A Top-Down Approach to Achieving Performance Predictability in Database Systems

While much of the research on transaction processing has focused on improving overall performance in terms of throughput and mean latency, surprisingly little attention has been given to performance predictability: how often individual transactions exhibit execution latency far from the mean. Performance predictability is increasingly important when transactions lie on the critical path of latency-sensitive applications, enterprise software, or interactive web services. In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today's transactional databases. We conduct the first quantitative study of the major sources of latency variance in MySQL and Postgres (two of the largest and most popular open-source products on the market) and in VoltDB (a non-conventional database). We carry out our study with a tool called TProfiler that, given the source code of a database system and programmer annotations indicating the start and end of a transaction, identifies the dominant sources of variance in transaction latency. Based on our findings, we investigate alternative algorithms, implementations, and tuning strategies that reduce latency variance without compromising mean latency or throughput. Most notably, we propose a new lock scheduling algorithm, called Variance-Aware Transaction Scheduling (VATS), and a lazy buffer pool replacement policy. In particular, our modified MySQL reduces latency variance and 99th percentile latency by up to 5.6× and 6.3×, respectively. Our proposal has been welcomed by the open-source community: our VATS algorithm has already been adopted as of MySQL's 5.7.17 release, and it has been made the default scheduling policy in MariaDB.
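To make the metrics named above concrete, the following is a minimal sketch of how mean latency, latency variance, and 99th percentile latency might be computed from a list of per-transaction latencies. It is not part of TProfiler or VATS; the function name `latency_summary`, the nearest-rank percentile definition, and the two sample workloads are assumptions chosen purely for illustration.

```python
import statistics


def latency_summary(latencies_ms):
    """Summarize latency samples (hypothetical helper, not from the paper).

    Returns the mean, the population variance, and the 99th percentile
    using the nearest-rank definition.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank p99: ceil(0.99 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (99 * n + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "variance": statistics.pvariance(ordered),
        "p99": ordered[rank - 1],
    }


# Two made-up workloads with the same mean latency but very different
# predictability: every transaction takes 10 ms, versus most taking 5 ms
# while a few take 105 ms.
steady = [10.0] * 100
spiky = [5.0] * 95 + [105.0] * 5

if __name__ == "__main__":
    for name, samples in (("steady", steady), ("spiky", spiky)):
        print(name, latency_summary(samples))
```

Both sample workloads have the same mean, so a benchmark that reports only mean latency would not distinguish them; the variance and the 99th percentile do.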
