Enabling low tail latency on multicore key-value stores

Modern applications employ key-value stores (KVS) at some point in their software stack, often as a caching system or a storage manager. Many of these applications also require a high degree of responsiveness and performance predictability. However, most KVS share design decisions that focus on improving throughput, at times at the expense of latency. While latency can sometimes be reduced by overprovisioning hardware, doing so entails a significant increase in cost. In this paper we present RStore, a KVS that focuses on low tail latency as its primary goal while also enabling efficient use of hardware resources. To that end, we argue in favor of techniques such as an asynchronous programming model, message-passing communication, and log-structured storage on modern hardware. Throughout the paper we discuss these and other design decisions of RStore that differ from those of more traditional systems. Our evaluation shows that RStore scales its throughput with an increasing number of cores while maintaining robust behavior with low and predictable latency.
