Key-Value Storage Engines

Key-value stores are everywhere. They power a diverse set of data-driven applications across both industry and science. They are used as stand-alone NoSQL systems and also as components of more complex pipelines and systems, such as machine learning and relational systems. In this tutorial, we survey state-of-the-art approaches to designing the core storage engine of a key-value store. We focus on several critical components of the engine, starting with the core data structures that lay out data across the memory hierarchy. We also discuss design issues related to caching, timestamps, concurrency control, updates, shifting workloads, and mixed workloads with both analytical and transactional characteristics. We cover designs that are read-optimized, write-optimized, and hybrid. We draw examples from several state-of-the-art systems, and we also put everything together in a general framework that lets us model storage engine designs under a single unified model and reason about the expected behavior of diverse designs. In addition, we show that, given the vast number of possible storage engine designs and their complexity, design decisions need to be described and communicated in a high-level descriptive language, and we present a first version of such a language. We then use that framework to present several open challenges in the field, especially in supporting increasingly diverse and dynamic applications in the era of data science and AI, including neural networks, graphs, and data versioning.
