Learning How To Learn Within An LSM-based Key-Value Store

We introduce BOURBON, a log-structured merge (LSM) tree that uses machine learning to provide fast lookups. We base the design and implementation of BOURBON on empirically grounded principles that we derive through careful analysis of LSM design. BOURBON employs greedy piecewise linear regression to learn key distributions, enabling fast lookups with minimal computation, and applies a cost-benefit strategy to decide when learning is worthwhile. Through a series of experiments on both synthetic and real-world datasets, we show that BOURBON improves lookup performance by 1.23x to 1.78x compared with state-of-the-art production LSMs.
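
To make the idea of learning a key distribution with greedy piecewise linear regression concrete, the following is a minimal sketch and not BOURBON's actual implementation. It assumes sorted, strictly increasing integer keys, and it uses a single-pass "shrinking cone" fit as one possible greedy, error-bounded variant; the abstract does not specify which algorithm is used. The names `greedy_plr`, `lookup`, and the error bound `delta` are hypothetical and chosen only for illustration.

```python
import bisect
import math
from typing import List, Tuple

# (start_key, slope, start_pos): one learned line segment.
Segment = Tuple[int, float, int]


def _pick_slope(lo: float, hi: float) -> float:
    # A segment holding a single point never narrows the cone; use slope 0.
    return 0.0 if math.isinf(hi) else (lo + hi) / 2.0


def greedy_plr(keys: List[int], delta: float) -> List[Segment]:
    """Fit line segments so each key's predicted position is within delta.

    Assumption: keys are sorted and strictly increasing.
    """
    segments: List[Segment] = []
    if not keys:
        return segments
    k0, p0 = keys[0], 0               # origin of the current segment
    lo, hi = 0.0, float("inf")        # range of slopes still feasible
    for pos in range(1, len(keys)):
        dk = keys[pos] - k0
        slope = (pos - p0) / dk
        if lo <= slope <= hi:
            # Point fits: narrow the cone so it stays within +/- delta.
            lo = max(lo, (pos - delta - p0) / dk)
            hi = min(hi, (pos + delta - p0) / dk)
        else:
            # Point falls outside the cone: close the segment, start anew.
            segments.append((k0, _pick_slope(lo, hi), p0))
            k0, p0 = keys[pos], pos
            lo, hi = 0.0, float("inf")
    segments.append((k0, _pick_slope(lo, hi), p0))
    return segments


def lookup(keys: List[int], segments: List[Segment], starts: List[int],
           key: int, delta: float) -> int:
    """Return the index of key in keys, or -1 if it is absent."""
    i = bisect.bisect_right(starts, key) - 1   # segment covering this key
    if i < 0:
        return -1
    start_key, slope, start_pos = segments[i]
    guess = start_pos + slope * (key - start_key)
    # Search only the small window allowed by the error bound (with slack
    # of one slot on each side to absorb floating-point rounding).
    lo = max(0, math.floor(guess - delta) - 1)
    hi = min(len(keys), math.floor(guess + delta) + 2)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < len(keys) and keys[j] == key else -1


# Example usage on a skewed (quadratic) key distribution.
keys = [i * i for i in range(1, 200)]
delta = 4.0
segments = greedy_plr(keys, delta)
starts = [s[0] for s in segments]
```

In this sketch a lookup costs one binary search over the segment start keys plus a bounded search of roughly `2 * delta` slots, rather than a search over all keys. A larger `delta` yields fewer segments but a wider final search window, so the bound is a tuning knob rather than a fixed constant.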
