Paper information - From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees

We introduce BOURBON, a log-structured merge (LSM) tree that uses machine learning to provide fast lookups. We base the design and implementation of BOURBON on empirically grounded principles that we derive through careful analysis of LSM design. BOURBON employs greedy piecewise linear regression to learn key distributions, enabling fast lookups with minimal computation, and applies a cost-benefit strategy to decide when learning will be worthwhile. Through a series of experiments on both synthetic and real-world datasets, we show that BOURBON improves lookup performance by 1.23x to 1.78x compared to state-of-the-art production LSMs.
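To make the idea of learning a key distribution with greedy piecewise linear regression concrete, here is a minimal sketch in Python. It is not BOURBON's implementation: it uses a simple "shrinking cone" greedy variant, and the names `fit_segments`, `lookup`, and the error bound `delta` are assumptions chosen for illustration. The sketch maps each key in a sorted, duplicate-free array to its position, guarantees the predicted position is within `delta` of the true one, and finishes each lookup with a short bounded scan.

```python
from bisect import bisect_right

def fit_segments(keys, delta):
    """Greedily fit line segments to (key, position) pairs.

    Assumes `keys` is sorted with no duplicates. Every fitted point is
    predicted within `delta` positions of its true index. Returns a list
    of (start_key, slope, start_pos) tuples. Illustrative only.
    """
    segments = []
    i, n = 0, len(keys)
    while i < n:
        k0, p0 = keys[i], i                      # segment origin
        lo, hi = float("-inf"), float("inf")     # allowed slope range (the cone)
        j = i + 1
        while j < n:
            dk = keys[j] - k0
            s = (j - p0) / dk                    # slope from origin to this point
            if not (lo <= s <= hi):
                break                            # point falls outside the cone
            # Narrow the cone so this point stays within +/- delta.
            lo = max(lo, (j - delta - p0) / dk)
            hi = min(hi, (j + delta - p0) / dk)
            j += 1
        slope = 0.0 if j == i + 1 else (lo + hi) / 2
        segments.append((k0, slope, p0))
        i = j
    return segments

def lookup(keys, segments, starts, key, delta):
    """Return the index of `key` in `keys`, or None if it is absent."""
    seg = bisect_right(starts, key) - 1          # last segment starting at or before key
    if seg < 0:
        return None
    k0, slope, p0 = segments[seg]
    pred = int(round(p0 + slope * (key - k0)))
    # Scan only the small window allowed by the error bound (plus rounding slack).
    lo = max(0, pred - delta - 1)
    hi = min(len(keys), pred + delta + 2)
    for pos in range(lo, hi):
        if keys[pos] == key:
            return pos
    return None

keys = [1, 2, 3, 10, 20, 30, 31, 32, 100, 1000, 1001, 5000]
delta = 2
segments = fit_segments(keys, delta)
starts = [s[0] for s in segments]
positions = [lookup(keys, segments, starts, k, delta) for k in keys]
```

Each lookup costs one binary search over segment start keys plus a scan of at most `2 * delta + 3` entries, which is the sense in which a learned model can replace a full binary search over the data. A larger `delta` yields fewer segments but a wider final scan.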
