Updatable Learned Indexes Meet Disk-Resident DBMS - From Evaluations to Design Choices

Although many updatable learned indexes have been proposed in recent years, whether they can outperform traditional approaches on disk remains unknown. In this study, we revisit four state-of-the-art updatable learned indexes, implement them on disk, and compare them against the B+-tree under a wide range of settings. Our evaluation yields the following key observations: (1) Overall, the B+-tree performs well across a range of workload types and datasets. (2) A learned index can outperform the B+-tree or other learned indexes on disk for a specific workload; for example, PGM achieves the best performance in write-only workloads, while LIPP significantly outperforms the others in lookup-only workloads. We further conduct a detailed performance analysis to reveal the strengths and weaknesses of these learned indexes on disk. Finally, we summarize the common shortcomings we observed into five categories and propose four design principles to guide the future design of on-disk, updatable learned indexes: (1) reducing the index's tree height, (2) using better data structures to lower operation overheads, (3) improving the efficiency of scan operations, and (4) adopting a more efficient storage layout.
