APEX: A High-Performance Learned Index on Persistent Memory

Recently released persistent memory (PM) has been gaining popularity. PM offers high performance (though still lower than DRAM) and persistence, and it is cheaper than DRAM. This opens up new possibilities for index design. Existing indexes on PM typically evolve from traditional B+Tree indexes; however, this is not the best choice. Recently proposed learned indexes use machine learning (ML) for data distribution-aware indexing and perform significantly better than B+Trees, but none of them supports persistence and instant recovery. In this paper, we propose APEX, a new learned index on PM that offers very high performance, persistence, concurrency, and instant recovery. Our careful design combines the best of both worlds. Specifically, to achieve high performance, APEX is designed to reduce PM accesses, especially heavyweight writes, flushes, and memory fence instructions, while still exploiting ML. Our in-depth experimental evaluation on Intel DCPMM shows that APEX achieves up to ∼15× higher throughput than state-of-the-art PM indexes and can recover from failures in ∼42 ms. To the best of our knowledge, APEX is the first full-fledged and practical learned index on PM.
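To make "data distribution-aware indexing" concrete, the following is a minimal Python sketch of the general learned-index idea: a linear model fit to the keys predicts a key's position in a sorted array, and a search bounded by the model's maximum training error corrects that guess. It rests on simplifying assumptions (a single node, static sorted keys held in ordinary memory, no inserts, persistence, or concurrency) and is not APEX's actual node design; the class `LinearModelNode` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_left


class LinearModelNode:
    """Hypothetical learned-index node (illustration only, not APEX's layout).

    A least-squares line maps key -> position in a sorted array; the largest
    prediction error seen on the stored keys bounds the corrective search.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n < 2 or self.keys[0] == self.keys[-1]:
            # Degenerate case: no spread in the keys, so predict position 0.
            self.slope, self.intercept = 0.0, 0.0
        else:
            # Fit position ~ slope * key + intercept by least squares.
            mean_k = sum(self.keys) / n
            mean_p = (n - 1) / 2
            cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
            var = sum((k - mean_k) ** 2 for k in self.keys)
            self.slope = cov / var
            self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between a predicted and a true position.
        self.max_err = max(
            (abs(self._predict(k) - p) for p, k in enumerate(self.keys)), default=0
        )

    def _predict(self, key):
        """Model prediction, rounded and clamped to a valid array index."""
        if not self.keys:
            return 0
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        if not self.keys:
            return None
        guess = self._predict(key)
        # Only the window [guess - max_err, guess + max_err] can hold the key.
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


node = LinearModelNode([2 * i + 1 for i in range(1000)])  # odd keys 1..1999
print(node.lookup(501), node.lookup(502))
```

The intuition behind applying such models on PM is that one model evaluation plus a short local search can touch fewer memory locations than a multi-level B+Tree traversal; how APEX lays out, persists, and recovers its nodes goes well beyond this sketch.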
