Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree

With the emergence of byte-addressable persistent memory (PM), a cache line, instead of a page, is expected to be the unit of data transfer between volatile and non-volatile devices, but the failure-atomicity of write operations is guaranteed only at a granularity of 8 bytes rather than a cache line. This granularity mismatch problem has generated interest in redesigning block-based data structures such as B+-trees. However, various methods of modifying B+-trees for PM degrade their efficiency, and attempts have been made to use in-memory data structures for PM instead. In this study, we develop the Failure-Atomic ShifT (FAST) and Failure-Atomic In-place Rebalance (FAIR) algorithms to resolve the granularity mismatch problem. Every 8-byte store instruction used in the FAST and FAIR algorithms transforms a B+-tree into either another consistent state or a transient inconsistent state that read operations can tolerate. By making read operations tolerate transient inconsistency, we avoid expensive copy-on-write and logging, and we even remove the need for read latches, so that read transactions can be non-blocking. Our experimental results show that legacy B+-trees with the FAST and FAIR schemes outperform state-of-the-art persistent indexing structures by a large margin.
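
The claim that every 8-byte store leaves the tree in a state readers can tolerate can be illustrated with a toy model. The following Python sketch is not the authors' FAST implementation; it is a simplified simulation under stated assumptions: a hypothetical node layout with parallel key and pointer arrays, a hypothetical `LEFTMOST` header marker, unique nonzero pointers, and a reader rule that skips any entry whose pointer equals that of its left neighbour. Each list-slot assignment stands in for one atomic 8-byte store, and a store budget simulates a crash between two stores. Real PM code would additionally need cache-line flushes and memory fences, which are omitted here.

```python
# Toy model of a shift-based insert into one sorted node, in the spirit of FAST.
# Assumptions (hypothetical, for illustration only): the node layout, the
# LEFTMOST marker, and the duplicate-pointer rule that readers apply.

LEFTMOST = -1   # hypothetical header marker; never used as a value pointer
EMPTY = 0       # pointer value that terminates the entry array


class Crash(Exception):
    """Raised to simulate a power failure between two stores."""


class Node:
    def __init__(self, capacity=8):
        self.keys = [0] * capacity
        self.ptrs = [EMPTY] * capacity
        self.budget = None  # stores allowed before a crash; None = unlimited

    def store(self, array, i, value):
        """Model one atomic 8-byte store; crash first if the budget is spent."""
        if self.budget is not None:
            if self.budget == 0:
                raise Crash()
            self.budget -= 1
        array[i] = value

    def prev_ptr(self, i):
        return LEFTMOST if i == 0 else self.ptrs[i - 1]

    def count(self):
        n = 0
        while n < len(self.ptrs) and self.ptrs[n] != EMPTY:
            n += 1
        return n

    def insert(self, key, ptr):
        n = self.count()
        assert n + 2 <= len(self.ptrs), "this sketch does not handle node splits"
        pos = 0
        while pos < n and self.keys[pos] < key:
            pos += 1
        # Shift entries right, starting from the tail, pointer before key.
        for i in range(n - 1, pos - 1, -1):
            self.store(self.ptrs, i + 1, self.ptrs[i])  # slot i+1 now duplicates slot i
            self.store(self.keys, i + 1, self.keys[i])
        # Hide slot pos behind its left neighbour's pointer, then fill it in.
        self.store(self.ptrs, pos, self.prev_ptr(pos))
        self.store(self.keys, pos, key)
        self.store(self.ptrs, pos, ptr)  # this store makes the new entry visible

    def lookup(self, key):
        prev = LEFTMOST
        for k, p in zip(self.keys, self.ptrs):
            if p == EMPTY:
                break
            # An entry whose pointer repeats its left neighbour's is transient: skip it.
            if p != prev and k == key:
                return p
            prev = p
        return None


def crash_states_are_readable():
    """Crash after every possible number of stores and check what readers see."""
    base = {10: 100, 20: 200, 40: 400}
    budget = 0
    while True:
        node = Node()
        for k, v in base.items():
            node.insert(k, v)
        node.budget = budget
        try:
            node.insert(30, 300)
            finished = True
        except Crash:
            finished = False
        # Every pre-existing key is still found with its original pointer.
        if any(node.lookup(k) != v for k, v in base.items()):
            return False
        # The new key is either absent or fully inserted, never half-written.
        if node.lookup(30) not in (None, 300):
            return False
        if finished:
            return node.lookup(30) == 300
        budget += 1
```

Calling `crash_states_are_readable()` replays the same insert with a crash injected after zero, one, two, and so on stores, and checks after each one that a reader still finds every old key and never sees a half-written new entry. The model covers only a single node with no splits or rebalancing, so it says nothing about FAIR.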
