NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems

Non-volatile memory (NVM) offers DRAM-like performance and disk-like persistency, which makes it possible to replace both disk and DRAM and build single-level systems. Keeping data consistent in such systems is non-trivial because memory writes may be reordered by the CPU and the memory controller. In this paper, we study the consistency cost for an important and common data structure, the B+Tree. Although memory fence and CPU cacheline flush instructions can order memory writes to achieve data consistency, they introduce significant overhead (more than a 10X slowdown). Based on our quantitative analysis of the consistency cost, we propose NV-Tree, a consistent and cache-optimized B+Tree variant that requires fewer CPU cacheline flushes. We implement and evaluate NV-Tree and NV-Store, a key-value store based on NV-Tree, on an NVDIMM server. NV-Tree outperforms state-of-the-art consistent tree structures by up to 12X under write-intensive workloads. NV-Store increases throughput by up to 4.8X under YCSB workloads compared to Redis.
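To make the ordering problem concrete, the following is a minimal sketch of how a fence and cacheline flush can order two dependent writes on x86. It is an illustration under stated assumptions, not the NV-Tree implementation: the names `persist`, `leaf_t`, `entry_t`, `leaf_append`, and `LEAF_CAPACITY` are hypothetical, a 64-byte cacheline is assumed, and on builds without SSE2 the flush degrades to a plain memory barrier.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>               /* _mm_clflush, _mm_mfence */
#define FLUSH_LINE(p) _mm_clflush(p)
#define FENCE()       _mm_mfence()
#else
/* Assumption: fallback for non-x86 builds. Barrier only, no real flush. */
#define FLUSH_LINE(p) ((void)(p))
#define FENCE()       __sync_synchronize()
#endif

#define CACHELINE     64             /* assumed cacheline size in bytes */
#define LEAF_CAPACITY 14             /* hypothetical leaf size */

typedef struct { uint64_t key; uint64_t value; } entry_t;

typedef struct {
    uint64_t count;                  /* number of committed entries */
    entry_t  entries[LEAF_CAPACITY];
} leaf_t;

/* Flush every cacheline covering [addr, addr + len), fenced on both sides
 * so that earlier stores are ordered before later ones. */
static void persist(const void *addr, size_t len) {
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    FENCE();
    for (; p < end; p += CACHELINE)
        FLUSH_LINE((const void *)p);
    FENCE();
}

/* Append an entry, then publish it by bumping the counter.
 * The entry is flushed before the counter changes, so a crash between
 * the two steps leaves the old count and the new entry is ignored. */
static int leaf_append(leaf_t *leaf, uint64_t key, uint64_t value) {
    uint64_t n = leaf->count;
    if (n >= LEAF_CAPACITY)
        return -1;                   /* leaf is full */
    leaf->entries[n].key   = key;
    leaf->entries[n].value = value;
    persist(&leaf->entries[n], sizeof(entry_t));
    leaf->count = n + 1;             /* 8-byte store acts as the commit point */
    persist(&leaf->count, sizeof(leaf->count));
    return 0;
}
```

Each append in this sketch pays for two flush-and-fence rounds, which is the kind of per-update cost the abstract identifies as the dominant overhead and that a design with fewer flushes aims to reduce.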
