Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory

The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of "big data", and the subsequent emergence of frameworks such as memcached and NoSQL systems require us to rethink the design of data stores. To derive the maximum performance from these new memory technologies, this paper proposes the use of single-level data stores. For these systems, where no distinction is made between a volatile and a persistent copy of data, we present Consistent and Durable Data Structures (CDDSs) that, on current hardware, allows programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies. CDDSs use versioning to allow atomic updates without requiring logging. The same versioning scheme also enables rollback for failure recovery. When compared to a memory-backed Berkeley DB B-Tree, our prototype-based results show that a CDDS B-Tree can increase put and get throughput by 74% and 138%. When compared to Cassandra, a two-level data store, Tembo, a CDDS B-Tree enabled distributed Key-Value system, increases throughput by up to 250%-286%.

[1]  Philip A. Bernstein,et al.  Concurrency Control in Distributed Database Systems , 1986, CSUR.

[2]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[3]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[4]  Mahadev Satyanarayanan,et al.  Lightweight recoverable virtual memory , 1993, SOSP '93.

[5]  Bernhard Seeger,et al.  An asymptotically optimal multiversion B-tree , 1996, The VLDB Journal.

[6]  Douglas Comer,et al.  Ubiquitous B-Tree , 1979, CSUR.

[7]  R. C. Whaley,et al.  Towards interoperability: a wrapper model for integrating remote laboratories in a collaborative discovery learning environment , 2008 .

[8]  Suman Nath,et al.  Cheap and Large CAMs for High Performance Data-Intensive Networked Systems , 2010, NSDI.

[9]  Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference, June 6-11, 1999, Monterey, California, USA , 1999, USENIX Annual Technical Conference, FREENIX Track.

[10]  T. Schloesser,et al.  Challenges for the DRAM cell scaling to 40nm , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[11]  OHAD RODEH,et al.  B-trees, shadowing, and clones , 2008, TOS.

[12]  Michael Wu,et al.  eNVy: a non-volatile, main memory storage system , 1994, ASPLOS VI.

[13]  R. Clint Whaley,et al.  Achieving accurate and context‐sensitive timing for code optimization , 2008, Softw. Pract. Exp..

[14]  Rakesh M. Verma,et al.  An Efficient Multiversion Access STructure , 1997, IEEE Trans. Knowl. Data Eng..

[15]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[16]  Robert H. Dennard,et al.  Challenges and future directions for the scaling of dynamic random-access memory (DRAM) , 2002, IBM J. Res. Dev..

[17]  Peter M. Chen,et al.  Free transactions with Rio Vista , 1997, SOSP.

[18]  WeiserMark,et al.  Garbage collection in an uncooperative environment , 1988 .

[19]  D. Stewart,et al.  The missing memristor found , 2008, Nature.

[20]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[21]  Hans-Juergen Boehm,et al.  Garbage collection in an uncooperative environment , 1988, Softw. Pract. Exp..

[22]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.

[23]  Robert E. Tarjan,et al.  Making data structures persistent , 1986, STOC '86.

[24]  Peter M. Chen,et al.  The Rio file cache: surviving operating system crashes , 1996, ASPLOS VII.

[25]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[26]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[27]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[28]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[29]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[30]  Craig A. N. Soules,et al.  Metadata Efficiency in Versioning File Systems , 2003, FAST.

[31]  James Lau,et al.  File System Design for an NFS File Server Appliance , 1994, USENIX Winter.

[32]  Bingsheng He,et al.  Tree Indexing on Flash Disks , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[33]  Chris Okasaki,et al.  Purely functional data structures , 1998 .

[34]  Brad Fitzpatrick,et al.  Distributed caching with memcached , 2004 .

[35]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[36]  Parag Agrawal,et al.  The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.

[37]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[38]  Sivan Toledo,et al.  Algorithms and data structures for flash memories , 2005, CSUR.

[39]  Winfried W. Wilcke,et al.  Storage-class memory: The next storage system technology , 2008, IBM J. Res. Dev..

[40]  Shih-Hung Chen,et al.  Phase-change random access memory: A scalable technology , 2008, IBM J. Res. Dev..

[41]  Margo I. Seltzer,et al.  Berkeley DB , 1999, USENIX Annual Technical Conference, FREENIX Track.