Hathi: durable transactions for memory using flash

Recent architectural trends (cheap, fast solid-state storage, inexpensive DRAM, and multi-core CPUs) provide an opportunity to rethink the interface between applications and persistent storage. To leverage these advances, we propose a new system architecture called Hathi that provides an in-memory transactional heap made persistent using high-speed flash drives. With Hathi, programmers can make consistent, concurrent updates to in-memory data structures that survive system failures. Hathi focuses on three major design goals: ACID semantics, a simple programming interface, and fine-grained programmer control. Hathi relies on software transactional memory to provide a simple concurrent interface to in-memory data structures, and extends it with persistent logs and checkpoints to add durability. To reduce the cost of durability, Hathi uses two main techniques. First, it provides split-phase and partitioned commit interfaces that allow programmers to overlap commit I/O with computation and to avoid unnecessary synchronization. Second, it uses partitioned logging, which reduces contention on in-memory log buffers and exploits internal SSD parallelism. We find that our implementation of Hathi can achieve 1.25 million transactions per second with a single SSD.
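The two durability techniques named above can be illustrated with a minimal Python sketch. It is not Hathi's interface: the class `PartitionedLog`, the methods `commit_begin` and `commit_wait`, and the rule that maps a thread to a partition are all hypothetical names chosen for illustration. The sketch also leaves out software transactional memory, checkpoints, and recovery, and shows only how a commit might be split into an asynchronous start and a later wait, with each partition owning its own log file and lock.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class PartitionedLog:
    """Hypothetical partitioned log: one append-only file and one lock per partition."""

    def __init__(self, directory, num_partitions=4):
        self.paths = [os.path.join(directory, f"log.{i}") for i in range(num_partitions)]
        self.files = [open(p, "ab") for p in self.paths]
        self.locks = [threading.Lock() for _ in self.files]
        # One I/O worker per partition so flushes to different partitions can overlap.
        self.io_pool = ThreadPoolExecutor(max_workers=num_partitions)

    def _flush(self, part, record):
        # Only writers that share a partition contend on this lock.
        with self.locks[part]:
            f = self.files[part]
            f.write(record + b"\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage
        return part

    def commit_begin(self, thread_id, record):
        """Phase 1: queue the log write and return a ticket without blocking."""
        part = thread_id % len(self.files)  # assumed thread-to-partition mapping
        return self.io_pool.submit(self._flush, part, record)

    @staticmethod
    def commit_wait(ticket):
        """Phase 2: block until the record is durable; returns the partition used."""
        return ticket.result()

    def close(self):
        self.io_pool.shutdown(wait=True)
        for f in self.files:
            f.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = PartitionedLog(d, num_partitions=2)
        ticket = log.commit_begin(thread_id=3, record=b"set x=1")
        overlapped = sum(range(1000))  # computation that overlaps the commit I/O
        part = log.commit_wait(ticket)
        log.close()
        with open(log.paths[part], "rb") as f:
            return part, overlapped, f.read()
```

In this sketch the caller decides where to place `commit_wait`, which is one way to read the abstract's "fine-grained programmer control": work that does not depend on durability can run between the two phases, and threads mapped to different partitions never share a log lock.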
