DHTM: Durable Hardware Transactional Memory

The emergence of byte-addressable persistent (non-volatile) memory provides a low-latency, high-bandwidth path to durability. However, programmers need guarantees about what will remain in persistent memory in the event of a system crash. A widely accepted model for crash-consistent programming is ACID transactions, in which the updates within a transaction become visible and durable atomically. Existing software-based proposals, however, suffer from significant performance overheads. In this paper, we support both atomic visibility and atomic durability in hardware. We propose DHTM (durable hardware transactional memory), which leverages a commercial HTM to provide atomic visibility and extends it with hardware support for redo logging to provide atomic durability. Furthermore, we leverage the same logging infrastructure to extend the supported transaction size (from being L1-limited to LLC-limited) with only minor changes to the coherence protocol. Our evaluation shows that DHTM outperforms the state of the art by 21% to 25% on average on TATP, TPC-C, and a set of microbenchmarks. We believe DHTM is the first complete and practical hardware-based solution for ACID transactions, one that has the potential to significantly ease the burden of crash-consistent programming.
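To make the role of redo logging in atomic durability concrete, the following is a minimal software sketch of the general technique, not DHTM's hardware mechanism. It assumes a toy persistent memory in which only `data` and `log` survive a crash; the names `PersistentMemory`, `Transaction`, `recover`, and the `crash_before_apply` flag are hypothetical and exist only for illustration.

```python
class PersistentMemory:
    """Toy model (assumption): `data` and `log` survive a crash, nothing else does."""

    def __init__(self):
        self.data = {}  # in-place persistent state
        self.log = []   # redo log records: ("write", txid, addr, value) or ("commit", txid)


class Transaction:
    """Hypothetical redo-logged transaction over a PersistentMemory."""

    def __init__(self, pm, txid):
        self.pm = pm
        self.txid = txid
        self.writes = {}  # volatile write set, lost on a crash

    def write(self, addr, value):
        # Buffer the new value and append a redo record; `data` is not touched yet.
        self.writes[addr] = value
        self.pm.log.append(("write", self.txid, addr, value))

    def commit(self, crash_before_apply=False):
        # The commit record is the durability point: once it is in the log,
        # the transaction's updates can always be reconstructed.
        self.pm.log.append(("commit", self.txid))
        if crash_before_apply:
            return  # simulate a crash before the in-place update happens
        self.pm.data.update(self.writes)


def recover(pm):
    """Replay writes of committed transactions in log order; drop everything else."""
    committed = {rec[1] for rec in pm.log if rec[0] == "commit"}
    for rec in pm.log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, addr, value = rec
            pm.data[addr] = value
    pm.log.clear()
```

In this sketch, a transaction that crashes after its commit record is logged has all of its updates replayed by `recover`, while a transaction that never logged a commit record leaves no trace in `data`. That all-or-nothing outcome is the property the abstract calls atomic durability.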
