Hardware acceleration of transactional memory on commodity systems

The adoption of transactional memory is hindered by the high overhead of software transactional memory and the intrusive design changes required by previously proposed TM hardware. We propose that hardware to accelerate software transactional memory (STM) can reside outside an unmodified commodity processor core, thereby substantially reducing implementation costs. This paper introduces Transactional Memory Acceleration using Commodity Cores (TMACC), a hardware-accelerated TM system that does not modify the processor, caches, or coherence protocol. We present a complete hardware implementation of TMACC using a rapid prototyping platform. Using this hardware, we implement two unique conflict detection schemes which are accelerated using Bloom filters on an FPGA. These schemes employ novel techniques for tolerating the latency of fine-grained asynchronous communication with an out-of-core accelerator. We then conduct experiments to explore the feasibility of accelerating TM without modifying existing system hardware. We show that, for all but short transactions, it is not necessary to modify the processor to obtain substantial improvement in TM performance. In these cases, TMACC outperforms an STM by an average of 69% in applications using moderate-length transactions, showing maximum speedup within 8% of an upper bound on TM acceleration. Overall, we demonstrate that hardware can substantially accelerate the performance of an STM on unmodified commodity processors.

[1]  Marc Lupon,et al.  FASTM: A Log-based Hardware Transactional Memory with Fast Abort Recovery , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[2]  Kunle Olukotun,et al.  Eigenbench: A simple exploration tool for orthogonal TM characteristics , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[3]  Nir Shavit,et al.  Transactional Locking II , 2006, DISC.

[4]  Marek Olszewski,et al.  JudoSTM: A Dynamic Binary-Rewriting Approach to Software Transactional Memory , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[5]  Rachid Guerraoui,et al.  Stretching transactional memory , 2009, PLDI '09.

[6]  David A. Wood,et al.  LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[7]  Marc Tremblay,et al.  Simultaneous speculative threading: a novel pipeline architecture implemented in sun's rock processor , 2009, ISCA '09.

[8]  Kunle Olukotun,et al.  The OpenTM Transactional Application Programming Interface , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[9]  David A. Wood,et al.  TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory , 2008, 2008 International Symposium on Computer Architecture.

[10]  Virendra J. Marathe,et al.  Adaptive Software Transactional Memory , 2005, DISC.

[11]  Donald E. Porter,et al.  MetaTM/TxLinux: Transactional Memory for an Operating System , 2007, IEEE Micro.

[12]  Michael F. Spear,et al.  An integrated hardware-software approach to flexible transactional memory , 2007, ISCA '07.

[13]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[14]  Cong Wang,et al.  NZTM: nonblocking zero-indirection transactional memory , 2009, SPAA '09.

[15]  Emmett Witchel,et al.  Maximum benefit from a minimal HTM , 2009, ASPLOS.

[16]  Stark C. Draper,et al.  Notary: Hardware techniques to enhance signatures , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[17]  Kunle Olukotun,et al.  STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.

[18]  Kunle Olukotun,et al.  FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.

[19]  Brian T. Lewis,et al.  Compiler and runtime support for efficient software transactional memory , 2006, PLDI '06.

[20]  Bratin Saha,et al.  Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[21]  Dan Grossman,et al.  Enforcing isolation and ordering in STM , 2007, PLDI '07.

[22]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.

[23]  Michael F. Spear,et al.  NOrec: streamlining STM by abolishing ownership records , 2010, PPoPP '10.

[24]  Quinn Jacobson,et al.  Architectural Support for Software Transactional Memory , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[25]  Keir Fraser,et al.  Language support for lightweight transactions , 2003, SIGP.

[26]  Maged M. Michael,et al.  Software Transactional Memory: Why Is It Only a Research Toy? , 2008, ACM Queue.

[27]  Maged M. Michael,et al.  RingSTM: scalable transactions with a single atomic instruction , 2008, SPAA '08.

[28]  Milo M. K. Martin,et al.  Making the fast case common and the uncommon case simple in unbounded transactional memory , 2007, ISCA '07.

[29]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[30]  Kunle Olukotun,et al.  A Scalable, Non-blocking Approach to Transactional Memory , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[31]  Larry Carter,et al.  Universal Classes of Hash Functions , 1979, J. Comput. Syst. Sci..

[32]  Kunle Olukotun,et al.  An effective hybrid transactional memory system with strong isolation guarantees , 2007, ISCA '07.

[33]  Bratin Saha,et al.  McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.

[34]  Shubhendu S. Mukherjee,et al.  Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[35]  Michael L. Scott,et al.  Flexible Decoupled Transactional Memory Support , 2008, 2008 International Symposium on Computer Architecture.

[36]  James R. Larus,et al.  Transactional Memory , 2006, Transactional Memory.

[37]  Håkan Grahn,et al.  Transactional memory , 2010, J. Parallel Distributed Comput..

[38]  Mark Moir,et al.  Hybrid transactional memory , 2006, ASPLOS XII.

[39]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[40]  Michael F. Spear Lightweight, robust adaptivity for software transactional memory , 2010, SPAA '10.