Evaluation of Blue Gene/Q hardware support for transactional memories

This paper describes an end-to-end system implementation of the transactional memory (TM) programming model on top of the hardware transactional memory (HTM) of the Blue Gene/Q (BG/Q) machine. With the help of a complete software stack comprising the compiler, the kernel, and the TM runtime, the TM programming model supports most C/C++ programming constructs on top of a best-effort HTM. An extensive evaluation of the STAMP benchmarks on BG/Q, the first of its kind, characterizes the behavior of coarse-grained TM workloads on HTMs. The study reveals several insights into the overhead and scalability of BG/Q HTM relative to sequential execution, coarse-grained locking, and software TM.
