Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory

We demonstrate how fine-grained memory protection can support transactional memory (TM) systems. First, we show how a software transactional memory (STM) system can be made strongly atomic by applying memory protection to transactionally held state. We then show how such a strongly atomic STM can be combined with a bounded hardware TM (HTM) system to build a hybrid TM system in which zero-overhead hardware transactions may safely run concurrently with potentially conflicting software transactions. We experimentally demonstrate that this hybrid TM organization avoids the common-case overheads associated with previous hybrid TM proposals, achieving performance rivaling that of an unbounded HTM system without the hardware complexity of ensuring that arbitrary transactions complete in hardware. As part of our findings, we identify policies for contention management within and across the hardware and software TM components that are key to achieving robust performance with a hybrid TM.
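The idea of protecting transactionally held state can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions, not the system described here: the names `ProtectedMemory`, `SoftwareTxn`, `ProtectionFault`, and `run_hardware_txn` are hypothetical, per-word ownership flags stand in for hardware protection bits, and a snapshot stands in for hardware buffering. It shows a software transaction protecting the words it writes, so that a conflicting non-transactional access or hardware attempt faults instead of observing intermediate state, while a non-conflicting hardware attempt proceeds with no extra bookkeeping.

```python
class ProtectionFault(Exception):
    """Raised when an access touches a word held by a software transaction."""


class ProtectedMemory:
    """Toy memory with one protection flag per word (an assumption of this sketch)."""

    def __init__(self, size):
        self.words = [0] * size
        self.owner = [None] * size  # id of the software transaction holding each word

    def read(self, addr, txn_id=None):
        self._check(addr, txn_id)
        return self.words[addr]

    def write(self, addr, value, txn_id=None):
        self._check(addr, txn_id)
        self.words[addr] = value

    def _check(self, addr, txn_id):
        holder = self.owner[addr]
        if holder is not None and holder != txn_id:
            raise ProtectionFault(addr)


class SoftwareTxn:
    """Hypothetical eager-update software transaction with an undo log."""

    def __init__(self, mem, txn_id):
        self.mem, self.txn_id, self.undo = mem, txn_id, {}

    def write(self, addr, value):
        if self.mem.owner[addr] is None:
            self.mem.owner[addr] = self.txn_id  # protect the word before updating it
        # Faults here if another software transaction already holds the word.
        self.undo.setdefault(addr, self.mem.read(addr, self.txn_id))
        self.mem.write(addr, value, self.txn_id)

    def commit(self):
        self._release()

    def abort(self):
        for addr, old in self.undo.items():
            self.mem.words[addr] = old  # roll back in place
        self._release()

    def _release(self):
        for addr in self.undo:
            self.mem.owner[addr] = None  # drop protection
        self.undo = {}


def run_hardware_txn(mem, body):
    """Run body as an uninstrumented 'hardware' attempt; return False if it faulted."""
    snapshot = list(mem.words)  # stands in for hardware buffering (assumption)
    try:
        body(mem)
        return True
    except ProtectionFault:
        mem.words[:] = snapshot  # discard the attempt's effects
        return False


mem = ProtectedMemory(4)
stx = SoftwareTxn(mem, txn_id=1)
stx.write(0, 42)

try:
    mem.read(0)  # non-transactional access to transactionally held state
    faulted = False
except ProtectionFault:
    faulted = True

hw_ok_conflict = run_hardware_txn(mem, lambda m: m.write(0, 7))
hw_ok_disjoint = run_hardware_txn(mem, lambda m: m.write(1, 7))
stx.commit()
after_commit = mem.read(0)
```

In this model, what happens after a fault (retrying, stalling, or falling back to a software transaction) is left open; that choice is the kind of contention-management policy the abstract refers to.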
