LogTM: log-based transactional memory

Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.

[1]  Xavier Martorell,et al.  Implementing PARMACS Macros for Shared Memory Multiprocessor Environments , 1997 .

[2]  David A. Wood,et al.  LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[3]  Robert Sims,et al.  Alpha architecture reference manual , 1992 .

[4]  Leslie Lamport,et al.  The mutual exclusion problem: part I—a theory of interprocess communication , 1986, JACM.

[5]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[6]  Hector Garcia-Molina,et al.  Modeling long-running activities as nested sagas , 1991 .

[7]  Antony L. Hosking,et al.  Nested Transactional Memory: Model and Preliminary Architecture Sketches , 2005 .

[8]  Gary Lauterbach,et al.  UltraSPARC-III: designing third-generation 64-bit performance , 1999, IEEE Micro.

[9]  James R. Goodman,et al.  Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, MICRO.

[10]  Mark Moir,et al.  Hybrid transactional memory , 2006, ASPLOS XII.

[11]  Frank Yellin,et al.  The Java Virtual Machine Specification , 1996 .

[12]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[13]  Alaa R. Alameldeen,et al.  Addressing Workload Variability in Architectural Simulations , 2003, IEEE Micro.

[14]  Irving L. Traiger,et al.  The notions of consistency and predicate locks in a database system , 1976, CACM.

[15]  Mahadev Satyanarayanan,et al.  Design Trade-Offs in VAX-11 Translation Buffer Organization , 1981, Computer.

[16]  F. Lemmermeyer Error-correcting Codes , 2005 .

[17]  Simon L. Peyton Jones,et al.  Composable memory transactions , 2005, CACM.

[18]  Josep Torrellas,et al.  Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[19]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[20]  Alan Jay Smith,et al.  A class of compatible cache consistency protocols and their support by the IEEE futurebus , 1986, ISCA '86.

[21]  Sean Lie Hardware Support for Unbounded Transactional Memory , 2004 .

[22]  Maurice Herlihy,et al.  Software transactional memory for dynamic-sized data structures , 2003, PODC '03.

[23]  Josep Torrellas,et al.  Using software logging to support multiversion buffering in thread-level speculation , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[24]  Albert Chang,et al.  801 storage: architecture and programming , 1988, TOCS.

[25]  Brian T. Lewis,et al.  Compiler and runtime support for efficient software transactional memory , 2006, PLDI '06.

[26]  Dan Grossman,et al.  Enforcing isolation and ordering in STM , 2007, PLDI '07.

[27]  E. B. Moss,et al.  Nested Transactions: An Approach to Reliable Distributed Computing , 1985 .

[28]  Gurindar S. Sohi,et al.  Speculative versioning cache , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[29]  Ravi Rajwar,et al.  FREE EXECUTION OF LOCK-BASED PROGRAMS , 2002 .

[30]  Josep Torrellas,et al.  Speculative synchronization: applying thread-level speculation to explicitly parallel applications , 2002, ASPLOS X.

[31]  Bratin Saha,et al.  McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.

[32]  Josep Torrellas,et al.  Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[33]  D. Lenoski,et al.  The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[34]  Josep Torrellas,et al.  Removing architectural bottlenecks to the scalability of speculative parallelization , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[35]  Thomas F. Knight An architecture for mostly functional languages , 1986, LFP '86.

[36]  Josep Torrellas,et al.  Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[37]  David Flint,et al.  Challenges to Providing Performance Isolation in Transactional Memories , 2005 .

[38]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[39]  Michael L. Scott,et al.  Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.

[40]  Edsger W. Dijkstra,et al.  Solution of a problem in concurrent programming control , 1965, CACM.

[41]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[42]  Eliot B. Moss,et al.  Nesting Transactions: Why and What Do We Need? , 2006 .

[43]  Haitham Akkary,et al.  Checkpoint Processing and Recovery: An Efficient, Scalable Alternative to Reorder Buffers , 2003, IEEE Micro.

[44]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[45]  Babak Falsafi,et al.  JETTY: filtering snoops for reduced energy consumption in SMP servers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[46]  Gurindar S. Sohi,et al.  Speculative Versioning Cache , 2001, IEEE Trans. Parallel Distributed Syst..

[47]  Pat Conway,et al.  The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.

[48]  Jenn-Yuan Tsai,et al.  The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[49]  Keir Fraser,et al.  Language support for lightweight transactions , 2003, SIGP.

[50]  Hugh Garraway Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.

[51]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[52]  Kunle Olukotun,et al.  The common case transactional behavior of multithreaded programs , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[53]  Anoop Gupta,et al.  Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.

[54]  Satish Narayanasamy,et al.  Unbounded page-based transactional memory , 2006, ASPLOS XII.

[55]  David A. Wood,et al.  Supporting nested transactional memory in logTM , 2006, ASPLOS XII.

[56]  Todd C. Mowry,et al.  The potential for using thread-level data speculation to facilitate automatic parallelization , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[57]  Milo M. K. Martin,et al.  Deconstructing Transactional Semantics: The Subtleties of Atomicity , 2005 .

[58]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[59]  Hans-Jörg Schek,et al.  Concepts and Applications of Multilevel Transactions and Open Nested Transactions , 1992, Database Transaction Models for Advanced Applications.

[60]  Kunle Olukotun,et al.  Architectural Semantics for Practical Transactional Memory , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[61]  James R. Goodman,et al.  Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.

[62]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[63]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[64]  Irving L. Traiger,et al.  Granularity of Locks and Degrees of Consistency in a Shared Data Base , 1998, IFIP Working Conference on Modelling in Data Base Management Systems.

[65]  Haitham Akkary,et al.  A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[66]  Hamid Pirahesh,et al.  ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging , 1998 .

[67]  Virendra J. Marathe,et al.  Adaptive Software Transactional Memory , 2005, DISC.

[68]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[69]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[70]  J. Eliot B. Moss Open Nested Transactions: Semantics and Support , 2006 .

[71]  Bradford L. Chamberlain,et al.  Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..

[72]  Jim Gray,et al.  The Transaction Concept: Virtues and Limitations (Invited Paper) , 1981, VLDB.

[73]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[74]  William N. Scherer,et al.  Advanced contention management for dynamic software transactional memory , 2005, PODC '05.

[75]  Donald E. Porter,et al.  MetaTM/TxLinux: Transactional Memory for an Operating System , 2007, IEEE Micro.

[76]  Josep Torrellas,et al.  Tradeoffs in buffering memory state for thread-level speculation in multiprocessors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[77]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[78]  David A. Wood,et al.  Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.

[79]  Kunle Olukotun,et al.  Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.

[80]  Maurice Herlihy,et al.  Virtualizing transactional memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[81]  Philip Heidelberger,et al.  Multiple reservations and the Oklahoma update , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.

[82]  Anoop Gupta,et al.  SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.

[83]  Bradley C. Kuszmaul,et al.  Unbounded Transactional Memory , 2005, HPCA.

[84]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.