Unbounded Transactional Memory

Hardware transactional memory should support unbounded transactions: transactions of arbitrary size and duration. We describe a hardware implementation of unbounded transactional memory, called UTM, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory. We performed a cycle-accurate simulation of a simplified architecture, called LTM. LTM is based on UTM but is easier to implement, because it does not change the memory subsystem outside of the processor. LTM allows nearly unbounded transactions, whose footprint is limited only by physical memory size and whose duration by the length of a timeslice. We assess UTM and LTM through microbenchmarking and by automatically converting the SPECjvm98 Java benchmarks and the Linux 2.4.19 kernel to use transactions instead of locks. We use both cycle-accurate simulation and instrumentation to understand benchmark behavior. Our studies show that the common case is small transactions that commit, even when contention is high, but that some applications contain very large transactions. For example, although 99.9% of transactions in the Linux study touch 54 cache lines or fewer, some transactions touch over 8000 cache lines. Our studies also indicate that hardware support is required, because some applications spend over half their time in critical regions. Finally, they suggest that hardware support for transactions can make Java programs run faster than when run using locks and can increase the concurrency of the Linux kernel by as much as a factor of 4 with no additional programming work.
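To make the lock-to-transaction conversion concrete, the following is a minimal software sketch of optimistic transactional execution: each transaction records the versions it reads, buffers its writes, and commits only if nothing it read has changed, retrying otherwise. It is an illustration under stated assumptions, not the UTM or LTM hardware mechanism. The names `VersionedMemory`, `Transaction`, `atomically`, and `run_counter_demo` are hypothetical, and the commit step is serialized with an ordinary lock as a stand-in for what the hardware would do atomically.

```python
import threading


class Conflict(Exception):
    """Raised when a commit fails because a location in the read set changed."""


class VersionedMemory:
    """Hypothetical shared memory: address -> (value, version)."""

    def __init__(self):
        self._data = {}
        # Software stand-in for an atomic commit; not part of the paper's design.
        self._commit_lock = threading.Lock()

    def read(self, addr):
        # Unwritten locations read as value 0 at version 0.
        return self._data.get(addr, (0, 0))

    def try_commit(self, read_set, write_set):
        with self._commit_lock:
            # Validate: every location read must still be at the version seen.
            for addr, seen_version in read_set.items():
                if self.read(addr)[1] != seen_version:
                    raise Conflict(addr)
            # Publish buffered writes, bumping each location's version.
            for addr, value in write_set.items():
                self._data[addr] = (value, self.read(addr)[1] + 1)


class Transaction:
    """Tracks a read set and a write set of arbitrary size."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # address -> version first observed
        self.write_set = {}  # address -> buffered value

    def load(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]  # read our own buffered write
        value, version = self.mem.read(addr)
        self.read_set.setdefault(addr, version)
        return value

    def store(self, addr, value):
        self.write_set[addr] = value  # invisible to others until commit


def atomically(mem, body):
    """Run body(tx) until it commits; return (result, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        tx = Transaction(mem)
        result = body(tx)
        try:
            mem.try_commit(tx.read_set, tx.write_set)
            return result, attempts
        except Conflict:
            continue  # abort: discard buffered writes and retry


def run_counter_demo(threads=4, increments=1000):
    """Increment one shared counter from several threads without a per-object lock."""
    mem = VersionedMemory()

    def increment(tx):
        tx.store("counter", tx.load("counter") + 1)

    def worker():
        for _ in range(increments):
            atomically(mem, increment)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return mem.read("counter")[0]


if __name__ == "__main__":
    print(run_counter_demo())
```

In this sketch the read and write sets are ordinary dictionaries, so a transaction's footprint is bounded only by available memory. That is the software analogue of the property the abstract asks of hardware: small transactions stay cheap, while large ones remain correct.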
