Kicking the tires of software transactional memory: why the going gets tough

Transactional Memory (TM) promises to simplify concurrent programming, which has been notoriously difficult but crucial in realizing the performance benefit of multi-core processors. Software Transaction Memory (STM), in particular, represents a body of important TM technologies since it provides a mechanism to run transactional programs when hardware TM support is not available, or when hardware TM resources are exhausted. Nonetheless, most previous researches on STMs were constrained to executing trivial, small-scale workloads. The assumption was that the same techniques applied to small-scale workloads could readily be applied to real-life, large-scale workloads. However, by executing several nontrivial workloads such as particle dynamics simulation and game physics engine on a state of the art STM, we noticed that this assumption does not hold. Specifically, we identified four major performance bottlenecks that were unique to the case of executing large-scale workloads on an STM: false conflicts, over-instrumentation, privatization-safety cost, and poor amortization. We believe that these bottlenecks would be common for any STM targeting real-world applications. In this paper, we describe those identified bottlenecks in detail, and we propose novel solutions to alleviate the issues. We also thoroughly validate these approaches with experimental results on real machines.

[1]  Craig B. Zilles,et al.  Implications of False Conflict Rate Trends for Robust Software Transactional Memory , 2007, 2007 IEEE 10th International Symposium on Workload Characterization.

[2]  Ali-Reza Adl-Tabatabai,et al.  McRT-Malloc: a scalable transactional memory allocator , 2006, ISMM '06.

[3]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[4]  Bratin Saha,et al.  Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[5]  James R. Larus,et al.  Transactional Memory , 2006, Transactional Memory.

[6]  Simon L. Peyton Jones,et al.  Composable memory transactions , 2005, CACM.

[7]  Kunle Olukotun,et al.  The OpenTM Transactional Application Programming Interface , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[8]  Kunle Olukotun,et al.  An effective hybrid transactional memory system with strong isolation guarantees , 2007, ISCA '07.

[9]  Michael F. Spear,et al.  Privatization techniques for software transactional memory , 2007, PODC '07.

[10]  Yen-Kuang Chen,et al.  The ALPBench benchmark suite for complex multimedia applications , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[11]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.

[12]  David Eisenstat,et al.  Lowering the Overhead of Nonblocking Software Transactional Memory , 2006 .

[13]  Nir Shavit,et al.  Transactional Locking II , 2006, DISC.

[14]  Dan Grossman,et al.  Enforcing isolation and ordering in STM , 2007, PLDI '07.

[15]  Bratin Saha,et al.  McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.

[16]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[17]  Maurice Herlihy,et al.  Software transactional memory for dynamic-sized data structures , 2003, PODC '03.