Virtues and limitations of commodity hardware transactional memory

Over the last few years, Transactional Memory (TM) has gained growing popularity as a simpler, attractive alternative to classic lock-based synchronization schemes. Recently, the TM landscape has been profoundly changed by the integration of Hardware TM (HTM) in Intel commodity processors, raising a number of questions about the future of TM. We seek answers to these questions by conducting the largest study on TM to date, comparing different locking techniques, hardware and software TMs, as well as different combinations of these mechanisms, from the dual perspective of performance and power consumption. Our study casts both light and shadow on currently available commodity HTM: on the one hand, we identify workloads in which HTM clearly outperforms any alternative synchronization mechanism; on the other hand, we show that current HTM implementations suffer from restrictions that narrow the scope in which they can be more effective than state-of-the-art software solutions. Based on the results of our study, we identify a number of compelling research problems in the areas of TM design, compilers, and self-tuning.
