Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume

Transactional memory (TM) aims to simplify concurrent programming via the familiar abstraction of atomic transactions. Recently, Intel and IBM have integrated hardware-based TM (HTM) implementations into commodity processors, paving the way for mainstream adoption of the TM paradigm. Yet existing HTM implementations suffer from a crucial limitation that hampers the adoption of HTM as a general technique for regulating concurrent access to shared memory: the inability to execute transactions whose working sets exceed the capacity of CPU caches. In this paper we propose P8TM, a novel approach that mitigates this limitation on IBM’s POWER8 architecture by combining several techniques: uninstrumented read-only transactions, update transactions based on Rollback-Only Transactions (ROTs), HTM-friendly (software-based) read-set tracking, and self-tuning. P8TM can dynamically switch between different execution modes to best adapt to the nature of the transactions and the abort patterns experienced. An in-depth evaluation with several benchmarks indicates that P8TM can achieve striking performance gains in workloads that stress the capacity limitations of HTM, while achieving performance on par with HTM even in unfavourable workloads.

1998 ACM Subject Classification: D.1.3 Concurrent Programming
