Seer: Probabilistic Scheduling for Hardware Transactional Memory

The ubiquity of multicore processors has led programmers to write parallel and concurrent applications that exploit the underlying hardware and speed up execution. In this context, Transactional Memory (TM) has emerged as a simple and effective synchronization paradigm, built on the familiar abstraction of atomic transactions. After many years of intense research, major processor manufacturers (including Intel) have recently released mainstream processors with hardware support for TM (HTM). In this work, we study an issue with great impact on the performance of HTM. Because HTM is optimistic and inherently limited, transactions may be aborted and restarted numerous times, without any progress guarantee. It is therefore up to the software library that regulates HTM usage to ensure progress and optimize performance. Transaction scheduling is one of the most well-studied and effective techniques for achieving these goals. However, recent mainstream HTMs have technical limitations that prevent the adoption of known scheduling techniques: unlike the software implementations of TM used in the past, existing HTMs provide limited or no information on which memory regions or contending transactions caused an abort. To address this restriction, we propose Seer, a software scheduler that leverages an online probabilistic inference technique to identify the most likely conflict relations and establishes a dynamic locking scheme to serialize transactions in a fine-grained manner. The key idea of our solution is to constrain only the portions of parallelism that negatively affect the whole system. As a result, Seer not only prevents performance degradation but also unlocks further scalability and performance for HTM.
Via an extensive evaluation study, we show that Seer improves the performance of Intel's HTM by up to 3.6×, and by 65% on average across all concurrency degrees and benchmarks, on a large processor with 28 cores.
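
The idea of inferring likely conflict relations from coarse abort feedback can be illustrated with a minimal sketch. This is not the algorithm from the paper: the class `ConflictEstimator`, its methods, the transaction names, and the fixed threshold are all hypothetical, chosen only to show how per-pair abort frequencies observed at run time could be turned into a fine-grained serialization plan when the hardware does not report which transaction caused an abort.

```python
from collections import defaultdict


class ConflictEstimator:
    """Hypothetical helper: estimates how likely a transaction type is to
    abort while another type is running, using only observed outcomes."""

    def __init__(self):
        # aborts[x][y]: times x aborted while y was concurrently active
        self.aborts = defaultdict(lambda: defaultdict(int))
        # commits[x][y]: times x committed while y was concurrently active
        self.commits = defaultdict(lambda: defaultdict(int))

    def record(self, tx, active, aborted):
        """Record one outcome of `tx` given the set of concurrently active types."""
        table = self.aborts if aborted else self.commits
        for other in active:
            table[tx][other] += 1

    def abort_probability(self, tx, other):
        """Empirical estimate of P(tx aborts | other is active)."""
        a = self.aborts[tx][other]
        c = self.commits[tx][other]
        total = a + c
        return a / total if total else 0.0

    def lock_plan(self, threshold):
        """For each transaction type, the types it should be serialized with
        (assumed policy: estimated abort probability at or above threshold)."""
        plan = {}
        for tx in set(self.aborts) | set(self.commits):
            others = set(self.aborts[tx]) | set(self.commits[tx])
            plan[tx] = {
                o for o in others
                if self.abort_probability(tx, o) >= threshold
            }
        return plan


# Usage: "transfer" often aborts next to "audit" but never next to "lookup".
est = ConflictEstimator()
for _ in range(8):
    est.record("transfer", {"audit"}, aborted=True)
for _ in range(2):
    est.record("transfer", {"audit"}, aborted=False)
for _ in range(10):
    est.record("transfer", {"lookup"}, aborted=False)

plan = est.lock_plan(threshold=0.5)
```

In this toy run, only the `transfer`/`audit` pair would be serialized, while `transfer` and `lookup` keep running in parallel, which mirrors the stated goal of constraining only the parallelism that hurts the system.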
