Conflict Reduction in Hardware Transactions Using Advisory Locks

Preliminary experience with hardware transactional memory suggests that aborts due to data conflicts are one of the principal obstacles to scale-up. To reduce the incidence of conflict, we propose an automatic, high-level mechanism that uses advisory locks to serialize only the portions of transactions in which conflicting accesses occur. We demonstrate the feasibility of this mechanism, which we refer to as staggered transactions, with fully developed compiler and runtime support, running on simulated hardware. Our compiler identifies and instruments a small subset of the accesses in each transaction: those that it determines, statically, are likely to be initial accesses to shared locations. At run time, the instrumentation acquires an advisory lock on the accessed datum if (and only if) prior execution history suggests that the datum (or locations "downstream" of it) is indeed a likely source of conflict. The policy that drives this decision requires one hardware feature not generally found in current commercial offerings: nontransactional loads and stores within transactions. It can also benefit from a mechanism to record the program counter at which a cache line was first accessed in a transaction. Simulation results show that staggered transactions can significantly reduce the frequency of conflict aborts and increase program performance.
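
The run-time decision described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the class `AdvisoryLockPolicy`, the constants `CACHE_LINE`, `ABORT_THRESHOLD`, and `NUM_LOCKS`, and the simple per-line abort counter are all hypothetical, and ordinary Python locks stand in for advisory locks that the real system would manipulate with nontransactional loads and stores inside a hardware transaction.

```python
import threading

CACHE_LINE = 64       # assumed cache-line size in bytes
ABORT_THRESHOLD = 2   # assumed: conflict aborts seen before locking starts
NUM_LOCKS = 16        # assumed size of the advisory lock table


class AdvisoryLockPolicy:
    """Toy model: lock a datum only if history marks it as conflict-prone."""

    def __init__(self):
        self.abort_counts = {}  # cache line -> observed conflict aborts
        self.locks = [threading.Lock() for _ in range(NUM_LOCKS)]

    def _line(self, addr):
        return addr // CACHE_LINE

    def record_conflict_abort(self, addr):
        # Called after a transaction aborts because of a conflict on addr.
        line = self._line(addr)
        self.abort_counts[line] = self.abort_counts.get(line, 0) + 1

    def should_lock(self, addr):
        return self.abort_counts.get(self._line(addr), 0) >= ABORT_THRESHOLD

    def lock_for(self, addr):
        # Many cache lines share one lock; the lock is advisory only.
        return self.locks[self._line(addr) % NUM_LOCKS]

    def instrumented_access(self, addr, held):
        # Runs at an access the compiler chose to instrument.
        # Returns True if the access is (now) covered by an advisory lock.
        if not self.should_lock(addr):
            return False
        lock = self.lock_for(addr)
        if lock not in held:
            lock.acquire()
            held.append(lock)
        return True

    def end_transaction(self, held):
        # Release every advisory lock taken during the transaction.
        for lock in held:
            lock.release()
        held.clear()
```

In this model a transaction that touches a line with no abort history runs without any lock, while transactions that reach a line with repeated conflict aborts queue on the same advisory lock, so only the conflict-prone part of their execution is serialized. Because the lock is advisory, correctness would still rest on the transactional hardware; the lock only reduces the chance of an abort.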
