Eager Meets Lazy: The Impact of Write-Buffering on Hardware Transactional Memory

Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several design policies has been discussed at length in prior work within the framework of scalable chip-multiprocessing systems. Yet the impact of simple structural optimizations like write-buffering has not been investigated, and the performance deviations due to the presence or absence of these optimizations remain unclear. This lack of insight into the effective use and impact of these interfacial structures between the processor core and the coherent memory hierarchy forms the crux of the problem we study in this paper. Through detailed modeling of various write-buffering configurations, we show that they play a major role in determining the overall performance of a practical HTM system. Our study of both eager and lazy conflict resolution mechanisms in a scalable parallel architecture reveals a remarkable convergence in the performance of these two diametrically opposite design points when write buffers are introduced and used well to support the common case. Mitigation of redundant actions, fewer invalidations on abort, latency hiding, and prefetch effects all contribute to reducing execution times for transactions. Shorter transaction durations also imply a lower contention probability, thereby amplifying the gains even further. The insights in this paper into the interplay between buffering mechanisms, system policies, and workload characteristics clearly distinguish the performance gains due to write-buffering from those that can be ascribed to HTM policy. We believe that this information will facilitate sound design decisions when incorporating HTMs into parallel architectures.
