Transactional conflict decoupling and value prediction

This paper explores data speculation for improving the performance of Hardware Transactional Memory (HTM). We attempt to reduce transactional conflicts by decoupling them from cache coherence conflicts; many HTMs do not distinguish between transactional conflicts and coherence conflicts, leading to false transactional conflicts. We also attempt to mitigate the effects of coherence conflicts by using value prediction in transactions. We show that coherence decoupling and value prediction in transactions complement each other, because they both speculate on data in ways that are infeasible in the absence of HTM support. As a demonstration of how data speculation can improve performance, we introduce DPTM, a best-effort HTM that mitigates the effects of false sharing at the cache line level. DPTM does not alter the underlying cache coherence protocol and requires only minor, processor-local modifications. We evaluate DPTM against a baseline best-effort HTM and compare it with data restructuring by padding, the most commonly used method to avoid false sharing. Our experiments show that DPTM can dramatically improve performance in the presence of false sharing without degrading performance in its absence, and that it consistently performs better than restructuring by padding.
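The distinction between a coherence conflict and a true transactional conflict can be illustrated with a small model. The sketch below is not DPTM's mechanism; it is a toy illustration under assumed parameters (a 64-byte cache line, 8-byte counters, and a hypothetical `conflicts` helper) of how line-granularity conflict detection reports a conflict between two transactions that touch disjoint data, and how padding removes it.

```python
LINE_SIZE = 64  # assumed cache line size in bytes
WORD_SIZE = 8   # assumed size of each counter in bytes


def conflicts(writes_a, accesses_b, granularity):
    """Return True if a write by A and an access by B fall in the same unit.

    `granularity` is the unit size in bytes used for conflict detection
    (hypothetical helper, for illustration only).
    """
    units_a = {addr // granularity for addr in writes_a}
    units_b = {addr // granularity for addr in accesses_b}
    return bool(units_a & units_b)


# Two counters laid out back to back: both live in cache line 0.
counter_x, counter_y = 0, 8
tx_a_writes = {counter_x}    # transaction A updates x only
tx_b_accesses = {counter_y}  # transaction B touches y only

# Detection at cache line granularity sees a conflict (false sharing).
line_conflict = conflicts(tx_a_writes, tx_b_accesses, LINE_SIZE)

# Detection at the granularity of the data itself sees none.
word_conflict = conflicts(tx_a_writes, tx_b_accesses, WORD_SIZE)

# Restructuring by padding: move y to the start of the next line.
padded_y = LINE_SIZE
padded_conflict = conflicts(tx_a_writes, {padded_y}, LINE_SIZE)

print(line_conflict, word_conflict, padded_conflict)
```

In this model, padding avoids the false conflict at the cost of one extra cache line per counter, whereas finer-grained detection avoids it without changing the data layout.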
