Mitigating the Mismatch between the Coherence Protocol and Conflict Detection in Hardware Transactional Memory

Hardware Transactional Memory (HTM) usually piggybacks onto the cache coherence protocol to detect data access conflicts between transactions. We identify an intrinsic mismatch between the typical coherence scheme and transaction execution, which causes a sizable number of unnecessary transaction aborts. This pathological behavior, which we call false aborting, increases the amount of wasted computation and on-chip communication. For the TM applications we studied, 41% of the transactional write requests incur false aborting. To combat false aborting, we propose Predictive Unicast and Notification (PUNO), a novel hardware mechanism that 1) replaces the inefficient coherence multicast with a unicast scheme to prevent transactions from being disrupted unnecessarily and 2) restrains transaction polling through proactive notification. PUNO reduces transaction aborts by 61% and network traffic by 32% in workloads representative of future TM applications, with a VLSI implementation area overhead of 0.41%.
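The difference between a coherence multicast and a unicast probe can be illustrated with a toy model. The sketch below is not the PUNO hardware design. It assumes a timestamp-based conflict policy (an older transaction wins and rejects the request, a younger one aborts), unique timestamps, and a perfect prediction of the oldest sharer. All names (`Sharer`, `multicast_write`, `unicast_write`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sharer:
    core: int
    timestamp: int  # assumption: smaller timestamp = older = higher priority


def multicast_write(requester_ts, sharers):
    """Toy model: the write request reaches every sharer at once."""
    # Younger sharers lose the conflict and abort immediately.
    aborted = [s.core for s in sharers if s.timestamp > requester_ts]
    # Any older sharer rejects the request, so the requester gets nothing.
    rejected = any(s.timestamp < requester_ts for s in sharers)
    # If the request was rejected, the aborts above served no purpose.
    false_aborts = aborted if rejected else []
    return {"granted": not rejected, "aborted": aborted,
            "false_aborts": false_aborts}


def unicast_write(requester_ts, sharers):
    """Toy model: probe only the (assumed perfectly predicted) oldest sharer."""
    if not sharers:
        return {"granted": True, "aborted": [], "false_aborts": []}
    oldest = min(sharers, key=lambda s: s.timestamp)
    if oldest.timestamp < requester_ts:
        # Rejected by the single probe; no other sharer is disturbed.
        return {"granted": False, "aborted": [], "false_aborts": []}
    # Nobody outranks the requester, so the remaining sharers abort usefully.
    aborted = [s.core for s in sharers if s.timestamp > requester_ts]
    return {"granted": True, "aborted": aborted, "false_aborts": []}


sharers = [Sharer(core=1, timestamp=5),
           Sharer(core=2, timestamp=20),
           Sharer(core=3, timestamp=30)]
multicast_result = multicast_write(10, sharers)
unicast_result = unicast_write(10, sharers)
```

In this example the requester (timestamp 10) is rejected by core 1 in both schemes, but the multicast also aborts cores 2 and 3 for no benefit, while the unicast probe leaves them running.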
