Hardware Support for Relaxed Concurrency Control in Transactional Memory

Today's transactional memory systems implement the two-phase-locking (2PL) algorithm which aborts transactions every time a conflict happens. 2PL is a simple algorithm that provides fast transactional operations. However, it limits concurrency in applications with high contention by increasing the rate of aborts. More relaxed algorithms that can commit conflicting transactions have recently been shown to provide better concurrency both in software and hardware. However, existing approaches for implementing such algorithms increase latencies of transactional operations, require complex hardware support and alter standard cache coherence protocols. In this paper, we discuss how a relaxed concurrency control algorithm can be efficiently implemented in hardware. More specifically, we use a technique which approximates conflict-serializability and implement it in hardware on top a base hardware transactional memory system that provides support for isolation and conflict detection. Our novel hardware scheme is based on recording conflicts as they occur, instead of aborting transactions. Transactions serialize at commit time according to these conflicts by sending broadcast messages. Our evaluation of this hardware scheme using a simulator and standard benchmarks shows that it captures the benefits of conflict-serializability. Applications with long transactions and high contention benefit the most, abort rates are reduced up to 7.2 times and the performance is improved up to 66%. We argue that this improvement comes with little additional hardware complexity and requires no changes to the transactional programming model.

[1]  Antonia Zhai,et al.  A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[2]  Emmett Witchel,et al.  Dependence-aware transactional memory for increased concurrency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[3]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[4]  Torvald Riegel,et al.  Snapshot Isolation for Software Transactional Memory , 2006 .

[5]  Maurice Herlihy,et al.  Committing conflicting transactions in an STM , 2009, PPoPP '09.

[6]  David A. Wood,et al.  LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[7]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[8]  Torvald Riegel,et al.  A Lazy Snapshot Algorithm with Eager Validation , 2006, DISC.

[9]  Michael L. Scott,et al.  Flexible Decoupled Transactional Memory Support , 2008, 2008 International Symposium on Computer Architecture.

[10]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[11]  Kunle Olukotun,et al.  STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.

[12]  U. Aydonat,et al.  Hardware Support For Serializable Transactions : A Study of Feasibility and Performance , 2009 .

[13]  U. Aydonat Serializability of Transactions in Software Transactional Memory , 2008 .

[14]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[15]  Josep Torrellas,et al.  Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[16]  Kunle Olukotun,et al.  Transactional collection classes , 2007, PPOPP.

[17]  David A. Wood,et al.  Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.