FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory

Transactional Memory (TM) has been considered a promising alternative to existing synchronization operations, which are often the largest stumbling block to unleashing the parallelism of applications. Efficient implementations of TM, however, are challenging because of the tension between lowering performance overhead and avoiding unnecessary aborts. In this paper, we present Reachability-based Optimistic Concurrency Control for Transactional Memory (ROCoCoTM), a novel scheme that offloads concurrency control (CC) algorithms, the central building blocks of TM systems, to reconfigurable hardware. To reduce the abort rate, we develop a formalization of mainstream CC algorithms that reveals a common restriction leading to unnecessary aborts. The ROCoCo algorithm resolves this restriction with a centralized validation phase, which can be efficiently pipelined in hardware. Thanks to a high-performance offloading engine implemented in reconfigurable hardware, the ROCoCo algorithm achieves both lower abort rates and lower performance overhead. The whole system is implemented on Intel's HARP2 platform and evaluated with the STAMP benchmark suite. Experiments show geomean speedups of 1.55x over TinySTM and 8.05x over an HTM based on Intel TSX. Given the fast-growing deployment of commodity CPU-FPGA platforms, ROCoCoTM paves the way for software programmers to exploit heterogeneous computing resources through a high-level transactional abstraction that effectively extracts the parallelism in modern applications.
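The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea behind reachability-based centralized validation, not the ROCoCo design or its hardware pipeline. It assumes that each committing transaction reports which committed transactions must be serialized before it (`preds`) and which must be serialized after it (`succs`). The class name `ReachabilityValidator`, the method `try_commit`, and both parameters are hypothetical. A commit is rejected only when it would close a cycle in the serialization order, whereas a purely timestamp-ordered scheme would typically reject any transaction with a non-empty `succs`.

```python
class ReachabilityValidator:
    """Toy centralized validator (hypothetical, for illustration only).

    Keeps, for every committed transaction, the set of committed
    transactions reachable from it in the serialization-order graph.
    """

    def __init__(self):
        # txn id -> set of txn ids reachable from it (including itself)
        self.reach = {}

    def try_commit(self, txn, preds, succs):
        """Return True and record txn if committing it keeps the graph acyclic.

        preds: committed txns that must be serialized before txn.
        succs: committed txns that must be serialized after txn.
        """
        preds, succs = set(preds), set(succs)

        # A cycle appears iff some successor already reaches some predecessor:
        # txn -> s ->* p -> txn.
        for s in succs:
            if self.reach[s] & preds:
                return False  # abort: no serial order exists

        # Everything txn reaches: itself plus whatever its successors reach.
        forward = {txn}
        for s in succs:
            forward |= self.reach[s]

        # Any transaction that reaches a predecessor now also reaches `forward`.
        for reachable in self.reach.values():
            if reachable & preds:
                reachable |= forward

        self.reach[txn] = forward
        return True


validator = ReachabilityValidator()
r1 = validator.try_commit("T1", preds=[], succs=[])
r2 = validator.try_commit("T2", preds=["T1"], succs=[])      # T1 before T2
# T3 read data that T1 later overwrote, so T3 must come before T1.
# There is no cycle, so it may still commit.
r3 = validator.try_commit("T3", preds=[], succs=["T1"])
# T4 must come after T2 but before T1, while T1 already precedes T2: cycle.
r4 = validator.try_commit("T4", preds=["T2"], succs=["T1"])
```

In this toy run, `T3` commits even though it is ordered before an already committed transaction, and only `T4` is rejected. The sets here stand in for whatever compact reachability representation a hardware pipeline would actually use.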
