Speculative parallelization using software multi-threaded transactions

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative techniques, speculative parallelism appears to be the key to continuing this trend for general-purpose applications. Recently-proposed code parallelization techniques, such as those by Bridges et al. and by Thies et al., demonstrate scalable performance on multiple cores by using speculation to divide code into atomic units (transactions) that span multiple threads in order to expose data parallelism. Unfortunately, most software and hardware Thread-Level Speculation (TLS) memory systems and transactional memories are not sufficient because they only support single-threaded atomic units. Multi-threaded Transactions (MTXs) address this problem, but they require expensive hardware support as currently proposed in the literature. This paper proposes a Software MTX (SMTX) system that captures the applicability and performance of hardware MTX, but on existing multicore machines. The SMTX system yields a harmonic mean speedup of 13.36x on native hardware with four 6-core processors (24 cores in total) running speculatively parallelized applications.

[1]  Matthew J. Bridges,et al.  The velocity compiler: extracting efficient multicore execution from legacy sequential codes , 2008 .

[2]  Josep Torrellas,et al.  Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors , 2005, TACO.

[3]  Scott A. Mahlke,et al.  Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory , 2009, PLDI '09.

[4]  Alan Mycroft,et al.  Software thread-level speculation: an optimistic library implementation , 2008, IWMSE '08.

[5]  Gurindar S. Sohi,et al.  Master/Slave Speculative Parallelization , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[6]  James R. Larus,et al.  Transactional Memory (Synthesis Lectures on Computer Architecture) , 2007 .

[7]  Antonia Zhai,et al.  The STAMPede approach to thread-level speculation , 2005, TOCS.

[8]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[9]  Marek Olszewski,et al.  JudoSTM: A Dynamic Binary-Rewriting Approach to Software Transactional Memory , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[10]  Easwaran Raman,et al.  Spice: speculative parallel iteration chunk execution , 2008, CGO '08.

[11]  David I. August,et al.  Intelligent speculation for pipelined multithreading , 2008 .

[12]  Rajiv Gupta,et al.  Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[13]  Emery D. Berger,et al.  Grace: safe multithreaded programming for C/C++ , 2009, OOPSLA 2009.

[14]  Yun Zhang,et al.  Revisiting the Sequential Programming Model for Multi-Core , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  William Thies,et al.  A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[16]  Chen Ding,et al.  Fast Track: A Software System for Speculative Program Optimization , 2009, 2009 International Symposium on Code Generation and Optimization.

[17]  Nir Shavit,et al.  Software transactional memory , 1995, PODC '95.

[18]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[19]  L. Rauchwerger,et al.  The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization , 1999, IEEE Trans. Parallel Distributed Syst..

[20]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[21]  Martín Abadi,et al.  Transactional memory with strong atomicity using off-the-shelf memory protection hardware , 2009, PPoPP '09.

[22]  John Giacomoni,et al.  FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue , 2008, PPoPP.

[23]  Brian T. Lewis,et al.  Compiler and runtime support for efficient software transactional memory , 2006, PLDI '06.

[24]  Per Stenström,et al.  An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors , 2001, J. Instr. Level Parallelism.

[25]  Scott A. Mahlke,et al.  Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[26]  Manoj Franklin,et al.  A general compiler framework for speculative multithreading , 2002, SPAA '02.

[27]  Chen Ding,et al.  Software behavior oriented parallelization , 2007, PLDI '07.

[28]  Easwaran Raman,et al.  Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[29]  Easwaran Raman,et al.  Parallelization techniques with improved dependence handling , 2009 .

[30]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[31]  Keshav Pingali,et al.  Optimistic parallelism requires abstractions , 2009, CACM.