Runtime automatic speculative parallelization

We present Runtime Automatic Speculative Parallelization (RASP), a technique for the dynamic extraction of speculative threads from a running application in a user-transparent fashion. By leveraging the idle cores in a CMP to analyze, optimize, and participate in the execution of a running sequential program, RASP enables a collection of simpler cores to achieve sequential performance on par with a significantly more complex core. In contrast to other systems for automatic speculative parallelization, RASP uses dynamic binary translation to optimize applications on-the-fly, without any need for recompilation or source code. RASP achieves these speedups without relying on special-purpose hardware support; RASP's dynamic profiling uses a clever variation on conventional performance monitoring, while RASP's speculative execution relies on the same simple hardware support for speculation that has been proposed for simplifying parallel programming. On a simulated cluster of four in-order cores, RASP accelerates SPEC2006 integer benchmarks by an average of 49%, with promising results for scientific and multimedia workloads as well.

[1]  Mark N. Wegman,et al.  Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.

[2]  G.S. Sohi,et al.  Dynamic Speculation And Synchronization Of Data Dependence , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[3]  Josep Torrellas,et al.  Tradeoffs in buffering memory state for thread-level speculation in multiprocessors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[4]  Clark Verbrugge,et al.  SableSpMT: a software framework for analysing speculative multithreading in Java , 2005, PASTE '05.

[5]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[6]  Josep Torrellas,et al.  Architectural support for scalable speculative parallelization in shared-memory multiprocessors , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  Gurindar S. Sohi,et al.  Task selection for a multiscalar processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[8]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[9]  Andrew R. Pleszkun,et al.  Implementation of precise interrupts in pipelined processors , 1985, ISCA '98.

[10]  Antonia Zhai,et al.  Compiler optimization of scalar value communication between speculative threads , 2002, ASPLOS X.

[11]  Kunle Olukotun,et al.  The Jrpm system for dynamically parallelizing Java programs , 2003, ISCA '03.

[12]  Craig B. Zilles,et al.  Hardware atomicity for reliable software speculation , 2007, ISCA '07.

[13]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[14]  Haitham Akkary,et al.  A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[15]  Milind Girkar,et al.  Tight analysis of the performance potential of thread speculation using spec CPU 2006 , 2007, PPOPP.

[16]  Rudolf Eigenmann,et al.  Speculative thread decomposition through empirical optimization , 2007, PPoPP.

[17]  Chen Yang,et al.  A cost-driven compilation framework for speculative parallelization of sequential programs , 2004, PLDI '04.

[18]  Xiao-Feng Li,et al.  Software Value Prediction for Speculative Parallel Threaded Computations , 2003 .

[19]  Richard Johnson,et al.  The Transmeta Code Morphing#8482; Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, CGO.

[20]  Satoshi Matsushita,et al.  Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[21]  Wei Liu,et al.  POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.

[22]  Keith D. Cooper,et al.  Value-driven redundancy elimination , 1996 .

[23]  Chen Ding,et al.  Fast Track: A Software System for Speculative Program Optimization , 2009, 2009 International Symposium on Code Generation and Optimization.

[24]  Manoj Franklin,et al.  A general compiler framework for speculative multithreading , 2002, SPAA '02.

[25]  Antonio González,et al.  Clustered speculative multithreaded processors , 1999, ICS '99.

[26]  Antonia Zhai,et al.  A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[27]  John Yates,et al.  FX!32 a profile-directed binary translator , 1998, IEEE Micro.

[28]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[29]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[30]  Andreas Moshovos,et al.  Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.

[31]  Kunle Olukotun,et al.  The Stanford Hydra CMP , 2000, IEEE Micro.

[32]  Richard Johnson,et al.  The Transmeta Code Morphing/spl trade/ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[33]  Antonia Zhai,et al.  Dynamic performance tuning for speculative threads , 2009, ISCA '09.

[34]  Gurindar S. Sohi,et al.  Speculative versioning cache , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.