Safe parallel programming using dynamic dependence hints

Speculative parallelization divides a sequential program into possibly parallel tasks and permits these tasks to run in parallel if and only if they show no dependences on each other. The parallelization is safe in that a speculative execution always produces the same output as the sequential execution. In this paper, we present the dependence hint, an interface for a user to specify possible dependences between possibly parallel tasks. Dependence hints may be incorrect or incomplete, but they do not change the program output. The interface extends Cytron's do-across and recent OpenMP ordering primitives and makes them safe and safely composable. We use it to express conditional and partial parallelism and to parallelize large legacy code. The prototype system is implemented as a software library. It improves performance by nearly 10 times on average on current multicore machines for 8 programs, including 5 SPEC benchmarks.
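The safety property described above (tasks commit only if they show no dependence on earlier tasks, so the output always matches sequential execution) can be illustrated with a minimal sketch. This is not the paper's library or its hint interface: `TrackedView`, `run_speculatively`, `run_sequentially`, and the three example tasks are hypothetical names introduced here, and the "parallel" phase is simulated by running every task against the same snapshot.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, int]


class TrackedView:
    """Private view of shared state that records which keys a task reads and writes."""

    def __init__(self, snapshot: State) -> None:
        self._base = snapshot
        self.reads: Set[str] = set()
        self.writes: State = {}

    def get(self, key: str) -> int:
        if key in self.writes:          # a task sees its own writes
            return self.writes[key]
        self.reads.add(key)             # otherwise it reads the snapshot
        return self._base[key]

    def set(self, key: str, value: int) -> None:
        self.writes[key] = value        # buffered until commit


Task = Callable[[TrackedView], None]


def run_sequentially(state: State, tasks: List[Task]) -> None:
    """Reference behavior: each task sees all effects of the tasks before it."""
    for task in tasks:
        view = TrackedView(state)
        task(view)
        state.update(view.writes)


def run_speculatively(state: State, tasks: List[Task]) -> int:
    """Run tasks against one snapshot, then commit in program order.

    A task whose reads overlap the writes of an earlier task is discarded
    and re-executed on the committed state. Returns the number of re-executions.
    """
    snapshot = dict(state)
    views = []
    for task in tasks:                  # conceptually parallel (simulated here)
        view = TrackedView(snapshot)
        task(view)
        views.append(view)

    dirty: Set[str] = set()             # keys written by already committed tasks
    redone = 0
    for task, view in zip(tasks, views):
        if view.reads & dirty:          # dependence detected: speculation failed
            view = TrackedView(dict(state))
            task(view)
            redone += 1
        state.update(view.writes)       # commit in sequential order
        dirty.update(view.writes)
    return redone


# Hypothetical example tasks: t1 and t2 are independent, t3 depends on both.
def t1(v: TrackedView) -> None:
    v.set("c", v.get("a") + 1)


def t2(v: TrackedView) -> None:
    v.set("d", v.get("b") * 2)


def t3(v: TrackedView) -> None:
    v.set("a", v.get("c") + v.get("d"))
```

In this sketch, running `[t1, t2, t3]` speculatively on `{"a": 1, "b": 2, "c": 0, "d": 0}` commits `t1` and `t2` as is and re-executes only `t3`, ending in the same state as `run_sequentially`. A dependence hint, as the abstract describes it, would let the user announce the `t3` dependence in advance so the runtime could wait rather than misspeculate; because the check above is always performed, a wrong or missing hint could affect only performance, not the result.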
