Principles of Speculative Run-Time Parallelization

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because these loops have access patterns that are complex or insufficiently defined at compile time. We advocate a novel framework for identifying parallel loops: a loop is speculatively executed as a doall, and a fully parallel data dependence test checks for any unsatisfied data dependences; if the test fails, the loop is re-executed serially. We present the principles behind the design and implementation of a compiler that employs both run-time and static techniques to parallelize dynamic applications. Run-time optimization always represents a tradeoff between a speculated potential benefit and a certain overhead that must be paid regardless of the outcome. We introduce techniques that leverage classic compiler methods to reduce the cost of run-time optimization, thus tilting the outcome of speculation in favor of significant performance gains. Experimental results from the PERFECT, SPEC, and NCSA benchmark suites show that these techniques yield speedups not obtainable by any other known method.
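The speculate-test-rollback scheme described above can be illustrated with a minimal sketch. This is not the paper's actual LRPD implementation: it omits privatization and reduction handling, models each iteration as one read and one write through index arrays (`reads`, `writes` are illustrative names), and buffers speculative writes so that a failed test needs no explicit state restoration before the serial re-execution.

```python
def speculative_doall(a, reads, writes, compute):
    """Speculatively run 'a[writes[i]] = compute(a[reads[i]])' for all i
    as if the loop were a doall, then validate with shadow arrays.
    Returns True if the loop ran as a doall, False if it was re-run serially."""
    n = len(a)
    wt = [0] * n        # shadow: how many iterations wrote each element
    rd = [False] * n    # shadow: was each element read by some iteration
    buffered = []       # speculative writes, committed only if the test passes

    # Phase 1: speculative "parallel" execution with shadow marking.
    # Iterations only read the original array, so their order is irrelevant.
    for i in range(len(reads)):
        rd[reads[i]] = True
        wt[writes[i]] += 1
        buffered.append((writes[i], compute(a[reads[i]])))

    # Phase 2: fully parallel dependence test over the shadow arrays.
    # The doall was valid iff no element was written twice (output dependence)
    # and no element was both read and written (flow/anti dependence).
    ok = all(wt[e] <= 1 for e in range(n)) and \
         not any(wt[e] >= 1 and rd[e] for e in range(n))

    if ok:
        for loc, val in buffered:   # commit the speculative writes
            a[loc] = val
        return True

    # Test failed: discard the buffered writes and re-execute serially.
    for i in range(len(reads)):
        a[writes[i]] = compute(a[reads[i]])
    return False
```

For example, with `reads=[0, 1]` and `writes=[2, 3]` no element is touched by two iterations, so the test passes and the buffered writes are committed; with `reads=[3, 2]` and `writes=[2, 3]` elements 2 and 3 are each both read and written, the test fails, and the loop falls back to a serial run that honors the cross-iteration dependences.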
