Speculative Parallelization of Partially Parallel Loops

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because their access patterns are complex or cannot be fully determined statically. We previously proposed a framework for identifying such loops: the loop is speculatively executed as a doall, and a fully parallel data dependence test then determines whether it had any cross-processor dependences; if the test fails, the loop is re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence, because the entire loop must then be re-executed sequentially. Moreover, it does not exploit the partial parallelism that such loops contain. In this paper we propose a generalization of our speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself, i.e., it removes the time lost due to incorrect parallel execution. For fully serial loops, the asymptotic time complexity equals the sequential execution time. We present the base algorithm and analyze the heuristics for its practical application. Preliminary experimental results on loops from Track show the performance of the new technique.
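
The idea of committing the correctly executed part of a speculative run and re-executing only the remainder can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the algorithm as specified in the paper: the loop body is assumed to be `a[writes[i]] = a[reads[i]] + 1`, iterations are assumed to be block-scheduled across `procs` simulated processors, and the names `run_serial` and `run_recursive_speculative` are hypothetical.

```python
def run_serial(a, reads, writes):
    """Reference semantics: execute the loop sequentially."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = a[r] + 1
    return a


def run_recursive_speculative(a, reads, writes, procs):
    """Simulate staged speculative execution (illustrative assumption).

    Each stage block-schedules the remaining iterations, runs every block
    against the same shared snapshot with private write buffers, then commits
    blocks in order up to the first one that read an element written by a
    lower-numbered block. The uncommitted iterations are retried in the next
    stage. Returns (final_array, number_of_stages).
    """
    a = list(a)
    n = len(reads)
    start = 0
    stages = 0
    while start < n:
        stages += 1
        block = -(-(n - start) // procs)  # ceiling division
        blocks = [range(lo, min(lo + block, n)) for lo in range(start, n, block)]

        # Speculative phase: shared array is only read; writes stay private.
        results = []
        for blk in blocks:
            private = {}     # privatized writes of this simulated processor
            exposed = set()  # elements read before being written locally
            for i in blk:
                r, w = reads[i], writes[i]
                if r in private:
                    value = private[r]
                else:
                    exposed.add(r)
                    value = a[r]
                private[w] = value + 1
            results.append((blk, private, exposed))

        # Analysis and commit phase: stop at the first cross-processor flow
        # dependence; the first block never fails, so each stage makes progress.
        written_before = set()
        for blk, private, exposed in results:
            if exposed & written_before:
                break
            for idx, value in private.items():
                a[idx] = value
            written_before |= private.keys()
            start = blk.stop
    return a, stages
```

In this model a loop with no cross-processor flow dependences finishes in a single stage, while a loop whose every iteration depends on its predecessor commits one block per stage and so degenerates toward sequential execution, which is consistent with the worst-case behavior described above.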
