Parallelizing while loops for multiprocessor systems

Current parallelizing compilers treat while loops and do loops with conditional exits as sequential constructs because their iteration space is unknown. Because these types of loops arise frequently in practice, we have developed techniques that can automatically transform them for parallel execution. We succeed in parallelizing loops involving linked list traversals, something that has not been done before. This is an important problem because linked list traversals arise frequently in loops with irregular access patterns, such as those in sparse matrix computations. The methods can even be applied to loops whose data dependence relations cannot be analyzed at compile time. Experimental results on loops from the PERFECT Benchmarks and sparse matrix packages show that these techniques can yield significant speedups.
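To make the problem concrete, the following minimal sketch shows one common way a while loop over a linked list can be restructured for parallel execution. It is an illustration only, not the transformation described in the paper. It assumes that the per-node work is independent across iterations and that the only sequential part is the pointer chasing. The names `Node`, `build_list`, `sequential_while`, and `parallel_while` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical singly linked list node."""
    value: float
    next: Optional["Node"] = None


def build_list(values: List[float]) -> Optional[Node]:
    """Build a linked list that holds the values in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def sequential_while(head: Optional[Node], work: Callable[[float], float]) -> List[float]:
    """The original loop: the trip count is unknown until the list ends."""
    results = []
    node = head
    while node is not None:
        results.append(work(node.value))
        node = node.next
    return results


def parallel_while(head: Optional[Node], work: Callable[[float], float],
                   workers: int = 4) -> List[float]:
    """Split the loop into a sequential traversal and a parallel work phase.

    Assumption: work() on one node does not depend on any other node.
    """
    # Phase 1 (sequential): chase pointers to discover the iteration space.
    nodes = []
    node = head
    while node is not None:
        nodes.append(node)
        node = node.next

    # Phase 2 (parallel): the iteration space is now known, so the
    # remaining work can be distributed like an ordinary parallel loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: work(n.value), nodes))
```

For example, `parallel_while(build_list([1.0, 2.0, 3.0]), lambda x: x * x)` returns the same list as `sequential_while` on the same input. The thread pool here only shows the structure of the transformation; in CPython it will not speed up CPU-bound work.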
