A classification of nested loops parallelization algorithms

This paper compares three algorithms for parallelizing nested loops: Allen and Kennedy's, Wolf and Lam's, and Darte and Vivien's. Each takes a different representation of distance vectors as input. The authors identify the concepts that make the algorithms similar or different, and study the optimality of each with respect to the dependence analysis it relies on. Selected examples illustrate the power and the limitations of the three algorithms. The comparison shows which algorithm is most suitable for a given representation of dependences.
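To make "different representations of distance vectors" concrete, here is a minimal sketch of two common abstractions of an exact distance vector: the dependence level and the direction vector. The text above does not say which algorithm uses which representation. In the literature, Allen and Kennedy's algorithm is usually described in terms of dependence levels and Wolf and Lam's in terms of direction vectors, so treat that mapping as an assumption. The function names are hypothetical and chosen for illustration only.

```python
from typing import Optional, Sequence, Tuple


def dependence_level(distance: Sequence[int]) -> Optional[int]:
    """Return the 1-based depth of the first nonzero component.

    Returns None when every component is zero, i.e. the dependence is
    not carried by any loop (loop-independent).
    """
    for depth, component in enumerate(distance, start=1):
        if component != 0:
            return depth
    return None


def direction_vector(distance: Sequence[int]) -> Tuple[str, ...]:
    """Keep only the sign of each component: '+', '0', or '-'."""
    return tuple(
        "+" if component > 0 else "-" if component < 0 else "0"
        for component in distance
    )


def is_lexicographically_positive(distance: Sequence[int]) -> bool:
    """A carried dependence has a positive first nonzero component."""
    for component in distance:
        if component != 0:
            return component > 0
    return False


if __name__ == "__main__":
    d = (0, 2, -1)  # illustrative distance vector in a 3-deep loop nest
    print(dependence_level(d))
    print(direction_vector(d))
    print(is_lexicographically_positive(d))
```

Both abstractions discard information: the level keeps only which loop carries the dependence, while the direction vector keeps a sign per loop. An algorithm working on the coarser representation therefore has less to reason with, which is why optimality is stated relative to the dependence analysis used.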
