Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs

This paper proposes an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, the algorithm is optimal for direction vectors, generalizing Wolf and Lam's algorithm (1991) to the case of several statements. It relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations.
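To make the phrase "approximation of distance vectors by polyhedra" concrete, the sketch below shows one way a direction vector can be read as a polyhedron given by a vertex, rays, and lines. It is only an illustration under stated assumptions, not the paper's implementation: the function name `direction_to_generators` and the component symbols `"+"`, `"-"`, `"0+"`, `"0-"`, and `"*"` are hypothetical conventions chosen here.

```python
from typing import List, Tuple, Union

Vector = List[int]
Component = Union[int, str]  # an exact distance, or a direction symbol


def direction_to_generators(
    direction: List[Component],
) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Describe the set of distance vectors matching `direction` as
    vertex + cone(rays) + span(lines).

    Assumed (hypothetical) component symbols:
      int  -> exact distance
      "+"  -> distance >= 1        "-"  -> distance <= -1
      "0+" -> distance >= 0        "0-" -> distance <= 0
      "*"  -> any distance
    """
    n = len(direction)
    vertex = [0] * n
    rays: List[Vector] = []
    lines: List[Vector] = []
    for i, comp in enumerate(direction):
        unit = [0] * n
        unit[i] = 1
        if isinstance(comp, int):
            vertex[i] = comp              # a single point along this axis
        elif comp == "+":
            vertex[i] = 1                 # start at 1, extend upward
            rays.append(unit)
        elif comp == "-":
            vertex[i] = -1                # start at -1, extend downward
            rays.append([-x for x in unit])
        elif comp == "0+":
            rays.append(unit)             # start at 0, extend upward
        elif comp == "0-":
            rays.append([-x for x in unit])
        elif comp == "*":
            lines.append(unit)            # unbounded in both directions
        else:
            raise ValueError(f"unknown direction component: {comp!r}")
    return vertex, rays, lines


if __name__ == "__main__":
    # Direction vector (1, +, *): first distance is exactly 1,
    # second is at least 1, third is unconstrained.
    print(direction_to_generators([1, "+", "*"]))
```

Under this reading, each dependence edge carries a small set of generators rather than a single constant vector. This is the kind of description a uniformization step can then work from.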

[1]  Paul Feautrier,et al.  Fuzzy Array Dataflow Analysis , 1997, J. Parallel Distributed Comput..

[2]  Frédéric Vivien,et al.  Revisiting the Decomposition of Karp, Miller and Winograd , 1995, Parallel Process. Lett..

[3]  P. Feautrier.  Some Efficient Solutions to the Affine Scheduling Problem, Part I: One-dimensional Time , 1996 .

[4]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[5]  Ken Kennedy,et al.  Automatic translation of FORTRAN programs to vector form , 1987, TOPL.

[6]  Pierre Jouvelot,et al.  Semantical interprocedural parallelization: an overview of the PIPS project , 1991 .

[7]  Michel Minoux,et al.  Graphs and Algorithms , 1984 .

[8]  Paul Feautrier,et al.  Fuzzy array dataflow analysis , 1995, PPOPP '95.

[9]  Leslie Lamport,et al.  The parallel execution of DO loops , 1974, CACM.

[10]  Yves Robert,et al.  Plugging Anti and Output Dependence Removal Techniques Into Loop Parallelization Algorithm , 1997, Parallel Comput..

[11]  Michael Wolfe,et al.  The Tiny Loop Restructuring Research Tool , 1991, ICPP.

[12]  Alain Darte,et al.  Automatic Parallelization Based on Multi-Dimensional Scheduling , 1994 .

[13]  Monica S. Lam,et al.  Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.

[14]  Frédéric Vivien,et al.  Revisiting the decomposition of Karp, Miller and Winograd , 1995, Proceedings The International Conference on Application Specific Array Processors.

[15]  Yves Robert,et al.  Constructive Methods for Scheduling Uniform Loop Nests , 1994, IEEE Trans. Parallel Distributed Syst..

[16]  Patrick Le Gouëslier d'Argence.  An Asymptotically Optimal Affine Schedule on Bounded Convex Polyhedric Domains , 1996, Euro-Par, Vol. II.

[17]  P. Feautrier.  Some Efficient Solutions to the Affine Scheduling Problem, Part II: Multidimensional Time , 1992 .

[18]  Frédéric Vivien,et al.  Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling , 1997, Parallel Process. Lett..

[19]  Thomas Kailath,et al.  Regular iterative algorithms and their implementation on processor arrays , 1988, Proc. IEEE.

[20]  A. Darte,et al.  A classification of nested loops parallelization algorithms , 1995, Proceedings 1995 INRIA/IEEE Symposium on Emerging Technologies and Factory Automation. ETFA'95.

[21]  Gregory F. Sullivan,et al.  Detecting cycles in dynamic graphs in polynomial time , 1988, STOC '88.

[22]  Arthur J. Bernstein,et al.  Analysis of Programs for Parallel Processing , 1966, IEEE Trans. Electron. Comput..

[23]  Thomas Kailath,et al.  Derivation, extensions and parallel implementation of regular iterative algorithms , 1989 .

[24]  Richard M. Karp,et al.  The Organization of Computations for Uniform Recurrence Equations , 1967, JACM.

[25]  Utpal Banerjee,et al.  A theory of loop permutations , 1990 .

[26]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[27]  Ken Kennedy,et al.  PFC: A Program to Convert Fortran to Parallel Form , 1982 .

[28]  W. Pugh,et al.  A framework for unifying reordering transformations , 1993 .

[29]  Yves Robert,et al.  Affine-by-Statement Scheduling of Uniform and Affine Loop Nests over Parametric Domains , 1995, J. Parallel Distributed Comput..

[30]  P. Feautrier.  Some Efficient Solutions to the Affine Scheduling Problem, Part II: Multidimensional Time , 1992 .

[31]  François Irigoin,et al.  Supernode partitioning , 1988, POPL '88.

[32]  Yves Robert,et al.  Linear Scheduling Is Nearly Optimal , 1991, Parallel Process. Lett..

[33]  A. Darte.  Affine-by-Statement Scheduling of Uniform and Affine Loop Nests over Parametric Domains , 1995 .

[34]  Frédéric Vivien,et al.  On the Optimality of Allen and Kennedy's Algorithm for Parallelism Extraction in Nested Loops , 1996, Parallel Algorithms Appl..