Partitioning and Labeling of Loops by Unimodular Transformations

A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form in which the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n^2) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.
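The partitioning idea can be illustrated with a small sketch. It assumes, as a simplification that may differ from the paper's exact construction, that two iterations are grouped together when their difference lies in the integer lattice L(D) spanned by the constant dependence vectors (the columns of D). Any dependence chain moves by a sum of such vectors, so iterations in different groups can never depend on each other. Unimodular column operations (Euclid's algorithm applied row by row) bring D to a lower echelon basis of L(D), and reducing an iteration vector against that basis gives a canonical label in O(n * rank D) <= O(n^2) steps. The names `lattice_basis`, `label` and `partition` are hypothetical, and this labeling identifies subsets rather than reproducing the paper's labeling of dependent iterations.

```python
from collections import defaultdict
from itertools import product


def lattice_basis(D):
    """Reduce the n x m integer matrix D (one column per constant dependence
    vector) by unimodular column operations to lower echelon form.

    Returns (pivot_row, column) pairs in increasing pivot_row; the columns
    generate the same lattice L(D), are zero above their pivot row and
    positive at it."""
    n = len(D)
    cols = [[D[r][c] for r in range(n)] for c in range(len(D[0]))]
    basis = []
    for r in range(n):
        # Euclid on row r: subtract integer multiples of the column with the
        # smallest non-zero entry until at most one column is non-zero here.
        while True:
            nz = sorted((c for c in cols if c[r] != 0), key=lambda c: abs(c[r]))
            if len(nz) <= 1:
                break
            p = nz[0]
            for c in nz[1:]:
                q = c[r] // p[r]
                for k in range(n):
                    c[k] -= q * p[k]
        idx = [j for j, c in enumerate(cols) if c[r] != 0]
        if idx:
            p = cols.pop(idx[0])
            if p[r] < 0:  # negating a column is also unimodular
                p = [-x for x in p]
            basis.append((r, p))
    return basis


def label(i, basis):
    """Canonical representative of iteration i modulo L(D): two iterations
    get the same label iff their difference is an integer combination of
    the dependence vectors."""
    v = list(i)
    for r, p in basis:
        q = v[r] // p[r]  # floor division, p[r] > 0, so 0 <= v[r] < p[r]
        for k in range(r, len(v)):  # p is zero above its pivot row
            v[k] -= q * p[k]
    return tuple(v)


def partition(bounds, D):
    """Group the iterations 0 <= i_k < bounds[k] of a rectangular nest by
    label; different groups are mutually independent."""
    basis = lattice_basis(D)
    groups = defaultdict(list)
    for i in product(*(range(b) for b in bounds)):
        groups[label(i, basis)].append(i)
    return basis, groups


if __name__ == "__main__":
    # Dependence vectors (1, 1) and (1, -1), stored as the columns of D.
    basis, groups = partition((4, 4), [[1, 1], [1, -1]])
    print(basis)        # [(0, [1, 1]), (1, [0, 2])]
    print(len(groups))  # 2
    # Single dependence vector (2, 0): labels are (i mod 2, j).
    basis, groups = partition((4, 3), [[2], [0]])
    print(len(groups))  # 6
```

When D has full rank n, the number of labels is the product of the pivots (|det D| for a square D). For the single vector (2, 0) the label is (i mod 2, j), so both the j loop and the two residue classes of i can run as DO-ALL loops. Because a unimodular U maps Z^n onto itself and L(D) onto L(UD), applying U to both the iterations and the dependence vectors leaves this grouping unchanged, which is the invariance stated above.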
