A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces

Parallel loops account for the largest share of program parallelism. Both the degree to which that parallelism can be exploited and the overhead incurred during parallel execution of a nested loop depend directly on partitioning, i.e., the way the iterations of a parallel loop are distributed across processors. Partitioning of parallel loops is therefore of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done on partitioning and scheduling of rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular and trapezoidal iteration spaces) has received comparatively little attention. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces to optimize performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
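To illustrate why non-rectangular iteration spaces need special treatment, the following minimal sketch partitions the outer loop of a two-dimensional lower-triangular nest (`for i in range(n): for j in range(i + 1)`) so that each processor receives a near-equal number of iterations. It is not the algorithm of this paper: it is an assumed, simplified equal-area split of a triangle, and the names `triangular_bounds` and `chunk_sizes` are hypothetical helpers introduced only for this example.

```python
import math


def triangular_bounds(n, p):
    """Return p + 1 outer-loop boundaries for a lower-triangular nest.

    Rows [0, b) of the triangle contain b * (b + 1) / 2 iterations, so the
    k-th cut is placed where that count reaches k / p of the total.
    (Illustrative assumption: every iteration costs the same.)
    """
    total = n * (n + 1) // 2
    bounds = [0]
    for k in range(1, p):
        target = k * total / p
        # Solve b * (b + 1) / 2 = target for b and round to the nearest row.
        b = round((-1 + math.sqrt(1 + 8 * target)) / 2)
        # Keep the boundaries non-decreasing and inside [0, n].
        bounds.append(min(max(b, bounds[-1]), n))
    bounds.append(n)
    return bounds


def chunk_sizes(bounds):
    """Number of (i, j) iterations in each chunk of rows [lo, hi)."""
    return [hi * (hi + 1) // 2 - lo * (lo + 1) // 2
            for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    n, p = 100, 4
    balanced = triangular_bounds(n, p)
    naive = [k * n // p for k in range(p + 1)]  # equal-width row blocks
    print("balanced:", balanced, chunk_sizes(balanced))
    print("naive:   ", naive, chunk_sizes(naive))
```

Running the script compares the equal-area split with a naive split into equal-width row blocks. Under the naive split, the processor holding the last rows of the triangle does several times the work of the processor holding the first rows, which is the kind of load imbalance that motivates dedicated partitioning schemes for non-rectangular spaces.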
