A novel approach for partitioning iteration spaces with variable densities

Efficient partitioning of parallel loops is critical to achieving high performance and making efficient use of multiprocessor systems. Although a significant amount of work has addressed the partitioning and scheduling of loops with rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular or trapezoidal ones) with variable densities has, to the best of our knowledge, not been addressed so far. In this paper, we present a mathematical model for partitioning N-dimensional non-rectangular iteration spaces with variable densities. We present a unimodular loop transformation and a geometric approach for partitioning an iteration space, along the axis corresponding to the outermost loop, across a given number of processors so as to achieve near-optimal load balance. We present a case study to illustrate the effectiveness of our approach.
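
To make the load-balancing goal concrete, the following is a minimal sketch, not the mathematical model or the unimodular transformation described in the paper. It assumes a two-dimensional triangular loop nest (`for i in range(n): for j in range(i + 1)`) and a hypothetical `density(i, j)` function giving the cost of each iteration point. The helper names `row_work` and `partition_outer_loop` are illustrative only. The sketch cuts the outermost loop into contiguous chunks whose cumulative work is as close as a simple prefix-sum rule gets to an equal share per processor.

```python
from bisect import bisect_left
from itertools import accumulate


def row_work(n, density):
    """Work of each outer iteration i of a triangular nest (j runs 0..i).

    `density(i, j)` is a hypothetical per-point cost; it is an assumption
    made for this sketch, not something defined in the paper.
    """
    return [sum(density(i, j) for j in range(i + 1)) for i in range(n)]


def partition_outer_loop(work, num_procs):
    """Split outer iterations into num_procs contiguous half-open ranges.

    Cut k is placed just after the first outer iteration at which the
    cumulative work reaches k/num_procs of the total. This is a simple
    prefix-sum heuristic, not an optimal partitioner.
    """
    prefix = list(accumulate(work))
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_procs):
        target = total * k / num_procs
        cut = bisect_left(prefix, target) + 1
        # Keep the boundaries monotonic and inside the iteration range.
        cut = min(max(cut, bounds[-1]), len(work))
        bounds.append(cut)
    bounds.append(len(work))
    return [(bounds[k], bounds[k + 1]) for k in range(num_procs)]


if __name__ == "__main__":
    work = row_work(8, lambda i, j: 1)  # uniform density: row i costs i + 1
    for start, end in partition_outer_loop(work, 2):
        print((start, end), sum(work[start:end]))
```

With uniform density and `n = 8`, the rows cost 1 through 8 (36 in total). The sketch assigns outer iterations 0 to 5 (work 21) to the first processor and 6 to 7 (work 15) to the second, whereas an even split of the outer loop into 0 to 3 and 4 to 7 would give loads of 10 and 26. A non-uniform `density` changes the row costs and therefore moves the cut points.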
