Paper information - A novel approach for partitioning iteration spaces with variable densities

Efficient partitioning of parallel loops is critical to achieving high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done on partitioning and scheduling loops with rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular or trapezoidal iteration spaces) with variable densities has, to the best of our knowledge, not been addressed so far. In this paper, we present a mathematical model for partitioning N-dimensional non-rectangular iteration spaces with variable densities. We present a unimodular loop transformation and a geometric approach for partitioning an iteration space, along the axis corresponding to the outermost loop, across a given number of processors so as to achieve near-optimal load balance across the processors. We present a case study to illustrate the effectiveness of our approach.
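To make the load-balancing goal concrete, the following is a minimal sketch of cutting a triangular iteration space along the outermost loop so that each processor receives a contiguous range of outer iterations with roughly equal total work. It is not the paper's geometric method or its unimodular transformation; it is a simple discrete prefix-sum analogue. The names `row_work`, `partition_outer_loop`, and `loads`, as well as the `density(i, j)` cost function and the loop shape `0 <= j <= i`, are assumptions introduced purely for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def row_work(i, density):
    """Work of outer iteration i in an assumed triangular nest (0 <= j <= i)."""
    return sum(density(i, j) for j in range(i + 1))


def partition_outer_loop(n, num_procs, density=lambda i, j: 1.0):
    """Split outer iterations 0..n-1 into num_procs contiguous half-open
    ranges [start, end) whose total work is as close to equal as a
    row-granular cut allows."""
    work = [row_work(i, density) for i in range(n)]
    prefix = list(accumulate(work))  # prefix[i] = work of rows 0..i
    total = prefix[-1]
    bounds = [0]
    for p in range(1, num_procs):
        target = total * p / num_procs
        # First row whose cumulative work reaches the target.
        idx = bisect_left(prefix, target)
        # Cut before or after that row, whichever lands closer to the target.
        if idx > 0 and target - prefix[idx - 1] <= prefix[idx] - target:
            cut = idx
        else:
            cut = idx + 1
        # Keep the boundaries monotone and inside the iteration space.
        cut = min(max(cut, bounds[-1]), n)
        bounds.append(cut)
    bounds.append(n)
    return [(bounds[k], bounds[k + 1]) for k in range(num_procs)]


def loads(ranges, density=lambda i, j: 1.0):
    """Total work assigned to each processor for the given outer-loop ranges."""
    return [sum(row_work(i, density) for i in range(lo, hi)) for lo, hi in ranges]


if __name__ == "__main__":
    ranges = partition_outer_loop(100, 4)
    print(ranges)
    print(loads(ranges))
```

With uniform density, an equal split of the outer loop (25 rows per processor) would give the last processor far more work than the first, because later rows are longer. Cutting on cumulative work instead assigns more rows to the processors that handle the short rows. A non-uniform `density` changes only the per-row weights, not the cutting logic.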
