Partitioning and mapping of nested loops for linear array multicomputers

In distributed-memory multicomputers, minimizing interprocessor communication is the key to efficient execution of parallel programs. To reduce communication overhead, parallelizing compilers must schedule parallel programs carefully. This paper proposes compilation techniques for partitioning and mapping nested loops with constant data dependences onto linear array multicomputers. First, a systematic partition strategy projects an n-dimensional computational structure, representing an n-nested loop, onto a line to form a one-dimensional projected structure with low communication overhead. Then, a mapping algorithm maps the partitioned loops onto linear arrays in a way that balances the workload and minimizes the communication cost among processors. Finally, parallel execution code can be generated automatically for such linear array multicomputers.
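The two steps named in the abstract, projection onto a line and mapping onto a linear array, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a rectangular iteration space, a hypothetical integer weight vector `w` that defines the projection (iteration `p` lands at coordinate `w . p`), and a simple greedy rule that hands contiguous projected coordinates to processors. The function names and the cost measure (number of dependence edges that cross a processor boundary) are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def dot(u, v):
    """Integer dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_line(bounds, w):
    """Return {projected coordinate: number of iterations}, sorted by coordinate.

    `bounds` gives the loop extents (each index runs from 0 to bound - 1);
    `w` is the assumed projection weight vector.
    """
    load = Counter(dot(w, p) for p in product(*(range(b) for b in bounds)))
    return dict(sorted(load.items()))


def map_to_linear_array(load, num_procs):
    """Greedily assign contiguous projected coordinates to processors 0..num_procs-1."""
    total = sum(load.values())
    owner, proc, assigned = {}, 0, 0
    for coord, work in load.items():
        # Move to the next processor once the running load reaches its fair share.
        if proc < num_procs - 1 and assigned >= total * (proc + 1) / num_procs:
            proc += 1
        owner[coord] = proc
        assigned += work
    return owner


def communication_cost(bounds, w, deps, owner):
    """Count dependence edges whose source and sink run on different processors."""
    cost = 0
    for p in product(*(range(b) for b in bounds)):
        for d in deps:
            q = tuple(pi + di for pi, di in zip(p, d))
            inside = all(0 <= qi < b for qi, b in zip(q, bounds))
            if inside and owner[dot(w, p)] != owner[dot(w, q)]:
                cost += 1
    return cost


if __name__ == "__main__":
    bounds = (4, 4)            # a 4 x 4 doubly nested loop
    deps = [(0, 1), (1, 0)]    # constant dependence vectors
    for w in [(1, 0), (1, 1)]:
        owner = map_to_linear_array(project_onto_line(bounds, w), 2)
        print(w, owner, communication_cost(bounds, w, deps, owner))
```

Comparing the two weight vectors shows why the choice of projection matters: a dependence vector `d` with `w . d = 0` stays inside a single projected point and never causes communication, while the others may cross processor boundaries depending on how the line is cut.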
