Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

This paper is concerned with the design of efficient algorithms for generating global name-space communication sets for the execution of array assignment statements with arbitrary strides and block sizes on distributed-memory parallel computers. We present a hybrid approach that combines the advantages of the set-theoretic method and the integer lattice method for generating communication sets. When block sizes are extremely small or extremely large, a cyclic-based or a row-wise set-theoretic method is used. For medium block sizes, we propose a new integer lattice method in which the data in each local block are treated as a unit. The first virtual referenced element in each virtual referenced local block can be generated efficiently by an integer lattice method, for which the left boundary of the index domain on each processing element is extended. The physical referenced elements in each physical referenced local block can then be determined by intersecting two closed forms; the result is itself a closed form. Because generating indices for packing and unpacking messages at the sending and receiving ends can be expensive in certain cases, we also study the conventional communication model and the deposit communication model. Since each of the proposed algorithms and communication models is best suited to particular cases, we identify rules of thumb for choosing the most suitable algorithm in the general case.
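To make the notion of a communication set concrete, the following is a minimal Python sketch that enumerates the sets by brute force for an assignment of the form A(la::sa) = B(lb::sb). It is not the hybrid algorithm proposed in the paper; it only illustrates what such an algorithm has to compute. The function names (`owner`, `local_index`, `comm_sets`), the zero-based indexing, and the assumption that both arrays use a cyclic(k) distribution over the same `p` processors are all choices made for this illustration.

```python
from collections import defaultdict


def owner(i, k, p):
    """Processor that owns global index i under cyclic(k) over p processors."""
    return (i // k) % p


def local_index(i, k, p):
    """Offset of global index i in the local memory of its owner."""
    return (i // (k * p)) * k + i % k


def comm_sets(n_elems, la, sa, ka, lb, sb, kb, p):
    """Brute-force communication sets for A(la::sa) = B(lb::sb).

    n_elems is the number of assigned elements; ka and kb are the block
    sizes of A and B. Returns a dict mapping (sender, receiver) to a list
    of (local index in B, local index in A) pairs, in assignment order.
    """
    sets = defaultdict(list)
    for t in range(n_elems):
        ia = la + t * sa  # global index written in A
        ib = lb + t * sb  # global index read from B
        src = owner(ib, kb, p)
        dst = owner(ia, ka, p)
        if src != dst:  # only off-processor data needs a message
            sets[(src, dst)].append(
                (local_index(ib, kb, p), local_index(ia, ka, p))
            )
    return dict(sets)


if __name__ == "__main__":
    # Six elements, A with stride 1 and B with stride 2, block size 2, two PEs.
    for pair, items in sorted(comm_sets(6, 0, 1, 2, 0, 2, 2, 2).items()):
        print(pair, items)
```

This enumeration visits every assigned element, so its cost grows with the array section rather than with the number of blocks. The set-theoretic and integer lattice methods discussed in the abstract aim to produce the same sets in closed form without visiting each element.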
