A Generalized Processor Mapping Technique for Array Redistribution

In many scientific applications, array redistribution is often required to enhance data locality and reduce remote memory access in parallel programs on distributed-memory multicomputers. Because redistribution is performed at runtime, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a generalized processor mapping technique that minimizes the amount of data exchanged in BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) array redistribution and vice versa. The main idea is first to develop mapping functions that compute a new rank for each destination processor. From these mapping functions, a new logical sequence of destination processors is derived, and this sequence is then used to minimize the amount of data exchanged in a redistribution. The technique handles array redistribution with arbitrary source and destination processor sets and can be applied to multidimensional array redistribution. We present a theoretical model to analyze the performance improvement that the technique provides. To evaluate it, we implemented the technique on an IBM SP2 parallel machine. The experimental results show that it improves performance over a wide range of redistribution problems.
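To make the idea of reordering destination processors concrete, the following is a minimal sketch and not the mapping functions developed in the paper. It assumes a one-dimensional array, the same number of source and destination processors, and the usual block-cyclic ownership rule (element i belongs to processor (i div block) mod P). The function names `owner`, `transfer_counts`, `exchanged`, and `best_mapping` are hypothetical, and the brute-force search over permutations stands in for the closed-form mapping functions, so it is only practical for very small P.

```python
from itertools import permutations


def owner(i, block, nprocs):
    """Rank that owns element i under BLOCK-CYCLIC(block) on nprocs processors."""
    return (i // block) % nprocs


def transfer_counts(n, k, r, nprocs):
    """counts[s][d] = number of elements held by source rank s under
    BLOCK-CYCLIC(k*r) that belong to logical destination rank d under
    BLOCK-CYCLIC(r)."""
    counts = [[0] * nprocs for _ in range(nprocs)]
    for i in range(n):
        counts[owner(i, k * r, nprocs)][owner(i, r, nprocs)] += 1
    return counts


def exchanged(counts, mapping):
    """Elements that must move between processors when logical destination
    rank d is placed on physical processor mapping[d]."""
    total = sum(sum(row) for row in counts)
    local = sum(counts[mapping[d]][d] for d in range(len(mapping)))
    return total - local


def best_mapping(counts):
    """Exhaustive search (illustration only) for the placement of logical
    destination ranks that leaves the most data in place."""
    nprocs = len(counts)
    return min(permutations(range(nprocs)), key=lambda m: exchanged(counts, m))


if __name__ == "__main__":
    # 16 elements, BLOCK-CYCLIC(2) to BLOCK-CYCLIC(1) on 4 processors.
    counts = transfer_counts(n=16, k=2, r=1, nprocs=4)
    identity = tuple(range(4))
    print("identity mapping moves", exchanged(counts, identity), "elements")
    best = best_mapping(counts)
    print("mapping", best, "moves", exchanged(counts, best), "elements")
```

In this small case the identity ordering of destination processors moves 12 of the 16 elements, while a reordered logical sequence moves only 8, which illustrates why the choice of destination ranks affects the redistribution cost.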
