Packing/unpacking information generation for efficient generalized kr/spl rarr/r and r/spl rarr/kr array redistribution

Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance tradeoff between the efficiency of new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods to generate the packing/unpacking information for BOLCK-CYCLIC(kr) to BLOCK-CYCLIC(r) and BOLCK-CYCLIC(r) to BLOCK-CYCLIC(kr) redistribution with arbitrary source/destination processor sets. The most significant improvement of this paper is that a processor does not need to construct the send/receive data sets for a redistribution. Based on the packing/unpacking information derived from kr/spl rarr/r and r/spl rarr/kr redistributions, a processor can pack/unpack array elements into (from) messages directly. To evaluate the performance of our methods, we have implemented our methods along with the PITFALLS method and the Prylli's method on an IBM SP2 parallel machine. The experimental results show that our algorithms outperform the PITFALLS method and the Prylli's method for all test samples.

[1]  Ken Kennedy,et al.  Efficient address generation for block-cyclic distributions , 1995, ICS '95.

[2]  Michael Wolfe,et al.  Optimization of Array Redistribution for Distributed Memory Multicomputers , 1995, Parallel Comput..

[3]  Ching-Hsien Hsu,et al.  A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution , 1998, IEEE Trans. Parallel Distributed Syst..

[4]  Michael Wolfe,et al.  A New Approach to Array Redistribution: Strip Mining Redistribution , 1994, PARLE.

[5]  Thomas R. Gross,et al.  Generating Communication for Array Statement: Design, Implementation, and Evaluation , 1994, J. Parallel Distributed Comput..

[6]  P. Sadayappan,et al.  Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines , 1996, J. Parallel Distributed Comput..

[7]  Sandeep K. S. Gupta,et al.  On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[8]  John R. Gilbert,et al.  Generating Local Address and Communication Sets for Data-Parallel Programs , 1995, J. Parallel Distributed Comput..

[9]  Bernard Tourancheau,et al.  Fast Runtime Block Cyclic Data Redistribution on Multiprocessors , 1997, J. Parallel Distributed Comput..

[10]  Lionel M. Ni,et al.  Processor Mapping Techniques Toward Efficient Data Redistribution , 1995, IEEE Trans. Parallel Distributed Syst..

[11]  J. Ramanujam,et al.  Multi-phase array redistribution: modeling and evaluation , 1995, Proceedings of 9th International Parallel Processing Symposium.

[12]  David W. Walker,et al.  Redistribution of block-cyclic data distributions using MPI , 1996, Concurr. Pract. Exp..

[13]  Yves Robert,et al.  Scheduling Block-Cyclic Array Redistribution , 1998, IEEE Trans. Parallel Distributed Syst..

[14]  Prithviraj Banerjee,et al.  Optimizations for Efficient Array Redistribution on Distributed Memory Multicomputers , 1996, J. Parallel Distributed Comput..

[15]  Sandeep K. S. Gupta,et al.  On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[16]  Viktor K. Prasanna,et al.  Efficient Algorithms for Block-Cyclic Redistribution of Arrays , 1999, Algorithmica.

[17]  Viktor K. Prasanna,et al.  Efficient algorithms for multi-dimensional block-cyclic redistribution of arrays , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).

[18]  Rajeev Thakur,et al.  Efficient Algorithms for Array Redistribution , 1996, IEEE Trans. Parallel Distributed Syst..