Efficient methods for kr→r and r→kr array redistribution

Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. We present efficient algorithms for array redistribution. The most significant improvement of our algorithms is that a processor does not need to construct the send/receive data sets for a redistribution. Based on the packing/unpacking information that is derived from the BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) redistribution (or vice versa), a processor can pack array elements into messages, and unpack them from messages, directly. To evaluate the performance of our methods, we implemented them along with Thakur's (1994) methods on an IBM SP2 parallel machine. The results show that our algorithms are approximately 5% to 27% faster than Thakur's methods.
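
To make the BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) setting concrete, the following Python sketch computes which processor owns each global index under a block-cyclic distribution and enumerates, by brute force, which elements move between each pair of processors. It is not the algorithm presented in this paper: it explicitly builds the send sets that the proposed methods avoid constructing. The function names (owner, local_index, redistribution_plan) and the standard block-cyclic ownership formula are assumptions introduced for illustration only.

```python
def owner(i, block, nprocs):
    """Processor that owns global index i under BLOCK-CYCLIC(block)."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def redistribution_plan(n, k, r, nprocs):
    """Brute-force send sets for BLOCK-CYCLIC(k*r) -> BLOCK-CYCLIC(r).

    Returns a dict mapping (source, destination) processor pairs to the
    list of global indices that travel between them. Illustrative only:
    building these sets explicitly is the cost the paper aims to avoid.
    """
    plan = {}
    for i in range(n):
        src = owner(i, k * r, nprocs)  # owner before redistribution
        dst = owner(i, r, nprocs)      # owner after redistribution
        plan.setdefault((src, dst), []).append(i)
    return plan


if __name__ == "__main__":
    # Hypothetical example: 24 elements, k = 2, r = 2, 3 processors.
    for pair, indices in sorted(redistribution_plan(24, 2, 2, 3).items()):
        print(pair, indices)
```

In this small example each source block of kr elements splits into k consecutive pieces of r elements, and the pieces go to consecutive processors in cyclic order. That regularity is the kind of structure that can be exploited to pack messages directly instead of enumerating every index.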

[1] Geoffrey C. Fox, et al. Runtime array redistribution in HPF programs, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.

[2] Prithviraj Banerjee, et al. Optimizations for Efficient Array Redistribution on Distributed Memory Multicomputers, 1996, J. Parallel Distributed Comput.

[3] Rajeev Thakur, et al. Efficient Algorithms for Array Redistribution, 1996, IEEE Trans. Parallel Distributed Syst.

[4] Michael Wolfe, et al. A New Approach to Array Redistribution: Strip Mining Redistribution, 1994, PARLE.

[5] Ken Kennedy, et al. Efficient address generation for block-cyclic distributions, 1995, ICS '95.

[6] Prithviraj Banerjee, et al. Automatic generation of efficient array redistribution routines for distributed memory multicomputers, 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.

[7] John R. Gilbert, et al. Generating Local Address and Communication Sets for Data-Parallel Programs, 1995, J. Parallel Distributed Comput.

[8] David W. Walker, et al. Redistribution of block-cyclic data distributions using MPI, 1996.

[9] J. Ramanujam, et al. Multi-phase array redistribution: modeling and evaluation, 1995, Proceedings of 9th International Parallel Processing Symposium.

[10] Michael Wolfe, et al. Optimization of Array Redistribution for Distributed Memory Multicomputers, 1995, Parallel Comput.

[11] Lionel M. Ni, et al. Processor Mapping Techniques Toward Efficient Data Redistribution, 1995, IEEE Trans. Parallel Distributed Syst.

[12] Thomas R. Gross, et al. Generating Communication for Array Statement: Design, Implementation, and Evaluation, 1994, J. Parallel Distributed Comput.

[13] Ken Kennedy, et al. Compilation techniques for block-cyclic distributions, 1994.

[14] Sandeep K. S. Gupta, et al. On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines, 1993, 1993 International Conference on Parallel Processing - ICPP'93.