A Method for Runtime Recognition of Collective Communication on Distributed-Memory Multiprocessors

In this paper, we present a compiler optimization for recognizing patterns of collective communication at runtime in data-parallel languages that allow dynamic data decomposition. Its calculation time is O(m), which makes it appropriate for large numerical applications and massively parallel machines. The previous approach took O(n_0 + ... + n_{m-1}) time, where m is the number of dimensions of an array and n_i is the array size in the i-th dimension. The new method can be used for data redistribution and intrinsic procedures, as well as for data prefetching in parallelized loops.
