A Loop Transformation Algorithm for Communication Overlapping

Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed manually by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given loop nest and applies loop interchange and tiling to overlap communication with computation. By providing a framework that combines information on data dependence, communication, and reuse, the algorithm avoids generating redundant communication. The paper also describes a method of generating the messages that exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results on distributed memory machines such as the RISC System/6000 Scalable POWERparallel System have shown its effectiveness. Finally, the paper discusses architectural problems that stand in the way of efficient optimization.
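The following is a minimal Python sketch of the general idea, not the authors' HPF compiler algorithm. It assumes a hypothetical loop nest computing `a[i][j] = a[i-1][j] + a[i][j-1]` with rows block-distributed across processors, so each processor needs the last row of its predecessor. Strip-mining the `j` loop and interchanging the resulting tile loop outside the `i` loop lets a processor send its boundary after every tile instead of after its whole block. `tiled_distributed` checks that this execution order gives the same result as the original nest, and `makespan` estimates the benefit under a simplified cost model (non-blocking sends, fixed per-message latency). All function names and parameters are illustrative.

```python
# Hypothetical sketch (not the paper's compiler algorithm): tiling a loop nest
# so a block-distributed pipeline overlaps communication with computation.


def sequential(n, m):
    """Reference loop nest: a[i][j] = a[i-1][j] + a[i][j-1]; row 0 and column 0 are 1."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def tiled_distributed(n, m, procs, tile):
    """Simulate the tiled nest with rows block-distributed over `procs` processors.

    The j loop is strip-mined into tiles of width `tile` and the tile loop is
    interchanged outside the i loop. After each tile, a processor sends the
    boundary segment of its last row to the next processor (a dict stands in
    for the message layer), so the receiver can start tile k while the sender
    moves on to tile k + 1.
    """
    assert n % procs == 0, "assume rows divide evenly among processors"
    block = n // procs
    a = [[1] * m for _ in range(n)]
    mailbox = {}  # (destination processor, tile index) -> boundary values
    num_tiles = (m + tile - 1) // tile
    # Running every processor's tile k in order p = 0..procs-1 is one legal
    # serialization of the pipelined schedule.
    for k in range(num_tiles):
        lo, hi = k * tile, min((k + 1) * tile, m)
        for p in range(procs):
            first, last = p * block, (p + 1) * block
            halo = mailbox.pop((p, k), None)  # row first-1, received from p-1
            for i in range(max(first, 1), last):
                for j in range(max(lo, 1), hi):
                    up = halo[j - lo] if i == first else a[i - 1][j]
                    a[i][j] = up + a[i][j - 1]
            if p + 1 < procs:
                mailbox[(p + 1, k)] = a[last - 1][lo:hi]  # send tile boundary
    return a


def makespan(procs, num_tiles, compute_per_tile, latency):
    """Finish time of the pipeline under a simple model.

    Assumes sends are non-blocking (communication costs no processor time)
    and every message has the same fixed latency.
    """
    finish = [[0.0] * num_tiles for _ in range(procs)]
    for p in range(procs):
        for k in range(num_tiles):
            local_ready = finish[p][k - 1] if k > 0 else 0.0
            data_ready = finish[p - 1][k] + latency if p > 0 else 0.0
            finish[p][k] = max(local_ready, data_ready) + compute_per_tile
    return finish[-1][-1]


if __name__ == "__main__":
    assert tiled_distributed(8, 12, 4, 5) == sequential(8, 12)
    print(makespan(4, 1, 1000.0, 10.0))   # untiled: 4030.0
    print(makespan(4, 10, 100.0, 10.0))   # ten tiles, pipelined: 1330.0
```

Under these assumptions, four processors with 1000 units of work each and a latency of 10 finish at 4030 time units without tiling and at 1330 with ten tiles, consistent with a pipeline time of (P + T - 1)c + (P - 1)L for P processors, T tiles, per-tile compute c, and latency L. Because the model charges no processor time for sending and receiving, it overstates the gain from very small tiles; real messages carry per-message overhead, so tile size is a trade-off.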
