Detection and global optimization of reduction operations for distributed parallel machines

This paper presents a new technique for detecting and optimizing reduction operations in parallelizing compilers. The technique detects reduction constructs in general complex loops, parallelizes the loops containing those constructs, and optimizes communication for multiple reduction operations. The optimization applies not only to individual reduction loops but also across multiple loop nests throughout a program. The techniques have been implemented in our HPF compiler, and their effectiveness is evaluated on an IBM Scalable PowerParallel System SP2 using a set of standard benchmark programs. Although the experimental results are still preliminary, they show that our techniques for detecting and optimizing reductions are effective on practical application programs.
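As a rough illustration of what parallelizing a reduction loop and combining the communication of several reductions can look like, the following sketch simulates a block-distributed array in plain Python. It is not the paper's algorithm or the HPF compiler's generated code. The function names, the block distribution, and the choice of a sum and a max reduction are all assumptions made for this example.

```python
from functools import reduce


def block_partition(data, nprocs):
    """Split data into nprocs contiguous blocks (an assumed BLOCK-style distribution)."""
    size = (len(data) + nprocs - 1) // nprocs  # ceiling division
    return [data[p * size:(p + 1) * size] for p in range(nprocs)]


def local_reductions(block):
    """Each simulated processor reduces its own block; no communication is needed here."""
    return (sum(block), max(block, default=float("-inf")))


def combine(a, b):
    """Merge two (sum, max) partial results element by element."""
    return (a[0] + b[0], max(a[1], b[1]))


def global_reductions(data, nprocs):
    """Compute a sum and a max over data using one combined exchange of partials.

    Packing both partial results into a single tuple stands in for sending one
    message per processor instead of one message per reduction.
    """
    partials = [local_reductions(block) for block in block_partition(data, nprocs)]
    return reduce(combine, partials)


if __name__ == "__main__":
    total, largest = global_reductions([3, 1, 4, 1, 5, 9, 2, 6], nprocs=3)
    print(total, largest)
```

The point of the sketch is the shape of the computation: the loop body runs independently on each block, and only the small tuple of partial results crosses processor boundaries. Both operators are associative and commutative, which is what allows the partial results to be combined in any order.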
