High-level Language Support for User-defined Reductions

The optimized handling of reductions on parallel supercomputers and clusters of workstations is critical to high performance because reductions are common in scientific codes and a potential source of bottlenecks. Yet in many high-level languages, a mechanism for writing efficient reductions remains surprisingly absent. Further, where such mechanisms do exist, they often lack the flexibility a programmer needs to achieve a desirable level of performance. In this paper, we present a new language construct for arbitrary reductions that lets a programmer achieve performance equal to that of the highly flexible but low-level combination of Fortran and MPI. We have implemented this construct in the ZPL language and evaluated it in the context of the initialization of the NAS MG benchmark. We show a 45-fold speedup over the same code written in ZPL without this construct. In addition, performance on a large number of processors surpasses that of the NAS implementation, showing that our mechanism provides programmers with the needed flexibility.
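For context, the low-level baseline the paper measures against is MPI's user-defined reduction facility. The C sketch below is illustrative only: the MinLoc example is not taken from the paper (MPI even provides a built-in MPI_MINLOC operation; it is reimplemented here purely to show the mechanism). It shows the bookkeeping a programmer must do by hand: describe the data layout with a derived datatype, register a combining function via MPI_Op_create, and invoke MPI_Reduce.

    #include <mpi.h>
    #include <stdio.h>
    #include <stddef.h>

    /* A (value, rank) pair; the reduction finds the global minimum
       value and the rank on which it resides. */
    typedef struct { double val; int rank; } MinLoc;

    /* Combining function with the signature required by MPI_Op_create. */
    static void minloc_op(void *in, void *inout, int *len, MPI_Datatype *dt) {
        MinLoc *a = (MinLoc *)in;
        MinLoc *b = (MinLoc *)inout;
        for (int i = 0; i < *len; i++)
            if (a[i].val < b[i].val)
                b[i] = a[i];
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Describe the struct layout to MPI as a derived datatype. */
        int blocklens[2] = {1, 1};
        MPI_Aint displs[2] = {offsetof(MinLoc, val), offsetof(MinLoc, rank)};
        MPI_Datatype types[2] = {MPI_DOUBLE, MPI_INT};
        MPI_Datatype minloc_type;
        MPI_Type_create_struct(2, blocklens, displs, types, &minloc_type);
        MPI_Type_commit(&minloc_type);

        /* Register the combining function as a commutative reduction op. */
        MPI_Op op;
        MPI_Op_create(minloc_op, 1, &op);

        MinLoc local = {1.0 / (rank + 1), rank};  /* arbitrary per-process value */
        MinLoc global;
        MPI_Reduce(&local, &global, 1, minloc_type, op, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("min = %f on rank %d\n", global.val, global.rank);

        MPI_Op_free(&op);
        MPI_Type_free(&minloc_type);
        MPI_Finalize();
        return 0;
    }

The flexibility comes at the cost of this boilerplate at every use site; the paper's contribution is a language-level construct that retains the generality while leaving the datatype description and operator registration to the compiler.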
