Since APL, reductions and scans have been recognized as powerful programming concepts. They abstract an accumulation loop (reduction) and an update loop (scan), and both have efficient parallel implementations based on the parallel prefix algorithm. High-level languages often include them with a built-in set of operators such as sum, product, and min. MPI provides library routines for reductions, and these account for nearly nine percent of all MPI calls in the NAS Parallel Benchmarks (NPB) version 3.2. Some researchers have even advocated reductions and scans as the principal tool for parallel algorithm design.

Also since APL, the idea of applying the reduction control structure to a user-defined operator has been proposed, and several implementations (some parallel) have been reported. This paper presents the first global-view formulation of user-defined scans and an improved global-view formulation of user-defined reductions, demonstrating them in the context of the Chapel programming language. Further, these formulations are extended to a message-passing context (MPI), thus transferring global-view abstractions to local-view languages and perhaps signaling a way to enhance local-view languages incrementally. Finally, examples are presented in which global-view user-defined reductions "clean up" and/or "speed up" portions of two NAS benchmarks, IS and MG. In consequence, these generalized reduction and scan abstractions make the full power of the parallel prefix technique available to both global- and local-view parallel programming.
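To make the idea of a user-defined operator concrete, the following is a minimal Python sketch, not the paper's Chapel or MPI formulation. It assumes a hypothetical operator interface with three methods (`identity`, `accumulate`, `combine`) and a hypothetical example operator `MinK` that keeps the k smallest values. The chunks stand in for processors and are processed sequentially here; in a real parallel setting the per-chunk work would run concurrently and the combine step would follow the parallel prefix pattern.

```python
class MinK:
    """Hypothetical user-defined operator: tracks the k smallest values seen."""

    def __init__(self, k):
        self.k = k

    def identity(self):
        # State that leaves any other state unchanged under combine.
        return []

    def accumulate(self, state, x):
        # Fold one input element into a state (the body of the loop).
        return sorted(state + [x])[: self.k]

    def combine(self, a, b):
        # Merge two partial states; must be associative.
        return sorted(a + b)[: self.k]


def chunks(data, p):
    """Split data into p contiguous chunks, one per simulated processor."""
    n = len(data)
    bounds = [n * i // p for i in range(p + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(p)]


def local_accumulate(op, chunk):
    """Accumulation loop over one chunk, starting from the identity."""
    state = op.identity()
    for x in chunk:
        state = op.accumulate(state, x)
    return state


def user_reduce(op, data, p=4):
    """Reduction: accumulate each chunk, then combine the partial states."""
    result = op.identity()
    for partial in (local_accumulate(op, c) for c in chunks(data, p)):
        result = op.combine(result, partial)
    return result


def user_scan(op, data, p=4):
    """Inclusive scan: each chunk's update loop starts from the combined
    state of all chunks to its left (an exclusive prefix over chunks)."""
    out = []
    prefix = op.identity()
    for chunk in chunks(data, p):
        state = prefix
        for x in chunk:
            state = op.accumulate(state, x)
            out.append(state)
        prefix = op.combine(prefix, local_accumulate(op, chunk))
    return out


data = [5, 3, 8, 1, 9, 2]
print(user_reduce(MinK(2), data))  # two smallest values overall
print(user_scan(MinK(2), data))    # two smallest values of each prefix
```

Separating `accumulate` (element into state) from `combine` (state with state) is one way to let the state type differ from the input type, which built-in operators such as sum or min do not need. The result is independent of the number of chunks only if `combine` is associative and consistent with `accumulate`.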