Many computations follow a common pattern: first, a loop iterates over an input array, producing an array of partial results; because the iterations are independent of one another, they can execute in parallel. Second, a reduction combines the elements of the partial-result array into a single final result. We call these two steps a Do&Merge computation. The most common way to parallelize such a computation is to apply a DOALL operation across the input array and then apply a reduction operator to the partial results. We show that fusing the Do phase and the Merge phase into a single Do&Merge computation can improve both execution time and memory usage. In this paper we describe a simple and efficient construct (called the Pdo loop) that is included in an experimental HPF-like compiler for private-memory parallel systems.
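To make the two phases concrete, here is a minimal sketch in C with OpenMP. This is our illustration only, not the paper's Pdo syntax: the function names are hypothetical, the squaring stands in for arbitrary per-element work, and OpenMP's reduction clause is used as a stand-in for the fused Do&Merge that the Pdo loop expresses in the HPF-like compiler.

```c
#include <stddef.h>
/* Compile with OpenMP enabled, e.g. cc -fopenmp */

/* Two separate phases: a DOALL loop fills a partial-result
 * array, then a reduction combines its elements. */
double do_then_merge(const double *in, double *partial, long n) {
    /* Do phase: iterations are independent (a DOALL loop). */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        partial[i] = in[i] * in[i];   /* placeholder per-element work */

    /* Merge phase: combine the partial results serially. */
    double sum = 0.0;
    for (long i = 0; i < n; i++)
        sum += partial[i];
    return sum;
}

/* Fused Do&Merge: each iteration folds its result directly
 * into the reduction, so the intermediate partial-result
 * array is never materialized. */
double do_and_merge(const double *in, long n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += in[i] * in[i];
    return sum;
}
```

The fused form shows where the claimed savings come from: the partial-result array disappears, which reduces memory usage, and each processor combines its own contributions locally before a final cross-processor merge. On a private-memory machine the effect is larger still, since the eliminated array would otherwise be distributed and its reduction would require extra communication.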