Do&Merge: Integrating Parallel Loops and Reductions

Many computations perform operations that match this pattern: first, a loop iterates over an input array, producing an array of (partial) results. The loop iterations are independent of each other and can be done in parallel. Second, a reduction operation combines the elements of the partial result array to produce the single final result. We call these two steps a Do&Merge computation. The most common way to effectively parallelize such a computation is for the programmer to apply a DOALL operation across the input array, and then to apply a reduction operator to the partial results. We show that combining the Do phase and the Merge phase into a single Do&Merge computation can lead to improved execution time and memory usage. In this paper we describe a simple and efficient construct (called the Pdo loop) that is included in an experimental HPF-like compiler for private-memory parallel systems.