Paper Information - Do&Merge: Integrating Parallel Loops and Reductions

Do&Merge: Integrating Parallel Loops and Reductions

Many computations perform operations that match this pattern: first, a loop iterates over an input array, producing an array of (partial) results. The loop iterations are independent of each other and can be done in parallel. Second, a reduction operation combines the elements of the partial result array to produce the single final result. We call these two steps a Do&Merge computation. The most common way to effectively parallelize such a computation is for the programmer to apply a DOALL operation across the input array, and then to apply a reduction operator to the partial results. We show that combining the Do phase and the Merge phase into a single Do&Merge computation can lead to improved execution time and memory usage. In this paper we describe a simple and efficient construct (called the Pdo loop) that is included in an experimental HPF-like compiler for private-memory parallel systems.
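The pattern described in the abstract can be illustrated with a minimal sketch. It is not the Pdo construct or its syntax, which this page does not show. The function names `two_phase` and `do_and_merge`, the `workers` parameter, and the use of a thread pool as a stand-in for the processors of a private-memory system are all assumptions made for illustration. The sketch also assumes that `merge` is associative and that `identity` is its neutral element.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def two_phase(data, do, merge, identity):
    """Separate Do and Merge phases: the partial result array is materialized."""
    partial = [do(x) for x in data]          # Do phase: one partial result per input
    return reduce(merge, partial, identity)  # Merge phase: combine all partial results


def do_and_merge(data, do, merge, identity, workers=4):
    """Fused variant (hypothetical): each worker merges as it goes.

    Only one accumulator per worker is kept, so no array of
    len(data) partial results is ever built.
    """
    def fold_chunk(chunk):
        acc = identity
        for x in chunk:
            acc = merge(acc, do(x))  # do and merge in the same loop iteration
        return acc

    # Split the input into contiguous chunks, one per worker (ceiling division).
    size = max(1, -(-len(data) // workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_results = list(pool.map(fold_chunk, chunks))

    # Final merge over one value per worker instead of one value per element.
    return reduce(merge, local_results, identity)


data = list(range(10))
square = lambda x: x * x
print(two_phase(data, square, operator.add, 0))
print(do_and_merge(data, square, operator.add, 0))
```

Both calls compute the same sum of squares. The difference is in intermediate storage: the two-phase version holds one partial result per input element, while the fused version holds one accumulator per worker. Python threads here only model the structure of the computation and are not meant to demonstrate a speedup.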

Thomas R. Gross | David R. O'Hallaron | Jon A. Webb | Bwolen Yang | James M. Stichnoth
