Parallelizing nested loops on multicomputers: the grouping approach

The design of a tool for partitioning and parallelizing nested loops for execution on distributed-memory multicomputers is presented. The core of the tool is a technique called grouping, which identifies appropriate loop partition patterns from the data dependencies across iterations. Combined with analytic results from performance modeling tools, the grouping technique allows certain nested loops to be partitioned systematically and automatically, without requiring users to specify the data partitions. Grouping is based on the concept of pipelined data parallel computation, which promises to balance computation and communication on multicomputers. The basic structure of the parallelizing tool is presented, the grouping and performance analysis techniques for pipelined data parallel computations are described, and a prototype of the tool is introduced to illustrate the feasibility of the approach.
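To make the grouping idea concrete, the following is a minimal sketch, not the paper's tool: it takes a doubly nested loop whose body has dependence vectors (1,0) and (0,1), partitions the columns of the iteration space into contiguous groups, and executes the groups in pipelined (wavefront) order. The stencil, the constants N and GROUPS, and the column-block partitioning are illustrative assumptions.

```c
/* Illustrative sketch only (not the paper's implementation):
 * grouping a doubly nested loop with dependence vectors (1,0) and
 * (0,1) into column blocks, then running the blocks in pipelined
 * order.  N, GROUPS, and the stencil are assumed for the example. */
#include <stdio.h>

#define N      8   /* iteration space is N x N               */
#define GROUPS 4   /* number of column groups ("processes")  */

/* One-element halo on each side, initialized to zero. */
static double A[N + 1][N + 1];

int main(void)
{
    int width = N / GROUPS;   /* columns assigned to each group */

    /* Pipelined schedule: at time step t, group g handles row
     * i = t - g, so the pairs (row, group) on the same wavefront
     * run together.  Within a group the original loop body runs
     * unchanged, respecting both dependence vectors. */
    for (int t = 0; t <= (N - 1) + (GROUPS - 1); t++) {
        for (int g = 0; g < GROUPS; g++) {
            int i = t - g;                 /* row handled by group g */
            if (i < 0 || i >= N) continue;
            for (int j = g * width; j < (g + 1) * width; j++) {
                /* On a real multicomputer, group g would first
                 * receive the boundary value A[i][g*width] (the last
                 * column of group g-1 for this row) via a message. */
                A[i + 1][j + 1] = A[i][j + 1] + A[i + 1][j] + 1.0;
            }
        }
    }
    printf("A[%d][%d] = %g\n", N, N, A[N][N]);
    return 0;
}
```

Each group reads cross-group values only from earlier time steps, so on an actual multicomputer those reads become receives of one boundary column per step; this is how the pipelined schedule keeps every group busy while boundary data is in flight, balancing computation against communication.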
