Communication and memory requirements as the basis for mapping task and data parallel programs

For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploiting task and data parallelism in a single compiler framework; such a compiler can map a single source program onto a parallel machine in many different ways. The tradeoffs between task and data parallelism are complex and depend on the characteristics of the program to be executed, most significantly its memory and communication requirements, and on the performance parameters of the target parallel machine. We present a framework to isolate and examine the specific characteristics of programs that determine performance under different mappings. Our focus is on applications that process a stream of input and whose computation structure is fairly static and predictable. We describe three such applications developed with our compiler: fast Fourier transform, narrowband tracking radar, and multibaseline stereo. We examine the tradeoffs between various mappings for these applications and show how the framework is used to obtain efficient mappings.