A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers

Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared-memory multiprocessors in terms of cost and scalability. Unfortunately, exploiting all of the available computational power in these machines requires a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed-memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is to use task parallelism to control the degree of data parallelism of individual tasks. This improves performance because data parallelism yields diminishing returns as the number of processors increases. By controlling the number of processors used for each data-parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster.
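The diminishing-returns argument can be illustrated with a toy cost model. The model below (a hypothetical `work/procs` computation term plus a logarithmic communication term) is an illustrative assumption, not the paper's actual cost functions; it merely shows why running two data-parallel tasks concurrently on half the machine each can beat running them one after another on the full machine.

```python
import math

def task_time(work, procs, comm=1.0):
    # Hypothetical cost model: computation time shrinks as work/procs,
    # while communication overhead grows with the processor count.
    return work / procs + comm * math.log2(procs)

P = 16      # total processors in the machine
W = 100.0   # work per task (arbitrary units)

# Pure data parallelism: run the two tasks one after another,
# each using all P processors.
sequential = 2 * task_time(W, P)

# Mixed task + data parallelism: run the two tasks concurrently,
# each on half the processors.
concurrent = task_time(W, P // 2)

print(sequential, concurrent)  # prints: 20.5 15.5
```

Under this model the mixed schedule finishes in 15.5 time units versus 20.5 for the purely data-parallel one, because each task is kept at a processor count where its parallel efficiency is still high.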
