Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines

Many partitioned scientific programs can be modeled as iterative executions of computational tasks and represented by iterative task graphs (ITGs). An ITG may or may not have dependence cycles. In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead and propose heuristic algorithms for scheduling both cyclic and acyclic ITGs without searching the entire iteration space. Our approach incorporates software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We analyze the asymptotic optimality of the algorithms to show that the derived schedules are competitive with optimal solutions. We also study the sensitivity of scheduling performance to inaccurate weights. Finally, we present experimental results to demonstrate the effectiveness of the optimization techniques.
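
To make the notion of graph unfolding concrete, the following is a minimal sketch of the standard unfolding transformation on a graph whose edges carry dependence distances. It is an illustration only, not the paper's algorithm: the function name `unfold_itg`, the `(src, dst, distance)` edge encoding, and the two-task example are all assumptions introduced here. An edge with distance d means that the destination task in iteration k + d depends on the source task in iteration k.

```python
def unfold_itg(tasks, edges, factor):
    """Unfold an iterative task graph by `factor` (hypothetical helper).

    tasks:  list of task names.
    edges:  list of (src, dst, distance) with distance >= 0.
    Returns (nodes, new_edges), where nodes are (task, copy) pairs and
    new_edges are ((src, i), (dst, j), new_distance) triples.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    nodes = [(t, i) for i in range(factor) for t in tasks]
    new_edges = []
    for src, dst, dist in edges:
        for i in range(factor):
            # Copy i of src feeds copy (i + dist) mod factor of dst; the
            # dependence crosses floor((i + dist) / factor) unfolded iterations.
            j = (i + dist) % factor
            new_dist = (i + dist) // factor
            new_edges.append(((src, i), (dst, j), new_dist))
    return nodes, new_edges


# Hypothetical cyclic ITG: A feeds B within an iteration, and B feeds A
# in the next iteration.
nodes, unfolded = unfold_itg(["A", "B"], [("A", "B", 0), ("B", "A", 1)], 2)

# Edges with new distance 0 lie inside one unfolded iteration; they form a
# DAG whenever the original graph has no zero-distance cycle.
intra_iteration = [(u, v) for u, v, d in unfolded if d == 0]
```

In this sketch the sum of the new distances over the copies of each original edge equals that edge's original distance, so unfolding preserves the total dependence distance around every cycle while exposing more tasks per unfolded iteration to a DAG scheduler.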
