Integrating Software Pipelining and Graph Scheduling for Iterative Scientific Computations

Graph scheduling has been shown to be effective for solving irregular problems represented as directed acyclic graphs (DAGs) on distributed memory systems. Many scientific applications can also be modeled as iterative task graphs (ITGs). In this paper, we model the successive over-relaxation (SOR) computation for solving sparse linear systems as ITGs and address the optimization issues that arise when scheduling ITGs with nonzero communication overhead. We present an approach that integrates techniques from software pipelining and graph scheduling. We demonstrate the effectiveness of our approach in mapping SOR computation and compare it with the multi-coloring method.
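The following minimal Python sketch makes the ITG model concrete. It is not the authors' method. It assumes natural row ordering, unit task weights, and zero communication cost, which is the simpler case the paper goes beyond. The function names `sor_itg`, `has_positive_cycle`, and `min_iteration_interval` are hypothetical.

In the sketch, each row update of an SOR sweep becomes a task. A dependence on `x_j` with `j < i` is satisfied within the same sweep (iteration distance 0). A dependence with `j > i`, or on the row's own old value, is satisfied by the previous sweep (distance 1).

The second part computes the standard software-pipelining recurrence bound. The interval between the starts of consecutive sweeps cannot be smaller than the maximum, over all cycles, of total task weight divided by total iteration distance.

```python
def sor_itg(pattern):
    """Hypothetical helper: build ITG edges for SOR in natural row order.

    pattern maps row i to the set of columns j with a_ij != 0. Task i updates
    x_i; an edge (j, i, d) means task i in sweep k needs the value that
    task j produced in sweep k - d.
    """
    edges = []
    for i, cols in pattern.items():
        for j in sorted(cols):
            if j < i:
                edges.append((j, i, 0))  # x_j is already updated in this sweep
            elif j > i:
                edges.append((j, i, 1))  # x_j still holds the previous sweep's value
        edges.append((i, i, 1))  # the SOR update also reads the old x_i
    return edges


def has_positive_cycle(nodes, edges, weight, ii, eps=1e-9):
    """Bellman-Ford style longest-path check: is there a cycle whose length
    sum(weight[src] - ii * distance) is positive, i.e. a recurrence that an
    iteration interval of `ii` cannot satisfy?"""
    dist = {v: 0.0 for v in nodes}  # as if a virtual source fed every node
    for _ in range(len(nodes)):
        changed = False
        for src, dst, d in edges:
            cand = dist[src] + weight[src] - ii * d
            if cand > dist[dst] + eps:
                dist[dst] = cand
                changed = True
        if not changed:
            return False  # distances converged: no positive cycle
    return True  # still improving after |V| rounds: a positive cycle exists


def min_iteration_interval(edges, weight, tol=1e-6):
    """Bisect for the smallest interval with no violated recurrence.
    Assumes every cycle has total distance >= 1 (no zero-distance cycle)."""
    nodes = sorted(weight)
    lo, hi = 0.0, float(sum(weight.values()))  # hi is always feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_positive_cycle(nodes, edges, weight, mid):
            lo = mid  # some recurrence is violated: interval too short
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Tridiagonal 3x3 sparsity pattern, one unit-weight task per row.
    tri = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}
    ii = min_iteration_interval(sor_itg(tri), {0: 1, 1: 1, 2: 1})
    print(round(ii, 3))  # prints 2.0
```

For the tridiagonal pattern, the bound is 2. Rows 0 and 1 form a cycle of two unit tasks with a total distance of 1, so successive sweeps can start no more often than every two time units, even with unlimited processors. Bisection with a Bellman-Ford cycle test was chosen for brevity. Exact maximum-cycle-ratio algorithms exist but take more code.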
