Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization

Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular subblocks, each of which can be handled as a computational unit to increase data reuse in a hierarchical memory system. The method also increases the degree of concurrency and reduces the overall communication volume, so it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. Until now, however, the mapping of blocks to processors has been designed for load balance under restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that captures the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Because the characteristics of tasks in block Cholesky factorization differ from those of the conventional parallel task model, we propose a new task scheduling algorithm that uses the block dependency DAG. The proposed algorithm consists of two stages: early-start clustering and affined cluster mapping (ACM). The early-start clustering stage clusters tasks while preserving the earliest start time of each task, without limiting parallelism. After task clustering, the ACM stage allocates clusters to processors, considering both communication cost and load balance. Experimental results on
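To make the notions of a block dependency DAG and an earliest start time concrete, the following is a minimal sketch in Python. It is not the paper's early-start clustering or ACM algorithm. It assumes the textbook dependency pattern of block Cholesky (factor a diagonal block, solve the blocks below it, update the trailing blocks), a unit cost per task, and zero communication cost. The task names (`factor`, `solve`, `update`) and both function names are hypothetical, and updates to the same target block are treated as independent of one another.

```python
from collections import defaultdict


def block_dependency_dag(nonzero_blocks):
    """Return {task: set of predecessor tasks} for a block Cholesky factorization.

    nonzero_blocks is a set of (i, j) block indices with i >= j. It is assumed
    to contain every diagonal block and to be closed under fill-in.
    """
    blocks = set(nonzero_blocks)
    preds = defaultdict(set)

    # One task per nonzero block: factor a diagonal block, or solve an
    # off-diagonal block against the factored diagonal block of its column.
    for (i, k) in blocks:
        if i == k:
            preds[("factor", k)]  # touch the key so the task is registered
        else:
            preds[("solve", i, k)].add(("factor", k))

    # Block (i, j) receives an update from column k when blocks (i, k) and
    # (j, k) are both nonzero and k < j <= i.
    for (i, k) in blocks:
        for (j, k2) in blocks:
            if k2 == k and i >= j > k and (i, j) in blocks:
                update = ("update", i, j, k)
                preds[update].update({("solve", i, k), ("solve", j, k)})
                target = ("factor", j) if i == j else ("solve", i, j)
                preds[target].add(update)
    return dict(preds)


def earliest_start_times(preds, cost=lambda task: 1):
    """Earliest start time of each task, assuming zero communication cost."""
    est = {}

    def visit(task):
        if task not in est:
            est[task] = max(
                (visit(p) + cost(p) for p in preds.get(task, ())), default=0
            )
        return est[task]

    for task in preds:
        visit(task)
    return est


# Dense 3 x 3 block lower triangle: every block (i, j) with i >= j is nonzero.
dense3 = {(i, j) for i in range(3) for j in range(i + 1)}
est = earliest_start_times(block_dependency_dag(dense3))
print(est[("factor", 2)])  # 6 under the unit-cost assumption
```

In this sketch a scheduler that never delays a task past its earliest start time would finish the dense 3 x 3 example in seven unit steps. A sparse block structure shortens the chain, since blocks with no shared column dependencies can start at time zero.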
