Dynamic Parallelization Strategies for Multifrontal Sparse Cholesky Factorization

This paper discusses the parallelization of numerical factorization, the computationally intensive phase of sparse Cholesky factorization, on shared-memory systems. We propose and compare two parallel algorithms based on the multifrontal method. Both algorithms are task-based and use dynamic load balancing. The first algorithm associates OpenMP tasks with the nodes of an elimination tree and relies on the OpenMP scheduler. The second algorithm uses a concurrent priority queue to balance the load. Experimental results on symmetric positive definite matrices from the University of Florida Sparse Matrix Collection show that our implementation is comparable to MUMPS and Intel MKL PARDISO in performance and scaling efficiency on shared-memory systems.
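The idea behind the second algorithm can be illustrated with a minimal sketch. It is written in Python for brevity and is not the implementation described in the paper. It assumes the elimination tree is given as a parent array and that a node becomes ready once all of its children are finished. The names `schedule_tree`, `priority`, and `process` are hypothetical, and the priority rule actually used in the paper is not stated in this abstract.

```python
import queue
import threading

SENTINEL = -1  # hypothetical marker telling a worker to stop


def schedule_tree(parent, priority, num_workers=4, process=None):
    """Process elimination-tree nodes children-first via a shared priority queue.

    parent[i]   -- parent of node i, or -1 for a root
    priority[i] -- smaller values are served first (assumed weighting)
    process     -- optional callback standing in for the frontal-matrix work
    Returns the order in which nodes were completed.
    """
    n = len(parent)
    if n == 0:
        return []

    # Count unfinished children for every node.
    pending = [0] * n
    for p in parent:
        if p >= 0:
            pending[p] += 1

    # Leaves have no dependencies, so they are ready immediately.
    ready = queue.PriorityQueue()
    for i in range(n):
        if pending[i] == 0:
            ready.put((priority[i], i))

    lock = threading.Lock()
    order = []

    def worker():
        while True:
            _, node = ready.get()  # blocks until a task is available
            if node == SENTINEL:
                return
            if process is not None:
                process(node)  # placeholder for the numerical work
            with lock:
                order.append(node)
                p = parent[node]
                if p >= 0:
                    pending[p] -= 1
                    if pending[p] == 0:  # last child done: parent is ready
                        ready.put((priority[p], p))
                if len(order) == n:  # all nodes done: release every worker
                    for _ in range(num_workers):
                        ready.put((float("inf"), SENTINEL))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


# Example tree with 7 nodes; node 6 is the root.
parent = [2, 2, 6, 5, 5, 6, -1]
priority = [0, 0, 1, 0, 0, 1, 2]  # assumed: deeper nodes first
order = schedule_tree(parent, priority)
```

The sketch shows only the dependency and queueing logic. Python threads do not execute CPU-bound work in parallel, so it says nothing about performance.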
