Design of a Multicore Sparse Cholesky Factorization Using DAGs

The rapid emergence of multicore machines has led to the need to design new algorithms that are efficient on these architectures. Here, we consider the solution of sparse symmetric positive-definite linear systems by Cholesky factorization. We were motivated by the successful division of the computation in the dense case into tasks on blocks and use of a task manager to exploit all the parallelism that is available between these tasks, whose dependencies may be represented by a directed acyclic graph (DAG). Our sparse algorithm is built on the assembly tree and subdivides the work at each node into tasks on blocks of the Cholesky factor. The dependencies between these tasks may again be represented by a DAG. To limit memory requirements, blocks are updated directly rather than through generated-element matrices. Our algorithm is implemented within a new efficient and portable solver HSL_MA87. It is written in Fortran 95 plus OpenMP and is available as part of the software library HSL. Using problems arising from a range of applications, we present experimental results that support our design choices and demonstrate that HSL_MA87 obtains good serial and parallel times on our 8-core test machines. Comparisons are made with existing modern solvers and show that HSL_MA87 performs well, particularly in the case of very large problems.
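As a rough illustration of the dense case that motivated this design, the sketch below enumerates the block tasks of a tiled Cholesky factorization and the dependencies between them as a DAG, then groups the tasks into "waves" that could run concurrently. It is not the HSL_MA87 implementation: the task names (`factor`, `solve`, `update`), the function names, and the choice to chain successive updates to the same block in order of `k` are assumptions made for this example. It uses only the Python standard library (`graphlib`, Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def cholesky_task_dag(nb):
    """Map each task to the set of tasks it must wait for.

    The lower triangle of the matrix is split into nb-by-nb blocks.
    Hypothetical task naming, chosen for this sketch:
      ("factor", k)       Cholesky-factorize diagonal block (k, k)
      ("solve", i, k)     triangular solve on block (i, k), i > k
      ("update", i, j, k) update block (i, j) using blocks (i, k) and (j, k)
    """
    deps = {}
    for k in range(nb):
        # The diagonal block can be factorized once its last update is done.
        deps[("factor", k)] = {("update", k, k, k - 1)} if k > 0 else set()
        for i in range(k + 1, nb):
            # A solve needs the factorized diagonal block and a fully
            # updated block (i, k).
            needed = {("factor", k)}
            if k > 0:
                needed.add(("update", i, k, k - 1))
            deps[("solve", i, k)] = needed
        for j in range(k + 1, nb):
            for i in range(j, nb):
                # An update needs both source blocks from block column k.
                # Assumption: updates to one block are applied in order of k.
                needed = {("solve", i, k), ("solve", j, k)}
                if k > 0:
                    needed.add(("update", i, j, k - 1))
                deps[("update", i, j, k)] = needed
    return deps


def parallel_waves(deps):
    """Group tasks into successive sets whose members are mutually independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(cholesky_task_dag(3))):
        print(number, wave)
```

A task manager of the kind described above would dispatch any ready task to an idle thread rather than wait for a whole wave to finish; the waves only make the available parallelism visible.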
