Locality of reference in sparse Cholesky factorization methods.

This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These are essentially the only two algorithms used in current codes; generalizations of them are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which the multifrontal algorithm is cache efficient and the left-looking algorithm is not, and vice versa. The analysis is backed by detailed experimental evidence, which shows that it predicts cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show experimentally that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we show that there are matrices on which the multifrontal algorithm may require significantly more memory than the left-looking algorithm, whereas the left-looking algorithm never uses more memory than the multifrontal one.
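To make the term "left-looking" concrete, the following is a minimal dense sketch of the column-by-column access pattern, not the paper's sparse supernodal implementation. The function name `left_looking_cholesky` and the list-of-lists matrix layout are assumptions made for illustration. Each column j is formed by reading the already-computed columns to its left, which is the pattern whose cache behavior the abstract refers to; a sparse code would visit only the columns k for which the entry L[j][k] is nonzero.

```python
import math

def left_looking_cholesky(a):
    """Return lower-triangular L (list of lists) with L * L^T == a.

    `a` is assumed to be a symmetric positive-definite matrix stored
    as a dense list of lists; this is an illustrative sketch only.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from column j of the lower triangle of a.
        col = [float(a[i][j]) for i in range(j, n)]
        # "Look left": apply the update from every earlier column k < j.
        for k in range(j):
            ljk = L[j][k]
            if ljk != 0.0:  # a sparse code would skip structural zeros
                for i in range(j, n):
                    col[i - j] -= L[i][k] * ljk
        # Scale the updated column by the square root of its diagonal entry.
        pivot = math.sqrt(col[0])
        for i in range(j, n):
            L[i][j] = col[i - j] / pivot
    return L

factor = left_looking_cholesky([[4, 2, 2], [2, 5, 3], [2, 3, 6]])
```

Here column j is written once but columns 0..j-1 are re-read for every later column, so data reuse depends on how many of those earlier columns stay in cache between visits.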