Communication-Avoiding Symmetric-Indefinite Factorization

We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix $A$ as the product $A=PLTL^{T}P^{T},$ where $P$ is a permutation matrix, $L$ is lower triangular, and $T$ is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has recently been published. This paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.
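To make the shape of the factorization $A=PLTL^{T}P^{T}$ concrete, the following is a minimal sketch that uses the classical unblocked Parlett-Reid tridiagonalization with partial pivoting. This is a deliberately simpler, swapped-in technique: it is not the block Aasen algorithm of the paper and it is not communication avoiding. It produces a scalar tridiagonal $T$, a special case of the banded structure. The names `parlett_reid`, `reconstruct`, and `perm` are hypothetical, chosen only for this illustration. Here `perm` encodes $P$ through $(P^{T}AP)_{ij}=A_{\mathrm{perm}[i],\mathrm{perm}[j]}$.

```python
def parlett_reid(A):
    """Unblocked Parlett-Reid sketch (assumption: NOT the paper's algorithm).

    Returns (perm, L, T) with A[perm[i]][perm[j]] == (L T L^T)[i][j] up to
    rounding, L unit lower triangular, and T symmetric tridiagonal.
    """
    n = len(A)
    T = [[float(x) for x in row] for row in A]  # working copy, reduced in place
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for k in range(n - 2):
        # Partial pivoting: largest entry of column k strictly below row k.
        p = max(range(k + 1, n), key=lambda i: abs(T[i][k]))
        if p != k + 1:
            # Symmetric interchange of rows and columns k+1 and p.
            T[k + 1], T[p] = T[p], T[k + 1]
            for row in T:
                row[k + 1], row[p] = row[p], row[k + 1]
            perm[k + 1], perm[p] = perm[p], perm[k + 1]
            # Swap the multipliers already stored in columns 1..k of L.
            for j in range(1, k + 1):
                L[k + 1][j], L[p][j] = L[p][j], L[k + 1][j]
        pivot = T[k + 1][k]
        if pivot == 0.0:
            continue  # column k is already zero below the subdiagonal
        for i in range(k + 2, n):
            m = T[i][k] / pivot
            L[i][k + 1] = m  # multipliers of step k live in column k+1 of L
            for j in range(n):  # row operation: row i -= m * row k+1
                T[i][j] -= m * T[k + 1][j]
            for r in range(n):  # matching column operation keeps T symmetric
                T[r][i] -= m * T[r][k + 1]
    return perm, L, T


def reconstruct(L, T):
    """Form the product L T L^T with plain triple loops."""
    n = len(L)
    LT = [[sum(L[i][k] * T[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(LT[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # A small symmetric indefinite matrix (zero diagonal), made up for the demo.
    A = [[0, 1, 2, 3, 4],
         [1, 0, 5, 6, 7],
         [2, 5, 0, 8, 9],
         [3, 6, 8, 0, 10],
         [4, 7, 9, 10, 0]]
    perm, L, T = parlett_reid(A)
    B = reconstruct(L, T)
    n = len(A)
    residual = max(abs(B[i][j] - A[perm[i]][perm[j]])
                   for i in range(n) for j in range(n))
    print("permutation:", perm)
    print("max reconstruction error:", residual)
```

Because the pivot is the largest entry in its column, every multiplier stored in `L` has magnitude at most 1. The sketch performs its updates one entry at a time, so it says nothing about the communication behavior that the paper analyzes.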
