Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices

The Cholesky QR algorithm is an efficient, communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately, it is inherently numerically unstable and can break down when the matrix is ill-conditioned. Recent work establishes that the instability can be cured by running the algorithm twice (called CholeskyQR2). However, the applicability of CholeskyQR2 is still limited by the requirement that the Cholesky factorization of the Gram matrix runs to completion, which means it does not always work for matrices $A$ with $\kappa_2(A)\gtrsim {{\bf u}}^{-\frac{1}{2}}$, where ${{\bf u}}$ is the unit roundoff. In this work we extend the applicability to $\kappa_2(A)=\mathcal{O}({\bf u}^{-1})$ by introducing a shift $s$ to the computed Gram matrix so as to guarantee that the Cholesky factorization $R^TR= A^TA+sI$ succeeds numerically. We show that the computed $AR^{-1}$ has a reduced condition number $\leq {{\bf u}}^{-\frac{1}{2}}$, for which CholeskyQR2 safely computes the QR factorization, yielding a computed $Q$ whose orthogonality $\|Q^TQ-I\|_2$ and residual $\|A-QR\|_F/\|A\|_F$ are both $\mathcal{O}({{\bf u}})$. Thus we obtain the required QR factorization by essentially running Cholesky QR thrice. We extensively analyze the resulting algorithm, shiftedCholeskyQR, to reveal its excellent numerical stability. shiftedCholeskyQR is also highly parallelizable, and it remains applicable and effective when working in an oblique inner product space. We illustrate our findings through experiments, in which we achieve significant (up to $40\times$) speedup over alternative methods.
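
The three-pass procedure described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cholesky_qr` and `shifted_cholesky_qr3` are hypothetical, the shift $s = 11(mn + n(n+1))\,{\bf u}\,\|A\|_2^2$ is an assumed choice that the abstract does not specify, and a general dense solve stands in for the triangular solve a production code would use.

```python
import numpy as np


def cholesky_qr(A, shift=0.0):
    """One Cholesky QR pass: factor A^T A + shift*I = R^T R, return (A R^{-1}, R)."""
    n = A.shape[1]
    gram = A.T @ A + shift * np.eye(n)
    R = np.linalg.cholesky(gram).T       # NumPy returns the lower factor; R is upper triangular
    # Q = A R^{-1}, obtained from R^T Q^T = A^T (general solve; a triangular solve would be cheaper)
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R


def shifted_cholesky_qr3(A):
    """Shifted Cholesky QR followed by CholeskyQR2 (three passes in total)."""
    m, n = A.shape
    u = np.finfo(A.dtype).eps / 2        # unit roundoff
    # ASSUMPTION: this particular shift is illustrative; the abstract only says "a shift".
    s = 11 * (m * n + n * (n + 1)) * u * np.linalg.norm(A, 2) ** 2
    Q1, R1 = cholesky_qr(A, shift=s)     # pass 1: shifted, reduces the condition number
    Q2, R2 = cholesky_qr(Q1)             # passes 2 and 3: plain CholeskyQR2
    Q3, R3 = cholesky_qr(Q2)
    return Q3, R3 @ R2 @ R1              # A ~= Q3 (R3 R2 R1)
```

In exact arithmetic the first pass maps each singular value $\sigma_i$ of $A$ to $\sigma_i/\sqrt{\sigma_i^2+s}$, which is why the condition number of $AR^{-1}$ drops even though $AR^{-1}$ is not yet orthonormal. Calling `shifted_cholesky_qr3(A)` on a double-precision tall-skinny array returns `Q` with orthonormal columns and an upper triangular `R`.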
