Roundoff error analysis of the Cholesky QR2 algorithm

We consider the QR decomposition of an m × n matrix X with full column rank, where m ≥ n. Among the many algorithms available, the Cholesky QR algorithm is attractive for high-performance computing because it consists entirely of standard level 3 BLAS operations on large matrices and requires only one reduce and one broadcast in parallel environments. Unfortunately, it is well known that the algorithm is not numerically stable: the deviation from orthogonality of the computed Q factor is of order $O(\kappa_2(X)^2 u)$, where $\kappa_2(X)$ is the 2-norm condition number of X and u is the unit roundoff. In this paper, we show that if the condition number of X is not too large, the stability can be greatly improved by applying the Cholesky QR algorithm twice. More specifically, if $\kappa_2(X)$ is at most $O(u^{-1/2})$, both the residual and the deviation from orthogonality are shown to be of order $O(u)$. Numerical results support our theoretical analysis.
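As a rough illustration of the scheme described above, the following NumPy sketch forms the Gram matrix, takes its Cholesky factor as R, and recovers Q = X R^{-1}, then repeats the pass on the computed Q. The function names (`cholesky_qr`, `cholesky_qr2`, `make_test_matrix`), the matrix sizes, and the chosen condition number are assumptions made for this example, not taken from the paper. The general-purpose `np.linalg.solve` stands in for the triangular solve that a tuned implementation would presumably use.

```python
import numpy as np

def cholesky_qr(X):
    """One Cholesky QR pass: returns Q, R with X ~= Q R, R upper triangular."""
    G = X.T @ X                      # n x n Gram matrix
    R = np.linalg.cholesky(G).T      # G = R^T R, so R is upper triangular
    # Q = X R^{-1}, obtained by solving R^T Q^T = X^T.
    # A general solver is used here for brevity (illustrative choice).
    Q = np.linalg.solve(R.T, X.T).T
    return Q, R

def cholesky_qr2(X):
    """Apply Cholesky QR twice and combine the two triangular factors."""
    Q1, R1 = cholesky_qr(X)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1

def make_test_matrix(m, n, cond, seed=0):
    """Hypothetical helper: m x n matrix with 2-norm condition number ~cond."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)   # singular values from 1 to 1/cond
    return (U * s) @ V.T

n = 20
X = make_test_matrix(500, n, cond=1e5)       # cond well below u^(-1/2) in double
Q1, _ = cholesky_qr(X)
Q, R = cholesky_qr2(X)

I = np.eye(n)
orth1 = np.linalg.norm(Q1.T @ Q1 - I)        # deviation after one pass
orth2 = np.linalg.norm(Q.T @ Q - I)          # deviation after two passes
resid = np.linalg.norm(X - Q @ R) / np.linalg.norm(X)
print(f"one pass: {orth1:.2e}, two passes: {orth2:.2e}, residual: {resid:.2e}")
```

In double precision, the one-pass deviation from orthogonality should grow roughly like the square of the condition number times u, while the two-pass result should stay near the unit roundoff as long as the condition number remains below about u^{-1/2}.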
