Paper information - Roundoff error analysis of the Cholesky QR2 algorithm

Roundoff error analysis of the Cholesky QR2 algorithm

We consider the QR decomposition of an $m \times n$ matrix $X$ with full column rank, where $m \ge n$. Among the many algorithms available, the Cholesky QR algorithm is ideal from the viewpoint of high-performance computing, since it consists entirely of standard level 3 BLAS operations with large matrix sizes and requires only one reduce and one broadcast in parallel environments. Unfortunately, it is well known that the algorithm is not numerically stable: the deviation from orthogonality of the computed Q factor is of order $O((\kappa_2(X))^2 u)$, where $\kappa_2(X)$ is the 2-norm condition number of $X$ and $u$ is the unit roundoff. In this paper, we show that if the condition number of $X$ is not too large, we can greatly improve the stability by iterating the Cholesky QR algorithm twice. More specifically, if $\kappa_2(X)$ is at most $O(u^{-1/2})$, both the residual and the deviation from orthogonality are shown to be of order $O(u)$. Numerical results support our theoretical analysis.
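The two-pass scheme described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cholesky_qr`, `cholesky_qr2`, and `make_test_matrix` are hypothetical, the triangular solve uses the general-purpose `np.linalg.solve` rather than a dedicated triangular routine, and the test matrix is built with a prescribed condition number well below $u^{-1/2}$ (about $6.7 \times 10^{7}$ in double precision).

```python
import numpy as np


def cholesky_qr(X):
    """One Cholesky QR pass: X = Q R with R from the Cholesky factor of X^T X."""
    G = X.T @ X                      # Gram matrix (n x n)
    R = np.linalg.cholesky(G).T      # NumPy returns the lower factor L; R = L^T
    # Q = X R^{-1}, computed as the transpose of R^{-T} X^T.
    # Assumption: a general solver is used for brevity; it ignores that R is triangular.
    Q = np.linalg.solve(R.T, X.T).T
    return Q, R


def cholesky_qr2(X):
    """Cholesky QR applied twice; the final R is the product of both factors."""
    Q1, R1 = cholesky_qr(X)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1


def make_test_matrix(m, n, cond, seed=0):
    """Hypothetical helper: m x n matrix with 2-norm condition number about `cond`."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)   # singular values from 1 down to 1/cond
    return (U * s) @ V.T


def orthogonality_error(Q):
    """Deviation from orthogonality, ||Q^T Q - I||_2."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), 2)


X = make_test_matrix(200, 10, cond=1e5)
Q1, R1 = cholesky_qr(X)
Q2, R2 = cholesky_qr2(X)
err_one_pass = orthogonality_error(Q1)
err_two_pass = orthogonality_error(Q2)
rel_residual = np.linalg.norm(Q2 @ R2 - X, 2) / np.linalg.norm(X, 2)
```

Comparing `err_one_pass` with `err_two_pass` gives a rough empirical view of the behaviour stated in the abstract: the single pass loses orthogonality as the condition number grows, while the second pass is expected to bring it back to the level of the unit roundoff for matrices in this conditioning range.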
