Given a matrix $A$ of size $m\times n$, the manuscript describes an algorithm for computing a QR factorization $AP=QR$, where $P$ is a permutation matrix, $Q$ is orthonormal, and $R$ is upper triangular. The algorithm is blocked so that it can be implemented efficiently. Classical algorithms for computing pivoted QR factorizations select one pivot vector at a time; the algorithm avoids this single-vector pivoting by using randomized sampling to find whole blocks of pivot vectors at once. The advantage of blocking becomes particularly pronounced when $A$ is very large and possibly stored out-of-core or on a distributed-memory machine. The manuscript also describes a generalization of the QR factorization that allows $P$ to be a general orthonormal matrix. In this setting, one can at moderate cost compute a \textit{rank-revealing} factorization in which the mass of $R$ is concentrated on the diagonal entries. Moreover, the diagonal entries of $R$ closely approximate the singular values of $A$. The algorithms described have asymptotic flop count $O(m\,n\,\min(m,n))$, just like classical deterministic methods. The scaling constant is slightly higher than that of classical techniques, but this is more than made up for by reduced communication and the ability to block the computation.
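The idea of choosing a block of pivots from a random sample can be illustrated with a minimal NumPy sketch. It is not the manuscript's implementation: the function names `select_pivots` and `randomized_blocked_qrcp`, the block size `b`, and the oversampling parameter `p` are assumptions made for illustration, and the sketch redraws a Gaussian sample matrix at every step and forms $Q$ explicitly, which an efficient blocked code would likely avoid. At each step a small sample $Y = G\,A_{\text{trailing}}$ is formed, $b$ pivot columns are picked greedily from $Y$ alone, and an unpivoted QR of the chosen panel is applied to the trailing matrix.

```python
import numpy as np

def select_pivots(Y, b):
    """Greedily pick b columns of the small sample Y, largest residual norm first."""
    Y = np.array(Y, dtype=float)          # work on a copy
    used = np.zeros(Y.shape[1], dtype=bool)
    picked = []
    for _ in range(b):
        norms = np.einsum("ij,ij->j", Y, Y)   # squared column norms
        norms[used] = -1.0                    # never pick a column twice
        j = int(np.argmax(norms))
        used[j] = True
        picked.append(j)
        if norms[j] > 0.0:
            q = Y[:, j] / np.sqrt(norms[j])
            Y -= np.outer(q, q @ Y)           # project out the chosen direction
    return picked

def randomized_blocked_qrcp(A, b=8, p=5, seed=0):
    """Illustrative blocked QR with randomized pivot selection: A[:, perm] = Q @ R."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = np.array(A, dtype=float)
    Q = np.eye(m)
    perm = np.arange(n)
    r = min(m, n)
    for k in range(0, r, b):
        bk = min(b, r - k)
        # Sample the trailing block with a Gaussian matrix (b + p rows).
        G = rng.standard_normal((bk + p, m - k))
        Y = G @ R[k:, k:]
        # Choose a whole block of pivots by looking only at the small sample.
        piv = select_pivots(Y, bk)
        chosen = set(piv)
        rest = [j for j in range(n - k) if j not in chosen]
        order = np.array(piv + rest)
        R[:, k:] = R[:, k:][:, order]
        perm[k:] = perm[k:][order]
        # Unpivoted QR of the panel, applied to the whole trailing block.
        Qk, _ = np.linalg.qr(R[k:, k:k + bk], mode="complete")
        R[k:, k:] = Qk.T @ R[k:, k:]
        Q[:, k:] = Q[:, k:] @ Qk
    return Q, np.triu(R), perm
```

A quick check on a random test matrix is to call `Q, R, perm = randomized_blocked_qrcp(A)` and confirm that `A[:, perm]` agrees with `Q @ R` to rounding error. Only the pivot selection touches individual columns, and it operates on the small sample `Y`; all work on the large matrix is expressed as matrix-matrix products.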