A Parallel QR Factorization Algorithm with Controlled Local Pivoting

This paper presents a new version of the Householder algorithm with column pivoting for computing a QR factorization that identifies the rank and range space of a given matrix. The standard pivoting technique is not well suited to parallel computation, since it requires synchronization at every step to choose the next pivot column. Instead, we employ a restricted pivoting scheme that limits the choice of pivot columns and thereby avoids this synchronization constraint. Incremental condition estimation is used to assess the effect that adding a candidate pivot column would have on the condition number of the matrix being generated. This safeguard ensures that the local strategy selects pivot columns that make sense in the global context of the computation. The resulting algorithm is well suited for implementation on a parallel machine, in particular a MIMD machine with distributed memory. Simulations demonstrate that the numerical behavior of the restricted pivoting strategy is comparable to that of the traditional global pivoting strategy. Implementations of the QR factorization algorithm without pivoting, with local pivoting, and with traditional pivoting on the Intel iPSC/1 and iPSC/2 hypercubes show that our scheme roughly halves the extra time required for pivoting.
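
To make the idea of restricted pivoting concrete, the following serial sketch performs Householder QR in which each step may only pick its pivot from a small window of columns, instead of searching all remaining columns. It is an illustration under stated assumptions, not the algorithm of the paper: the names `local_pivot_qr`, `window`, and `tol` are hypothetical, the window stands in for the columns a single processor might own, and the safeguard is a crude ratio of the largest diagonal entry so far to the candidate's residual norm, used here as a stand-in for incremental condition estimation.

```python
import math

def local_pivot_qr(A, window=2, tol=1e-8):
    """Householder QR with pivoting restricted to a window of columns.

    A is a list of rows (m x n). Returns (R, perm, rank), where R is the
    transformed matrix, perm[j] is the original index of column j, and rank
    is the number of accepted pivot columns.
    Assumption: the acceptance test below is a simple proxy, NOT the
    incremental condition estimator described in the abstract.
    """
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    perm = list(range(n))
    k = 0            # next pivot position
    active = n       # columns at index >= active have been rejected
    max_diag = 0.0   # largest |r_ii| accepted so far
    while k < min(m, active):
        hi = min(k + window, active)
        # Residual norms of the candidate columns (rows k..m-1 only).
        norms = [math.sqrt(sum(R[i][j] ** 2 for i in range(k, m)))
                 for j in range(k, hi)]
        best = max(range(k, hi), key=lambda j: norms[j - k])
        pivot_norm = norms[best - k]
        if pivot_norm <= tol * max_diag:
            # Safeguard: even the best local candidate would make the
            # factor too ill-conditioned, so move the whole window to the
            # end of the active columns and try the next window.
            order = (list(range(k)) + list(range(hi, active))
                     + list(range(k, hi)) + list(range(active, n)))
            R = [[row[j] for j in order] for row in R]
            perm = [perm[j] for j in order]
            active -= hi - k
            continue
        if best != k:
            # Bring the chosen pivot column into position k.
            for row in R:
                row[k], row[best] = row[best], row[k]
            perm[k], perm[best] = perm[best], perm[k]
        # Householder vector v = x - alpha * e1 with alpha = -sign(x0) * ||x||.
        x = [R[i][k] for i in range(k, m)]
        alpha = -math.copysign(pivot_norm, x[0])
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # Apply the reflector I - 2 v v^T / (v^T v) to all trailing columns,
        # including rejected ones, so that R stays equal to Q^T A P.
        for j in range(k, n):
            dot = sum(v[i - k] * R[i][j] for i in range(k, m))
            f = 2.0 * dot / vv
            for i in range(k, m):
                R[i][j] -= f * v[i - k]
        max_diag = max(max_diag, abs(R[k][k]))
        k += 1
    return R, perm, k
```

Because rejected columns only lose residual norm in later steps, a column that fails the test once never needs to be reconsidered in this sketch. A distributed implementation would additionally have to decide which processor proposes the next pivot and how rejections are communicated; the abstract does not specify those details and they are not modeled here.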
