Parallel Tri- and Bi-Diagonalization of Bordered Bidiagonal Matrices

We have previously presented various plane rotation patterns that provide stable O(N^2) algorithms for reducing a b-band matrix of order N, bordered by p rows and/or columns, to (b + p)-band form, where b ⩾ 1 and p ⩾ 1. By splitting the matrix into two similarly structured submatrices and chasing nonzeros to the corners in two directions, the newly proposed patterns reduce the computational cost by 50% compared to existing one-way chasing algorithms. In this paper, we show how these rotation patterns can be efficiently parallelized when reducing a one-bordered bidiagonal matrix to tridiagonal form, optionally followed by bidiagonalization. Applications are found in updating total least squares solutions and signal or noise subspaces by means of a partial singular value decomposition. For each scheme, a linear systolic network and a parallel VLSI computing structure are presented. These architectures reduce the overall computing time for the tridiagonalization from O(N^2) to O(N) using O(N) processors. In particular, it is shown that the best two-way chasing parallel implementation reduces the computation time of the tridiagonalization by 50% compared to the one-way chasing parallel implementation, using the same number of processors. If, in addition, the original bandwidth is restored, then all proposed two-way chasing parallel implementations achieve an 8% reduction in overall computing time compared to the one-way chasing scheme.
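
As a rough illustration of the elementary operation underlying such rotation patterns, the sketch below applies a single plane (Givens) rotation to annihilate one entry of a border row appended to a small upper bidiagonal matrix. It shows only the building block, not the one-way or two-way chasing patterns themselves. The matrix values, the choice of a row border, and the helper names `givens`, `rotate_rows`, and `frobenius` are assumptions made for this example.

```python
import math


def givens(a, b):
    """Return (c, s) such that c*a + s*b = r and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate; use the identity rotation
    return a / r, b / r


def rotate_rows(M, i, k, c, s):
    """Apply the plane rotation defined by (c, s) to rows i and k of M, in place."""
    for j in range(len(M[i])):
        x, y = M[i][j], M[k][j]
        M[i][j] = c * x + s * y
        M[k][j] = -s * x + c * y


def frobenius(M):
    """Frobenius norm, which an orthogonal rotation leaves unchanged."""
    return math.sqrt(sum(x * x for row in M for x in row))


# Hypothetical example: a 3x3 upper bidiagonal matrix with one border row appended.
A = [
    [4.0, 1.0, 0.0],
    [0.0, 3.0, 2.0],
    [0.0, 0.0, 5.0],
    [1.0, 2.0, 3.0],  # border row (assumed placement)
]

norm_before = frobenius(A)

# Rotate rows 0 and 3 so that the first entry of the border row becomes zero.
c, s = givens(A[0][0], A[3][0])
rotate_rows(A, 0, 3, c, s)

norm_after = frobenius(A)
```

In this example the rotation zeroes `A[3][0]` up to rounding error but also fills in `A[0][2]`, which was zero before. Unwanted nonzeros of this kind are what a chasing scheme subsequently moves toward a corner of the matrix.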