Almost linear-time computation of the singular value decomposition using mesh-connected processors

A cyclic Jacobi method for computing the singular value decomposition of an $m \times n$ matrix $(m \geq n)$ using systolic arrays is proposed. The algorithm requires $O(n^{2})$ processors and $O(m + n \log n)$ units of time.
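
For orientation, the following is a minimal serial sketch of a cyclic Jacobi iteration for singular values. It is not the systolic-array algorithm proposed here: it assumes the one-sided (Hestenes) variant, which orthogonalizes pairs of columns with plane rotations, and it runs on a single processor. The function name `jacobi_singular_values`, the tolerance `tol`, and the sweep limit `max_sweeps` are illustrative choices, not taken from the paper.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of an m x n matrix (m >= n), given as a list of rows.

    Illustrative one-sided cyclic Jacobi sketch; serial, not systolic.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns, since each rotation mixes two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        # One cyclic sweep: visit every column pair (p, q) in a fixed order.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # pair is already orthogonal to within tol
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta)
                )
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break  # a full sweep made no rotation: columns are orthogonal
    # After convergence the column norms are the singular values.
    return sorted(
        (math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True
    )
```

Each sweep of this sketch visits $n(n-1)/2$ column pairs one after another. Rotations on disjoint pairs do not interact, which is the property a parallel ordering on a processor array can exploit.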