Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations

Low-rank matrices arise in many scientific and engineering computations. Both the computational and storage costs of manipulating such matrices can be reduced by taking advantage of their low-rank properties. In this paper, to compute a low-rank approximation of a dense matrix, we study the performance of QR factorization with column pivoting or with restricted pivoting on multicore CPUs with a GPU. We first propose several techniques to reduce the postprocessing time, which is required for restricted pivoting, on a modern CPU. We then examine the potential of using a GPU to accelerate the factorization process with both column and restricted pivoting. Our performance results on two eight-core Intel Sandy Bridge CPUs with one NVIDIA Kepler GPU demonstrate that using the GPU reduces the factorization time by a factor of more than two. In addition, to study the performance of our implementations in practice, we integrate them into StruMF, a recently developed software package that algebraically exploits such low-rank structures for solving a general sparse linear system of equations. Our performance results for solving Poisson's equations demonstrate that the proposed techniques can significantly reduce the preconditioner construction time of StruMF on the CPUs, and that the construction time can be further reduced by 10%–50% using the GPU.
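
To make the central operation concrete, the following is a minimal sketch of a truncated QR factorization with column pivoting used as a low-rank approximation. It is not the implementation studied in the paper: for brevity it uses modified Gram-Schmidt in plain Python rather than a blocked, Householder-based kernel on a CPU or GPU, and it does not cover restricted pivoting or its postprocessing. The function name `qrcp_low_rank`, the relative tolerance `tol`, and the stopping rule are assumptions introduced here for illustration.

```python
import math


def qrcp_low_rank(A, tol=1e-10):
    """Truncated QR with column pivoting (modified Gram-Schmidt variant).

    A is a list of m rows, each a list of n floats.
    Returns (Q, R, perm, k) where Q holds k orthonormal vectors of length m,
    R holds k rows of length n, and A[i][perm[j]] ~= sum_l Q[l][i] * R[l][j].
    The stopping rule (relative tolerance on the residual column norm) is an
    illustrative assumption, not taken from the paper.
    """
    m, n = len(A), len(A[0])
    # cols[j] is a working copy of column j; it becomes the residual column.
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    perm = list(range(n))
    Q, R = [], []
    norm0 = max(math.sqrt(sum(x * x for x in c)) for c in cols)
    for k in range(min(m, n)):
        # Column pivoting: choose the remaining column with the largest norm.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(n - k), key=norms.__getitem__)
        rkk = norms[p - k]
        if rkk <= tol * norm0:
            break  # remaining columns are numerically negligible
        # Swap the pivot column into position k, in the data and in R.
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for row in R:
            row[k], row[p] = row[p], row[k]
        # Normalize the pivot column and orthogonalize the trailing columns.
        q = [x / rkk for x in cols[k]]
        row = [0.0] * n
        row[k] = rkk
        for j in range(k + 1, n):
            r = sum(qi * x for qi, x in zip(q, cols[j]))
            row[j] = r
            cols[j] = [x - r * qi for x, qi in zip(cols[j], q)]
        Q.append(q)
        R.append(row)
    return Q, R, perm, len(Q)


# Usage: a 4-by-3 matrix whose third column is the sum of the first two.
A = [[1.0, 1.0, 2.0],
     [2.0, 0.0, 2.0],
     [3.0, -1.0, 2.0],
     [4.0, 0.0, 4.0]]
Q, R, perm, k = qrcp_low_rank(A)
```

The returned `k` is the numerical rank detected at the given tolerance, and storing `Q` and `R` takes k(m + n) numbers instead of mn, which is where the storage saving mentioned above comes from.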
