Efficient Parallel Nonnegative Least Squares on Multicore Architectures

We parallelize, on multicore architectures, a version of the active-set iterative algorithm derived from the original work of Lawson and Hanson [Solving Least Squares Problems, Prentice-Hall, 1974]. At every step of the iteration, this algorithm requires the solution of an unconstrained least squares problem for a matrix composed of the passive columns of the original system matrix. To improve performance, we use parallelizable procedures to efficiently update and downdate the $QR$ factorization of this matrix at each iteration, accounting for inserted and removed columns. We use a column reordering strategy in the decomposition to reduce computation and memory access costs. We consider graphics processing units (GPUs) as a platform for efficient parallel computation and compare our GPU implementations with those on multicore CPUs. Both synthetic and nonsynthetic data are used in the experiments.
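To make the structure of the iteration concrete, the following is a minimal serial sketch of a Lawson-Hanson style active-set method in Python with NumPy. It is not the parallel implementation described above. The function name `nnls_active_set`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions chosen for illustration. For clarity, the sketch re-solves each unconstrained subproblem from scratch with `numpy.linalg.lstsq`, whereas the approach described above updates and downdates a $QR$ factorization as columns enter and leave the passive set.

```python
import numpy as np


def nnls_active_set(A, b, tol=1e-10, max_iter=None):
    """Sketch: minimize ||A x - b||_2 subject to x >= 0 (active-set method)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    if max_iter is None:
        max_iter = 3 * n                      # assumed cap on outer iterations

    passive = np.zeros(n, dtype=bool)         # True: column is in the passive set
    x = np.zeros(n)
    w = A.T @ (b - A @ x)                     # negative gradient of the objective

    for _ in range(max_iter):
        # Stop when no active variable can still decrease the residual.
        if passive.all() or w[~passive].max() <= tol:
            break

        # Move the active column with the largest w into the passive set.
        j = np.argmax(np.where(passive, -np.inf, w))
        passive[j] = True

        while True:
            s = np.zeros(n)
            if not passive.any():
                break
            # Unconstrained least squares on the passive columns only.
            s[passive] = np.linalg.lstsq(A[:, passive], b, rcond=None)[0]
            if s[passive].min() > 0:
                break

            # Infeasible: step from x toward s until a variable reaches zero,
            # then return the variables at zero to the active set.
            mask = passive & (s <= 0)
            denom = np.maximum(x[mask] - s[mask], np.finfo(float).tiny)
            alpha = np.min(x[mask] / denom)
            x = x + alpha * (s - x)
            passive &= x > tol
            x[~passive] = 0.0

        x = s
        w = A.T @ (b - A @ x)

    return x


if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 0.1]])
    b = np.array([1.0, 2.0])
    print(nnls_active_set(A, b))
```

In this sketch, the call to `lstsq` inside the inner loop is the step that dominates the cost, since the passive set changes by only a few columns between successive solves. That is the motivation for replacing it with a factorization that is modified incrementally rather than recomputed.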
