Two-level preconditioning for Ridge Regression.

Solving linear systems is often the computational bottleneck in real-life problems. Iterative solvers are frequently the only option, either because direct algorithms are too expensive or because the system matrix is not explicitly available. Here, we develop a two-level preconditioner for regularized least squares linear systems involving a feature or data matrix. Variants of this linear system appear in machine learning applications such as ridge regression, logistic regression, support vector machines, and Bayesian regression. We use clustering algorithms to create a coarser level that preserves the principal components of the covariance or Gram matrix; this coarse level approximates the dominant eigenvectors and is used to build a subspace preconditioner that accelerates the Conjugate Gradient method. We observe speed-ups on both artificial and real-life data.
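To make the construction concrete, the following is a minimal sketch in Python/NumPy, assuming the ridge-regression normal equations (AᵀA + λI)w = Aᵀb, a coarse level formed by k-means clustering of the feature columns, and a preconditioner that additively combines a Galerkin coarse correction with a Jacobi smoother. The names P, lam, and k and this exact additive combination are illustrative assumptions, not the paper's precise algorithm.

```python
# Sketch of a two-level subspace preconditioner for ridge regression,
# under the assumptions stated above.
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n_samples, n_features, k, lam = 2000, 500, 25, 1e-2
A = rng.standard_normal((n_samples, n_features))
b = rng.standard_normal(n_samples)

# Ridge regression normal equations: (A^T A + lam I) w = A^T b.
def matvec(w):
    return A.T @ (A @ w) + lam * w

C = LinearOperator((n_features, n_features), matvec=matvec, dtype=np.float64)

# Coarse level: cluster the feature columns of A and build a
# piecewise-constant prolongation P (one normalized indicator per cluster).
_, labels = kmeans2(A.T, k, minit='++', seed=0)
counts = np.bincount(labels, minlength=k)
P = np.zeros((n_features, k))
P[np.arange(n_features), labels] = 1.0
P = P[:, counts > 0] / np.sqrt(counts[counts > 0])  # drop empty clusters, normalize

# Galerkin coarse matrix P^T C P (at most k x k) and its inverse.
CP = A.T @ (A @ P) + lam * P
Cc_inv = np.linalg.inv(P.T @ CP)

diag_C = np.einsum('ij,ij->j', A, A) + lam          # diagonal of A^T A + lam I

def apply_preconditioner(r):
    # Additive two-level preconditioner: coarse correction + Jacobi smoother.
    return P @ (Cc_inv @ (P.T @ r)) + r / diag_C

M = LinearOperator((n_features, n_features), matvec=apply_preconditioner,
                   dtype=np.float64)
w, info = cg(C, A.T @ b, M=M, maxiter=500)
print("CG converged" if info == 0 else f"CG info = {info}")
```

In this sketch the columns of P play the role of the coarse subspace that approximates the dominant eigenvectors of AᵀA; the coarse solve handles those directions while the Jacobi term keeps the preconditioner positive definite on the rest of the spectrum.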
