Faster Ridge Regression via the Subsampled Randomized Hadamard Transform

We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations (p ≫ n). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of O(n2p). Our algorithm Subsampled Randomized Hadamard Transform- Dual Ridge Regression (SRHT-DRR) runs in time O(np log(n)) and works by preconditioning the design matrix by a Randomized Walsh-Hadamard Transform with a subsequent subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets.

[1]  Isabelle Guyon,et al.  Design of experiments for the NIPS 2003 variable selection benchmark , 2003 .

[2]  W. Massy Principal Components Regression in Exploratory Statistical Research , 1965 .

[3]  Mark Tygert,et al.  A Randomized Algorithm for Principal Component Analysis , 2008, SIAM J. Matrix Anal. Appl..

[4]  S. Muthukrishnan,et al.  Faster least squares approximation , 2007, Numerische Mathematik.

[5]  Nathan Halko,et al.  An Algorithm for the Principal Component Analysis of Large Data Sets , 2010, SIAM J. Sci. Comput..

[6]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[7]  Joel A. Tropp,et al.  User-Friendly Tail Bounds for Sums of Random Matrices , 2010, Found. Comput. Math..

[8]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[9]  Nir Ailon,et al.  Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes , 2008, SODA '08.

[10]  Sham M. Kakade,et al.  Analysis of a randomized approximation scheme for matrix multiplication , 2012, ArXiv.

[11]  Joel A. Tropp,et al.  Improved Analysis of the subsampled Randomized Hadamard Transform , 2010, Adv. Data Sci. Adapt. Anal..

[12]  J. Marron,et al.  PCA CONSISTENCY IN HIGH DIMENSION, LOW SAMPLE SIZE CONTEXT , 2009, 0911.3827.

[13]  V. Rokhlin,et al.  A fast randomized algorithm for overdetermined linear least-squares regression , 2008, Proceedings of the National Academy of Sciences.

[14]  Alexander J. Smola,et al.  Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.

[15]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[16]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[17]  Christos Boutsidis,et al.  Improved Matrix Algorithms via the Subsampled Randomized Hadamard Transform , 2012, SIAM J. Matrix Anal. Appl..

[18]  Francis R. Bach,et al.  Sharp analysis of low-rank kernel matrix approximations , 2012, COLT.

[19]  Mark Tygert,et al.  A fast algorithm for computing minimal-norm solutions to underdetermined systems of linear equations , 2009, ArXiv.

[20]  Sham M. Kakade,et al.  A risk comparison of ordinary least squares vs ridge regression , 2011, J. Mach. Learn. Res..

[21]  Michael A. Saunders,et al.  LSRN: A Parallel Iterative Solver for Strongly Over- or Underdetermined Systems , 2011, SIAM J. Sci. Comput..

[22]  Bernard Chazelle,et al.  Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform , 2006, STOC '06.