Speeding Up Greedy Forward Selection for Regularized Least-Squares

We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set and, on each iteration, adds the feature whose inclusion provides the best leave-one-out cross-validation performance. Our method is considerably faster than previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the selected feature set. As a side effect, we therefore obtain a new training algorithm for learning sparse linear RLS predictors that can be used for large-scale learning. This speed is made possible by matrix-calculus-based shortcuts for leave-one-out cross-validation and feature addition. We experimentally demonstrate the scalability of our algorithm compared to previously proposed implementations.
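As a rough illustration of the selection criterion described above (and not of the authors' fast algorithm), the Python sketch below performs greedy forward selection for ridge regression, scoring each candidate feature by its leave-one-out mean squared error computed with the standard hat-matrix identity e_i = (y_i - ŷ_i) / (1 - H_ii). The function names and the regularization parameter `lam` are illustrative assumptions; this naive version recomputes the leave-one-out errors from scratch for every candidate feature, so it does not attain the linear time complexity that the paper's matrix-calculus shortcuts provide.

```python
import numpy as np

def loo_errors(X, y, lam):
    """Leave-one-out residuals for ridge regression on the given feature
    columns, via the standard hat-matrix identity
    e_i = (y_i - yhat_i) / (1 - H_ii)."""
    n, d = X.shape
    # Primal hat matrix H = X (X^T X + lam I)^{-1} X^T (fine for small d).
    G = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(G, X.T)
    residuals = y - H @ y
    return residuals / (1.0 - np.diag(H))

def greedy_forward_selection(X, y, k, lam=1.0):
    """Naive greedy forward selection: on each iteration, add the feature
    whose inclusion yields the lowest leave-one-out mean squared error.
    Illustrates the selection criterion only; it does not use the fast
    updates of the greedy RLS algorithm."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_feature, best_err = None, np.inf
        for f in remaining:
            err = np.mean(loo_errors(X[:, selected + [f]], y, lam) ** 2)
            if err < best_err:
                best_feature, best_err = f, err
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected

# Example usage on synthetic data (5 of 50 features are informative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
    print(greedy_forward_selection(X, y, k=5))
```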
