Feature selection for regularized least-squares: New computational short-cuts and fast algorithmic implementations

We propose novel computational short-cuts for constructing sparse linear predictors with regularized least-squares (RLS), also known as the least-squares support vector machine or ridge regression. The short-cuts make it possible to accelerate the search over the power set of features with the leave-one-out criterion as a search heuristic. Our first short-cut finds the optimal search direction in the power set, where a direction corresponds to either adding a new feature to the set of selected features or removing one of the previously added features. The second short-cut updates the set of selected features and the corresponding RLS solution according to a given direction. The computational complexity of both short-cuts is O(mn), where m and n are the numbers of training examples and features, respectively. The short-cuts can be combined with various feature selection strategies; as case studies, we present efficient implementations of greedy forward and floating forward feature selection algorithms for RLS.
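To make the wrapper-style search concrete, the following is a minimal sketch (not the paper's O(mn) short-cuts, and all names are illustrative) of greedy forward selection for ridge regression, using the standard closed-form leave-one-out error of a linear smoother as the selection criterion:

```python
# Hedged sketch: greedy forward feature selection for ridge regression (RLS)
# with the closed-form leave-one-out (LOO) error as the search heuristic.
# This naive version recomputes the full solution for every candidate feature;
# the paper's short-cuts instead perform each step in O(mn) via incremental updates.
import numpy as np

def loo_error(X, y, lam):
    """Mean squared LOO error of ridge regression on the columns of X.

    Uses the identity e_i = (y_i - yhat_i) / (1 - H_ii), where
    H = X (X^T X + lam*I)^{-1} X^T is the hat matrix of the linear smoother.
    """
    n = X.shape[1]
    G = X.T @ X + lam * np.eye(n)
    H = X @ np.linalg.solve(G, X.T)
    residuals = y - H @ y
    loo = residuals / (1.0 - np.diag(H))
    return np.mean(loo ** 2)

def greedy_forward_selection(X, y, lam=1.0, max_features=None):
    """Greedily add the feature that most reduces the LOO error; stop when no
    candidate improves it or max_features is reached."""
    m, n_total = X.shape
    selected, best_err = [], np.inf
    max_features = max_features or n_total
    while len(selected) < max_features:
        candidate, candidate_err = None, best_err
        for j in range(n_total):
            if j in selected:
                continue
            err = loo_error(X[:, selected + [j]], y, lam)
            if err < candidate_err:
                candidate, candidate_err = j, err
        if candidate is None:  # no remaining feature improves the LOO error
            break
        selected.append(candidate)
        best_err = candidate_err
    return selected, best_err
```

The floating variant discussed in the paper additionally considers removing previously added features after each addition step; in this sketch that would amount to an inner loop that tries dropping each selected feature and keeps the removal whenever it lowers the LOO error.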
