Feature selection for regularized least-squares: New computational short-cuts and fast algorithmic implementations

We propose novel computational short-cuts for constructing sparse linear predictors with regularized least-squares (RLS), also known as the least-squares support vector machine or ridge regression. The short-cuts make it possible to accelerate the search over the power set of features with the leave-one-out criterion as a search heuristic. Our first short-cut finds the optimal search direction in the power set, where a direction corresponds to either adding a new feature to the set of selected features or removing one of the previously added features. The second short-cut updates the set of selected features and the corresponding RLS solution according to a given direction. The computational complexity of both short-cuts is O(mn), where m and n are the numbers of training examples and features, respectively. The short-cuts can be combined with various feature selection strategies; as case studies, we present efficient implementations of greedy forward and floating forward feature selection algorithms for RLS.
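To make the wrapper-style search concrete, the following is a minimal sketch (not the paper's O(mn) short-cuts, and all names are illustrative) of greedy forward selection for ridge regression, using the standard closed-form leave-one-out error of a linear smoother as the selection criterion:

```python
# Hedged sketch: greedy forward feature selection for ridge regression (RLS)
# with the closed-form leave-one-out (LOO) error as the search heuristic.
# This naive version recomputes the full solution for every candidate feature;
# the paper's short-cuts instead perform each step in O(mn) via incremental updates.
import numpy as np

def loo_error(X, y, lam):
    """Mean squared LOO error of ridge regression on the columns of X.

    Uses the identity e_i = (y_i - yhat_i) / (1 - H_ii), where
    H = X (X^T X + lam*I)^{-1} X^T is the hat matrix of the linear smoother.
    """
    n = X.shape[1]
    G = X.T @ X + lam * np.eye(n)
    H = X @ np.linalg.solve(G, X.T)
    residuals = y - H @ y
    loo = residuals / (1.0 - np.diag(H))
    return np.mean(loo ** 2)

def greedy_forward_selection(X, y, lam=1.0, max_features=None):
    """Greedily add the feature that most reduces the LOO error; stop when no
    candidate improves it or max_features is reached."""
    m, n_total = X.shape
    selected, best_err = [], np.inf
    max_features = max_features or n_total
    while len(selected) < max_features:
        candidate, candidate_err = None, best_err
        for j in range(n_total):
            if j in selected:
                continue
            err = loo_error(X[:, selected + [j]], y, lam)
            if err < candidate_err:
                candidate, candidate_err = j, err
        if candidate is None:  # no remaining feature improves the LOO error
            break
        selected.append(candidate)
        best_err = candidate_err
    return selected, best_err
```

The floating variant discussed in the paper additionally considers removing previously added features after each addition step; in this sketch that would amount to an inner loop that tries dropping each selected feature and keeps the removal whenever it lowers the LOO error.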
