Efficient Spectral Feature Selection with Minimum Redundancy

Spectral feature selection identifies relevant features by measuring their ability to preserve sample similarity. It provides a powerful framework for both supervised and unsupervised feature selection and has proven effective in many real-world applications. A common drawback of most existing spectral feature selection algorithms is that they evaluate features individually and therefore cannot identify redundant features. Because redundant features can significantly degrade learning performance, it is necessary to address this limitation of spectral feature selection. To this end, we propose a novel spectral feature selection algorithm that handles feature redundancy through an embedded model. The algorithm is derived from a formulation based on sparse multi-output regression with an L2,1-norm constraint. We conduct a theoretical analysis of the properties of its optimal solutions, paving the way for an efficient path-following solver. Extensive experiments show that the proposed algorithm performs well at both selecting relevant features and removing redundant ones.
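To make the formulation concrete, the sketch below illustrates the general pipeline the abstract describes: build a similarity graph, take its leading non-trivial eigenvectors as spectral regression targets, and fit a multi-output linear model whose L2,1 penalty zeroes out entire feature rows, so that correlated (redundant) features must compete for a shared row budget. This is a minimal illustration, not the paper's algorithm: the RBF similarity, the proximal-gradient loop, and all function names and parameter values here are assumptions, and the paper itself studies a constrained formulation solved by a path-following method rather than the fixed penalty used below.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    # Pairwise RBF (Gaussian) similarity matrix over the samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def spectral_targets(X, k=3, sigma=1.0):
    # Spectral embedding: eigenvectors of the normalized similarity
    # matrix with the largest eigenvalues (equivalently, the smallest
    # eigenvalues of the normalized Laplacian), skipping the trivial one.
    S = rbf_similarity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    S_norm = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S_norm)  # eigenvalues in ascending order
    return vecs[:, -(k + 1):-1]

def l21_prox(W, t):
    # Proximal operator of t * ||W||_{2,1}: row-wise soft thresholding,
    # which drives entire feature rows of W to zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def l21_regression(X, Y, lam=0.1, n_iter=500):
    # Proximal gradient descent for
    #   min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}.
    W = np.zeros((X.shape[1], Y.shape[1]))
    L = 2.0 * np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ W - Y)
        W = l21_prox(W - grad / L, lam / L)
    return W

def select_features(X, k_targets=3, lam=0.1):
    # Score each feature by the norm of its row in W; zero rows are
    # dropped, which is how the shared row penalty suppresses redundancy.
    Y = spectral_targets(X, k=k_targets)
    W = l21_regression(X, Y, lam=lam)
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1], scores

# Smoke test on synthetic data (illustrative only).
rng = np.random.default_rng(0)
ranking, scores = select_features(rng.standard_normal((100, 20)), lam=0.5)
```

The key design point is that the L2,1 norm couples all spectral targets through a single per-feature row norm: two features carrying the same information rarely both retain nonzero rows, whereas per-feature scoring methods would rank them identically and keep both.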
