A Feature Selection Algorithm Capable of Handling Extremely Large Data Dimensionality

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm can process many thousands of features within a few minutes on a personal computer, yet maintains close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
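To make the abstract's idea concrete, the sketch below illustrates one plausible reading of local-learning, margin-based feature weighting: for each sample, the nearest same-class neighbour (hit) and nearest different-class neighbour (miss) are found under the current weighted distance, and a non-negative weight vector is updated to enlarge the local margin while an L1 penalty drives irrelevant features toward zero. This is only a minimal illustration under stated assumptions, not the authors' exact formulation; the function name `local_margin_feature_weights` and the parameters `n_iter`, `reg`, and `lr` are hypothetical.

```python
import numpy as np

def local_margin_feature_weights(X, y, n_iter=20, reg=1.0, lr=0.1):
    """Sketch of local-learning, margin-based feature weighting (assumption,
    not the paper's exact algorithm). Assumes each class has >= 2 samples."""
    n, d = X.shape
    w = np.ones(d)
    for _ in range(n_iter):
        grad = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i])            # per-feature |x_j - x_i|
            dist = diff @ w                    # weighted L1 distances
            dist[i] = np.inf                   # exclude the sample itself
            same = (y == y[i])
            same[i] = False
            hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit
            miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss
            z = diff[miss] - diff[hit]         # local margin vector
            margin = w @ z
            # gradient of the logistic margin loss log(1 + exp(-margin))
            grad += -z * np.exp(-margin) / (1.0 + np.exp(-margin))
        # gradient step with L1 penalty, projected back to non-negative weights
        w -= lr * (grad / n + reg)
        w = np.maximum(w, 0.0)
    return w

# Toy usage: two informative features among many irrelevant ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(local_margin_feature_weights(X, y)[:5])
```

Because the per-sample margin is linear in the weights once the hit and miss are fixed, each pass solves a locally linear problem, which is the intuition behind the decomposition described in the abstract; in this sketch the weights that survive the L1 shrinkage are taken as the selected features.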
