A Feature Selection Algorithm Capable of Handling Extremely Large Data Dimensionality

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm can process many thousands of features within a few minutes on a personal computer, yet maintains close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
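To make the abstract's idea concrete, the sketch below illustrates one plausible reading of local-learning, margin-based feature weighting: for each sample, the nearest same-class neighbour (hit) and nearest different-class neighbour (miss) are found under the current weighted distance, and a non-negative weight vector is updated to enlarge the local margin while an L1 penalty drives irrelevant features toward zero. This is only a minimal illustration under stated assumptions, not the authors' exact formulation; the function name `local_margin_feature_weights` and the parameters `n_iter`, `reg`, and `lr` are hypothetical.

```python
import numpy as np

def local_margin_feature_weights(X, y, n_iter=20, reg=1.0, lr=0.1):
    """Sketch of local-learning, margin-based feature weighting (assumption,
    not the paper's exact algorithm). Assumes each class has >= 2 samples."""
    n, d = X.shape
    w = np.ones(d)
    for _ in range(n_iter):
        grad = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i])            # per-feature |x_j - x_i|
            dist = diff @ w                    # weighted L1 distances
            dist[i] = np.inf                   # exclude the sample itself
            same = (y == y[i])
            same[i] = False
            hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit
            miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss
            z = diff[miss] - diff[hit]         # local margin vector
            margin = w @ z
            # gradient of the logistic margin loss log(1 + exp(-margin))
            grad += -z * np.exp(-margin) / (1.0 + np.exp(-margin))
        # gradient step with L1 penalty, projected back to non-negative weights
        w -= lr * (grad / n + reg)
        w = np.maximum(w, 0.0)
    return w

# Toy usage: two informative features among many irrelevant ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(local_margin_feature_weights(X, y)[:5])
```

Because the per-sample margin is linear in the weights once the hit and miss are fixed, each pass solves a locally linear problem, which is the intuition behind the decomposition described in the abstract; in this sketch the weights that survive the L1 shrinkage are taken as the selected features.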
