TAKES: a fast method to select features in the kernel space

Feature selection is an effective tool for coping with the "curse of dimensionality". To handle non-linearly separable data, feature selection in the kernel space has been investigated. However, previous studies cannot adequately estimate the intrinsic dimensionality of the kernel space, so the learned basis does not accurately preserve the sketch of the kernel space and feature selection performance suffers. Moreover, the computational cost of these algorithms is at least cubic in the number of training samples. In this paper, we propose a fast framework for feature selection in the kernel space. Using a fast kernel subspace learning method, we automatically estimate the intrinsic dimensionality and construct an orthogonal basis of the kernel space; the learned basis accurately preserves the sketch of the kernel space. Backed by this basis, we then select features directly in the kernel space. The whole framework has quadratic complexity in the number of training samples, which is faster than existing kernel methods for feature selection. We evaluate our approach on several benchmark datasets and find that it not only preserves the sketch of the kernel space more accurately but also achieves better classification performance than many state-of-the-art methods.
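The abstract describes the kernel subspace learning step only at a high level. As a rough illustration of how an orthogonal basis of the kernel space can be constructed in quadratic time while the intrinsic dimensionality is estimated automatically, the sketch below uses a pivoted incomplete Cholesky factorization of the Gram matrix. This is an assumed stand-in rather than the paper's actual TAKES procedure; the names (`kernel_subspace_basis`, `rbf_kernel`) and the stopping tolerance `tol` are illustrative choices, not the authors' notation.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two sample vectors (an assumed choice of kernel)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_subspace_basis(X, kernel=rbf_kernel, tol=1e-6, max_dim=None):
    """Pivoted incomplete Cholesky factorization of the Gram matrix.

    Greedily picks the training sample whose kernel-space image has the
    largest residual norm, orthogonalizes it against the current basis,
    and stops when the residual drops below `tol`.  The number of columns
    returned estimates the intrinsic dimensionality of the kernel space;
    the cost is O(n^2 * d) for n samples and d basis vectors.
    """
    n = X.shape[0]
    max_dim = max_dim or n
    diag = np.array([kernel(x, x) for x in X])   # squared residual norms
    L = np.zeros((n, max_dim))                   # coordinates in the basis
    pivots = []
    for j in range(max_dim):
        i = int(np.argmax(diag))
        if diag[i] <= tol:                       # residual exhausted:
            break                                # dimensionality reached
        pivots.append(i)
        L[i, j] = np.sqrt(diag[i])
        for m in range(n):
            if m == i:
                continue
            k_mi = kernel(X[m], X[i])
            L[m, j] = (k_mi - L[m, :j] @ L[i, :j]) / L[i, j]
        diag -= L[:, j] ** 2                     # shrink residuals
        diag[i] = 0.0                            # guard against round-off
    d = len(pivots)
    return L[:, :d], pivots
```

Under these assumptions, the number of returned basis vectors serves as the intrinsic-dimensionality estimate, and the rows of `L` give each training sample's coordinates in the learned orthonormal basis, so subsequent feature scoring could be carried out directly in that space.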
