Multi-Class Feature Selection with Support Vector Machines

Feature selection is an important component of text categorization that has mostly been addressed by filter methods. In the context of SVM classifiers, we propose a general framework in which feature selection is embedded in the learning algorithm. As special cases of this framework we derive two algorithms: one based on L1 regularization of the weight vector, and the other an extension of the Recursive Feature Elimination algorithm. We also derive a new method that performs even better than these two. The framework and methods are developed in a multi-class setting, where the goal is to find a small set of features for all the classes simultaneously. On some datasets, the feature set found by our new method is an order of magnitude smaller than the one found by a filter method based on information gain. Finally, we devise a computationally efficient optimization technique for this method.
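
To make the multi-class idea concrete, the following is a minimal sketch of one way Recursive Feature Elimination could be extended to several classes. It is not the algorithm of the paper, whose details the abstract does not give. The assumptions are: one-vs-rest linear classifiers trained by subgradient descent on an L2-regularized hinge loss, and a shared feature score equal to the squared weight summed over all classes. The function names, toy data, and hyperparameters are hypothetical.

```python
import random


def make_toy_data(seed=0, per_class=20, n_classes=3, n_noise=3):
    """Toy data (hypothetical): feature k is ~1 for class k, the rest is small noise."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n_classes):
        for _ in range(per_class):
            row = [rng.uniform(-0.1, 0.1) for _ in range(n_classes + n_noise)]
            row[k] += 1.0  # the only informative feature for class k
            X.append(row)
            y.append(k)
    return X, y


def train_ovr_linear(X, y, n_classes, active, epochs=50, lr=0.1, lam=0.01):
    """One-vs-rest linear classifiers (no bias term) restricted to `active` features.

    Assumption: plain subgradient descent on the L2-regularized hinge loss,
    standing in for a real SVM solver.
    """
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            for k in range(n_classes):
                t = 1.0 if yi == k else -1.0
                margin = t * sum(W[k][j] * xi[j] for j in active)
                for j in active:
                    grad = lam * W[k][j]
                    if margin < 1.0:  # hinge loss is active for this example
                        grad -= t * xi[j]
                    W[k][j] -= lr * grad
    return W


def multiclass_rfe(X, y, n_classes, n_keep):
    """Drop one feature per round: the one with the smallest score over all classes."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        W = train_ovr_linear(X, y, n_classes, active)
        # Shared score: squared weight summed over classes, so a feature is
        # kept or removed for all classes at once.
        scores = {j: sum(W[k][j] ** 2 for k in range(n_classes)) for j in active}
        active.remove(min(active, key=lambda j: scores[j]))
    return active


if __name__ == "__main__":
    X, y = make_toy_data()
    print(sorted(multiclass_rfe(X, y, n_classes=3, n_keep=3)))
```

Scoring each feature jointly across classes, rather than ranking features per class and merging the rankings afterwards, is what produces a single small feature set for all the classes.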
