Max-Margin Feature Selection

Many machine learning applications, such as those in vision, biology, and social networking, deal with high-dimensional data. Feature selection is typically employed to select a subset of features that improves generalization accuracy and reduces the computational cost of learning the model. One common criterion for feature selection is to jointly minimize the redundancy and maximize the relevance of the selected features. In this paper, we formulate the task of feature selection as a one-class SVM problem in a transposed space where features correspond to data points and instances correspond to dimensions. The goal is to find a representative subset of features (the support vectors) that describes the boundary of the region in which the features, viewed as data points, lie. This leads to a joint optimization of relevance and redundancy in a principled max-margin framework. Additionally, our formulation enables us to leverage existing techniques for optimizing the SVM objective, resulting in highly computationally efficient solutions for the task of feature selection. Specifically, we employ the dual coordinate descent algorithm (Hsieh et al., 2008), originally proposed for SVMs, for our formulation, and use a sparse representation to handle data in very high dimensions. Experiments on seven publicly available benchmark datasets from a variety of domains show that our approach is orders of magnitude faster than state-of-the-art feature selection techniques while retaining the same level of accuracy.
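To make the transposed one-class SVM view concrete, the sketch below prototypes it in Python with scikit-learn's OneClassSVM as a stand-in solver. This is only an illustration of the formulation, not the authors' implementation: the paper optimizes the objective with the dual coordinate descent method of Hsieh et al. (2008) and a sparse representation, whereas OneClassSVM is libsvm-based. The function name max_margin_feature_selection and the choice of nu are hypothetical, introduced here for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def max_margin_feature_selection(X, nu=0.1):
    """Illustrative sketch: select features by fitting a one-class SVM
    on the transposed data matrix, where each feature is a point in R^n
    (n = number of instances). The support vectors of the learned
    boundary are taken as the selected features."""
    # Transpose: rows are now features, columns are instances.
    F = X.T  # shape: (n_features, n_instances)
    # Linear kernel keeps the geometry of the feature points; nu
    # upper-bounds the fraction of outliers and lower-bounds the
    # fraction of support vectors (i.e., selected features).
    ocsvm = OneClassSVM(kernel="linear", nu=nu)
    ocsvm.fit(F)
    # Indices of support vectors = indices of selected features.
    return ocsvm.support_

# Toy usage: 100 instances, 50 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
selected = max_margin_feature_selection(X, nu=0.2)
print(f"Selected {len(selected)} of {X.shape[1]} features:", selected)
```

In this transposed view, nu acts as a knob on the size of the selected subset, since it lower-bounds the fraction of points kept as support vectors. Intuitively, redundant features fall strictly inside the learned boundary and are discarded, while the diverse, boundary-defining features are retained, which mirrors the joint relevance/redundancy optimization described above.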

[1] Chih-Jen Lin et al. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 2008.

[2] Yamuna Prasad et al. Feature Selection using One Class SVM: A New Perspective. 2013.

[3] Ivor W. Tsang et al. Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets. ICML, 2010.

[4] Tony Jebara et al. Relative Margin Machines. NIPS, 2008.

[5] Christos Faloutsos et al. A Max Margin Framework on Image Annotation and Multimodal Image Retrieval. IEEE International Conference on Multimedia and Expo, 2007.

[6] Isabelle Guyon et al. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res., 2003.

[7] Chih-Jen Lin et al. A Dual Coordinate Descent Method for Large-Scale Linear SVM. ICML, 2008.

[8] Huan Liu et al. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. ICML, 2003.

[9] Bernhard Schölkopf et al. Kernel Methods for Measuring Independence. J. Mach. Learn. Res., 2005.

[10] Yoram Singer et al. Pegasos: Primal Estimated Sub-Gradient Solver for SVM. Math. Program., 2011.

[11] Sébastien Gadat. Feature Selection in High Dimension. 2011.

[12] T. Aruldoss Albert Victoire et al. Design of Fuzzy Expert System for Microarray Data Classification Using a Novel Genetic Swarm Algorithm. Expert Syst. Appl., 2012.

[13] Charles Elkan et al. Quadratic Programming Feature Selection. J. Mach. Learn. Res., 2010.

[14] Bernhard Schölkopf et al. Support Vector Method for Novelty Detection. NIPS, 1999.

[15] Jian Yang et al. A Weighted One-Class Support Vector Machine. Neurocomputing, 2016.

[16] Ivor W. Tsang et al. Discovering Support and Affiliated Features from Very High Dimensions. ICML, 2012.

[17] Chih-Jen Lin et al. Training and Testing Low-Degree Polynomial Data Mappings via Linear SVM. J. Mach. Learn. Res., 2010.

[18] Fuhui Long et al. Feature Selection Based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 2003.

[19] Chih-Jen Lin et al. Working Set Selection Using Second Order Information for Training Support Vector Machines. J. Mach. Learn. Res., 2005.