High-dimensional data classification and feature selection using support vector machines

In many big-data systems, large amounts of information are recorded and stored for analytics. Often, however, much of this information does not improve decision making; instead it complicates analysis and makes collection, storage, and processing costly. For instance, tumor classification from high-throughput microarray data is challenging because most features are noisy and do not help reduce classification error. For such problems, the aim is to find a small number of genes that discriminate well among the classes. In this paper we therefore address feature selection within support vector machine (SVM) classification: the problem of finding an accurate binary classifier that uses as few features as possible. We introduce a new approach that iteratively adjusts a bound on the l1-norm of the classifier's weight vector so that the number of selected features converges to a desired maximum. We analyze two real-life classification problems with high-dimensional features. The first is the medical diagnosis of tumors from microarray data, for which we present a generic approach to cancer classification based on gene expression. The second is sentiment classification of online reviews from Amazon, Yelp, and IMDb. The results show that the proposed classification and feature selection approach is simple and computationally tractable, and that it achieves low error rates, which are key for building advanced decision-support systems.
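The iterative l1-bound adjustment can be prototyped in a few lines. The sketch below is not the paper's exact algorithm: it assumes a hinge-loss linear SVM, labels coded as +/-1, and a simple bisection on the bound t of ||w||_1 until the number of nonzero weights falls within the feature budget. The function names (l1_bounded_svm, select_features), the zero threshold eps, and the initial upper bound t_hi are all illustrative choices, not quantities from the paper.

```python
import numpy as np
import cvxpy as cp

def l1_bounded_svm(X, y, t):
    """Hinge-loss linear SVM subject to ||w||_1 <= t (labels y must be +/-1)."""
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    margins = cp.multiply(y, X @ w + b)        # y_i * (w' x_i + b)
    hinge = cp.sum(cp.pos(1 - margins)) / n    # average hinge loss
    cp.Problem(cp.Minimize(hinge), [cp.norm1(w) <= t]).solve()
    return w.value, b.value

def select_features(X, y, max_features, t_hi=10.0, n_iter=30, eps=1e-5):
    """Bisect on the l1 bound until at most max_features weights are nonzero."""
    t_lo, best = 0.0, None
    for _ in range(n_iter):
        t = 0.5 * (t_lo + t_hi)
        w, b = l1_bounded_svm(X, y, t)
        if np.sum(np.abs(w) > eps) <= max_features:
            best, t_lo = (w, b), t   # within budget: loosen the bound
        else:
            t_hi = t                 # too many features: tighten the bound
    return best
```

The bisection relies on the fact that enlarging the l1 bound tends (approximately monotonically) to admit more nonzero coefficients, so tightening it drives features out of the model. A call such as w, b = select_features(X_train, y_train, max_features=20) would then yield a sign(X @ w + b) classifier using at most 20 features.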
