Integer programming models for feature selection: New extensions and a randomized solution algorithm

Feature selection methods are used in machine learning and data analysis to select a subset of features that can be used successfully to build a model of the data. These methods rest on the assumption that many of the available features are often redundant for the purpose of the analysis. In this paper, we focus on a particular feature selection method for supervised learning problems, based on a linear programming model with integer variables. To solve the optimization problem associated with this approach, we propose a novel, robust metaheuristic algorithm based on a Greedy Randomized Adaptive Search Procedure (GRASP), extended with short-term memory and a local search strategy. The performance of our heuristic algorithm compares favorably with that of well-established feature selection methods on both simulated data and real data from biological applications. The results suggest that our method is particularly suited to problems with a very large number of binary or categorical features.
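The abstract does not state the integer program itself, so the following is only an illustrative sketch under an explicit assumption: a set-covering-style model for binary features, in which we choose the fewest features such that every pair of samples from different classes differs on at least `beta` of the selected features (minimize sum_j x_j subject to sum_j a_ij x_j >= beta for every pair i, with x_j in {0, 1} and a_ij = 1 when the two samples of pair i differ on feature j). The sketch shows the two GRASP phases named above: a greedy randomized construction driven by a restricted candidate list (RCL), followed by a local search that drops redundant features. All names (`grasp_feature_selection`, `alpha`, `beta`, and so on) are hypothetical, and the short-term memory component is not modeled.

```python
import random
from itertools import combinations

# ASSUMPTION: set-covering formulation for binary features; this is an
# illustration, not necessarily the model or algorithm used in the paper.


def build_pairs(X, y):
    """For each pair of samples with different labels, return the set of
    feature indices on which the two samples differ."""
    pairs = []
    for p, q in combinations(range(len(X)), 2):
        if y[p] != y[q]:
            pairs.append({j for j, (a, b) in enumerate(zip(X[p], X[q])) if a != b})
    return pairs


def is_feasible(selected, pairs, demand):
    """True if every pair is separated by at least its demanded number of selected features."""
    return all(len(diff & selected) >= d for diff, d in zip(pairs, demand))


def construct(pairs, demand, m, alpha, rng):
    """Greedy randomized construction with a value-based restricted candidate list."""
    selected = set()
    residual = list(demand)
    while any(r > 0 for r in residual):
        # Greedy score: number of still-unsatisfied pairs that feature j helps separate.
        scores = {}
        for j in range(m):
            if j not in selected:
                s = sum(1 for diff, r in zip(pairs, residual) if r > 0 and j in diff)
                if s > 0:
                    scores[j] = s
        smax, smin = max(scores.values()), min(scores.values())
        threshold = smax - alpha * (smax - smin)  # alpha=0: pure greedy, alpha=1: pure random
        rcl = [j for j, s in scores.items() if s >= threshold]
        j = rng.choice(rcl)
        selected.add(j)
        residual = [r - 1 if r > 0 and j in diff else r for diff, r in zip(pairs, residual)]
    return selected


def local_search(selected, pairs, demand):
    """Remove any feature whose removal keeps the solution feasible, until none can be removed."""
    improved = True
    while improved:
        improved = False
        for j in sorted(selected):
            trial = selected - {j}
            if is_feasible(trial, pairs, demand):
                selected, improved = trial, True
                break
    return selected


def grasp_feature_selection(X, y, beta=1, alpha=0.3, iterations=50, seed=0):
    rng = random.Random(seed)
    pairs = build_pairs(X, y)
    # Cap each pair's demand at the number of differing features so the model stays feasible.
    demand = [min(beta, len(diff)) for diff in pairs]
    m = len(X[0])
    best = set(range(m))  # selecting every feature is always feasible
    for _ in range(iterations):
        sol = local_search(construct(pairs, demand, m, alpha, rng), pairs, demand)
        if len(sol) < len(best):
            best = sol
    return sorted(best)


X = [[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
y = [0, 0, 1, 1]
print(grasp_feature_selection(X, y, beta=1))  # a single feature, 0 or 1
print(grasp_feature_selection(X, y, beta=2))
```

In this toy data set, feature 0 (or its complement, feature 1) separates the two classes on its own, so with `beta=1` the search returns a single feature; requiring `beta=2` forces a second separating feature. The `alpha` parameter trades greediness against randomization in the RCL, which is what lets repeated GRASP iterations explore different solutions.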
