Feature subset selection for logistic regression via mixed integer optimization

This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are employed as goodness-of-fit measures. The purpose of our work is to establish a computational framework for selecting a subset of features with an optimality guarantee. To this end, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, by approximating the logistic loss function with a piecewise linear function, we pose the problem as a mixed integer linear optimization problem that can be solved with standard mixed integer optimization software. The computational results demonstrate that when the number of candidate features was fewer than 40, our method provided a feature subset sufficiently close to an optimal one within a reasonable amount of time. Furthermore, even with more candidate features, our method often found a better subset of features than stepwise methods did in terms of information criteria.
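The abstract does not spell out how the logistic loss is linearized, so the following is only a minimal sketch under a stated assumption: we use the common tangent-line construction. Because the logistic loss f(v) = log(1 + exp(-v)) is convex, the pointwise maximum of its tangent lines at a chosen set of points is a piecewise linear under-approximation that is exact at those points. The breakpoints, the function names, and the choice to count the intercept as one parameter in `information_criterion` are illustrative assumptions, not the paper's implementation.

```python
import math


def logistic_loss(v: float) -> float:
    """f(v) = log(1 + exp(-v)), evaluated stably for large |v|."""
    if v >= 0:
        return math.log1p(math.exp(-v))
    # log(1 + exp(-v)) = -v + log(1 + exp(v)) avoids overflow for very negative v
    return -v + math.log1p(math.exp(v))


def logistic_loss_slope(v: float) -> float:
    """f'(v) = -1 / (1 + exp(v)), evaluated stably."""
    if v >= 0:
        e = math.exp(-v)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(v))


def tangent_lines(points):
    """(slope, intercept) of the tangent line of f at each breakpoint (assumed choice)."""
    lines = []
    for vk in points:
        a = logistic_loss_slope(vk)
        b = logistic_loss(vk) - a * vk
        lines.append((a, b))
    return lines


def pwl_loss(v: float, lines) -> float:
    """Piecewise linear approximation: maximum over the tangent lines.

    Since f is convex, this value never exceeds f(v).
    """
    return max(a * v + b for a, b in lines)


def information_criterion(margins, num_selected: int, penalty: float) -> float:
    """2 * (negative log-likelihood) + penalty * (number of parameters).

    margins: v_i = y_i * (w^T x_i + b) with labels y_i in {-1, +1}.
    penalty: 2 for AIC, log(n) for BIC.
    Assumption: the intercept is counted as one extra parameter.
    """
    nll = sum(logistic_loss(v) for v in margins)
    return 2.0 * nll + penalty * (num_selected + 1)


if __name__ == "__main__":
    lines = tangent_lines([-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0])
    # Largest gap between the true loss and its approximation on [-6, 6]
    gap = max(logistic_loss(k / 10) - pwl_loss(k / 10, lines) for k in range(-60, 61))
    print(f"max approximation gap on [-6, 6]: {gap:.4f}")
```

In a mixed integer linear model, each maximum would typically be replaced by an auxiliary variable t_i with one linear constraint t_i ≥ a_k v_i + b_k per tangent line, and feature selection would be handled with binary indicator variables; this is a standard modeling device, and the paper's exact formulation may differ. Adding breakpoints tightens the approximation at the cost of more constraints.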
