Heuristics for feature selection in mathematical programming discriminant analysis models

When developing a classification model that assigns observations of unknown class to one of several specified classes, using the values of a set of features associated with each observation, it is often desirable to base the classifier on a limited number of features. Mathematical programming discriminant analysis methods for developing classification models can be extended to perform feature selection. Classification accuracy can serve as the feature selection criterion in a mixed integer programming (MIP) model in which a binary variable is associated with each training sample observation, but the number of binary variables this requires limits the size of problems to which the approach can be applied. This paper develops heuristic feature selection methods for problems with large numbers of observations. These heuristic procedures, which are based on the MIP model for maximizing classification accuracy, are then applied to three credit scoring data sets.
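
To make the structure of such a model concrete, a generic two-group formulation is sketched below. It is an illustrative sketch only, not the paper's exact model: the symbols (weights w_j, cutoff c, misclassification indicators z_i, feature indicators y_j, gap ε, bounds M and U, and feature limit k) are assumptions introduced here, and the normalisation of the discriminant function is sidestepped by fixing ε and bounding the weights.

```latex
% Illustrative sketch (assumed notation, not taken from the paper).
% x_{ij}: value of feature j for training observation i
% G_1, G_2: index sets of the two classes; n = |G_1| + |G_2|; p features
% z_i = 1 allows observation i to be misclassified
% y_j = 1 allows feature j to enter the discriminant function
\begin{align}
\min \quad & \sum_{i=1}^{n} z_i \\
\text{s.t.} \quad
 & \sum_{j=1}^{p} w_j x_{ij} \le c - \varepsilon + M z_i, && i \in G_1, \\
 & \sum_{j=1}^{p} w_j x_{ij} \ge c + \varepsilon - M z_i, && i \in G_2, \\
 & -U y_j \le w_j \le U y_j, && j = 1, \dots, p, \\
 & \sum_{j=1}^{p} y_j \le k, \\
 & z_i \in \{0,1\}, \quad y_j \in \{0,1\}, \quad w_j,\, c \ \text{free}.
\end{align}
```

Minimizing the number of misclassified training observations is equivalent to maximizing training classification accuracy. The sketch has one binary variable z_i per training observation, which is why the model grows with the sample size and why heuristics become attractive for large data sets.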
