A mixed integer optimisation model for data classification

In this work, a mixed integer linear programming (MILP) model is proposed for the multi-class data classification problem using a hyper-box representation, which is particularly suitable for capturing disjoint data regions. The objective is to minimise the total number of misclassified data samples. To improve the training and testing accuracy of the approach, an iterative solution procedure is developed that can assign more than one box to a single class. Finally, the applicability of the proposed approach is demonstrated through a number of illustrative examples. The computational results show that the proposed optimisation-based approach is competitive in prediction accuracy with various standard classifiers.
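The Python sketch below makes the hyper-box representation concrete. It shows how a set of already-fitted boxes could classify samples and how the misclassification count, the quantity the MILP minimises, is evaluated. It does not solve the MILP, and the box bounds are hand-picked. The `HyperBox` class, the `predict` and `misclassified` helpers, and the nearest-box assignment rule are all hypothetical illustrations, not the authors' formulation. Class "A" is given two boxes to show how multiple boxes per class can cover disjoint regions.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass
class HyperBox:
    """Hypothetical axis-aligned box: lower[m] <= x[m] <= upper[m] for each attribute m."""
    label: str
    lower: Sequence[float]
    upper: Sequence[float]

    def distance(self, x: Sequence[float]) -> float:
        # Euclidean distance from x to the nearest point of the box; 0.0 if x lies inside.
        gaps = (max(lo - v, 0.0, v - hi) for lo, v, hi in zip(self.lower, x, self.upper))
        return sqrt(sum(g * g for g in gaps))


def predict(boxes: Sequence[HyperBox], x: Sequence[float]) -> str:
    # Assumed assignment rule: take the class of the nearest box. A box that
    # contains x has distance 0, so containment wins; ties go to the earlier box.
    return min(boxes, key=lambda b: b.distance(x)).label


def misclassified(boxes: Sequence[HyperBox],
                  samples: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> int:
    # The quantity the MILP objective minimises: samples assigned the wrong class.
    return sum(predict(boxes, x) != y for x, y in zip(samples, labels))


# Hand-picked boxes (not trained): class "A" covers two disjoint regions.
boxes = [
    HyperBox("A", [0.0, 0.0], [1.0, 1.0]),
    HyperBox("A", [4.0, 4.0], [5.0, 5.0]),
    HyperBox("B", [2.0, 2.0], [3.0, 3.0]),
]
samples = [[0.5, 0.5], [4.5, 4.2], [2.5, 2.5], [3.2, 3.1], [1.2, 1.2]]
labels = ["A", "A", "B", "B", "B"]

print(misclassified(boxes, samples, labels))  # 1: (1.2, 1.2) is nearest to the first "A" box
```

In an optimisation-based approach, the box bounds would be decision variables chosen to drive this count down rather than fixed by hand. The iterative procedure described above would decide when a class needs an additional box, as class "A" has here.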
