Examining the effect of second-order terms in mathematical programming approaches to the classification problem

Abstract

Research on mathematical programming approaches to the classification problem has focused almost exclusively on linear discriminant functions with only first-order terms. While many of these first-order models have displayed excellent classificatory performance when compared to Fisher's linear discriminant method, they cannot compete with Smith's quadratic discriminant method on certain data sets. In this paper, we investigate the appropriateness of including second-order terms in mathematical programming models. We address several issues: the performance of the models with small to moderate sample sizes, the need for cross-product terms, and the loss of power by the mathematical programming models under conditions ideal for the parametric procedures. A simulation study is conducted to assess the performance of first-order and second-order mathematical programming models relative to the parametric procedures. The simulation study indicates that mathematical programming models using polynomial functions may be prone to overfitting on the training samples, which in turn may cause rather poor fits on the validation samples. It also indicates that including cross-product terms may hurt a polynomial model's accuracy on the validation samples, although omitting them means that the model is not invariant to nonsingular transformations of the data.
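To make the distinction between first-order terms, pure second-order terms, and cross-product terms concrete, the following is a minimal sketch in Python. It is not the formulation used in the paper: the function names `second_order_terms` and `discriminant_score` are hypothetical, and the weights are passed in as plain numbers, whereas in a mathematical programming model they would be the decision variables. The sketch only shows how an observation is expanded, and that the resulting discriminant score stays linear in the weights.

```python
from itertools import combinations


def second_order_terms(x, cross_products=True):
    """Expand an observation x = (x1, ..., xp) into first- and second-order terms.

    Hypothetical helper for illustration; the ordering of terms is an assumption.
    """
    x = list(x)
    terms = x[:]                                  # first-order terms x_i
    terms += [v * v for v in x]                   # pure quadratic terms x_i^2
    if cross_products:
        # cross-product terms x_i * x_j for i < j
        terms += [a * b for a, b in combinations(x, 2)]
    return terms


def discriminant_score(weights, x, cross_products=True):
    """Weighted sum of the expanded terms; linear in the weights."""
    terms = second_order_terms(x, cross_products)
    if len(weights) != len(terms):
        raise ValueError("expected %d weights, got %d" % (len(terms), len(weights)))
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    print(second_order_terms((2, 3)))                        # with cross-product term
    print(second_order_terms((2, 3), cross_products=False))  # without it
```

With p attributes, the full expansion has 2p + p(p-1)/2 terms, against p for a first-order model. With small training samples, that growth in the number of free weights is one plausible route to the overfitting the abstract describes.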

[1]  R. Nath,et al.  A Variable Selection Criterion in the Linear Programming Approaches to Discriminant Analysis , 1988 .

[2]  Fred Glover,et al.  A NEW CLASS OF MODELS FOR THE DISCRIMINANT PROBLEM , 1988 .

[3]  Antonie Stam,et al.  Second order mathematical programming formulations for discriminant analysis , 1994 .

[4]  Edward P. Markowski,et al.  SOME DIFFICULTIES AND IMPROVEMENTS IN APPLYING LINEAR PROGRAMMING FORMULATIONS TO THE DISCRIMINANT PROBLEM , 1985 .

[5]  Gary J. Koehler,et al.  Unacceptable Solutions and the Hybrid Discriminant Model , 1989 .

[6]  Richard A. Johnson,et al.  Applied Multivariate Statistical Analysis , 1983 .

[7]  David J. Hand,et al.  Discrimination and Classification , 1982 .

[8]  Fred Glover,et al.  Applications and Implementation , 1981 .

[9]  Paul A. Rubin,et al.  Separation Failure in Linear Programming Discriminant Models , 1991 .

[10]  C. A. Smith.  Some examples of discrimination , 1947, Annals of Eugenics .

[11]  Fred Glover,et al.  IMPROVED LINEAR PROGRAMMING MODELS FOR DISCRIMINANT ANALYSIS , 1990 .

[12]  Mo Adam Mahmood,et al.  A PERFORMANCE ANALYSIS OF PARAMETRIC AND NONPARAMETRIC DISCRIMINANT APPROACHES TO BUSINESS DECISION MAKING , 1987 .

[13]  P. Rubin.  A Comparison of Linear Programming and Parametric Approaches to the Two‐Group Discriminant Problem , 1990 .

[14]  Cliff T. Ragsdale,et al.  On the classification gap in mathematical programming-based approaches to the discriminant problem , 1992 .

[15]  Gary J. Koehler,et al.  Characterization of Unacceptable Solutions in LP Discriminant Analysis , 1989 .

[16]  Prakash L. Abad,et al.  On the performance of linear programming heuristics applied on a quadratic transformation in the classification problem , 1994 .

[17]  Cliff T. Ragsdale,et al.  Mathematical Programming Formulations for the Discriminant Problem: An Old Dog Does New Tricks , 1991 .

[18]  A. Stam,et al.  Classification performance of mathematical programming techniques in discriminant analysis: Results for small and medium sample sizes , 1990 .

[19]  W. Gehrlein.  General mathematical programming formulations for the statistical classification problem , 1986 .

[20]  F. Glover,et al.  RESOLVING CERTAIN DIFFICULTIES AND IMPROVING THE CLASSIFICATION POWER OF LP DISCRIMINANT ANALYSIS FORMULATIONS , 1986 .

[21]  Paul A. Rubin,et al.  Evaluating the maximize minimum distance formulation of the linear discriminant problem , 1989 .

[22]  Paul A. Rubin,et al.  A comment regarding polynomial discriminant functions , 1994 .

[23]  Gary J. Koehler,et al.  Minimizing Misclassifications in Linear Discriminant Analysis , 1990 .

[24]  Paul A. Rubin,et al.  Heuristic solution procedures for a mixed‐integer programming discriminant model , 1990 .

[25]  William J. Banks,et al.  An Efficient Optimal Solution Algorithm for the Classification Problem , 1991 .

[26]  R. Fisher.  THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[27]  Antonie Stam,et al.  FOUR APPROACHES TO THE CLASSIFICATION PROBLEM IN DISCRIMINANT ANALYSIS: AN EXPERIMENTAL STUDY , 1988 .