A comparison of discriminant procedures for binary variables

Abstract. Thirteen discriminant procedures are compared by applying them to five real sets of binary data and evaluating their leave-one-out error rates. Three versions of each data set were used, containing respectively "large", "moderate" and "small" numbers of variables. To obtain the latter two versions, the variables were first reduced using an all-subsets approach based on Kullback's information divergence measure. Sample size, number of non-empty multinomial cells and Empirical Integrated Rank are taken into account in assessing classifier effectiveness. Although the data sets arose during day-to-day statistical consulting, the empirical basis for drawing general conclusions is inevitably limited. Nevertheless, the study highlighted the following features. The kernel, Fourier and Hall's k-nearest neighbour classifiers tended to overfit the data. The mixed integer programming classifier was clearly better than the other linear classifiers, and linear discriminant analysis gave better results than logistic discrimination, especially for small sample sizes. The second-order Bahadur procedure was generally very effective when the number of variables was large, but when the number of variables was small it was effective only if the sample size was large. The second-order log-linear models were very effective when the number of variables was small or when the sample sizes were large. Quadratic discrimination and Hills' k-nearest neighbour classification both performed poorly. The traditional statistical classifiers did not cope well with sparse binary data; non-traditional classifiers such as neural networks and mixed integer programming classifiers were much better in such circumstances.
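The leave-one-out error rate used as the evaluation criterion can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the one-nearest-neighbour rule under Hamming distance is only a simple stand-in classifier for binary vectors, not one of the thirteen procedures as implemented in the study.

```python
from typing import Callable, List

BinaryVector = List[int]
Predictor = Callable[[List[BinaryVector], List[int], BinaryVector], int]


def hamming(a: BinaryVector, b: BinaryVector) -> int:
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(
    train_X: List[BinaryVector], train_y: List[int], x: BinaryVector
) -> int:
    """Stand-in classifier (assumption): label of the closest training point.

    Ties in distance are broken in favour of the earliest training point.
    """
    best = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[best]


def leave_one_out_error(
    X: List[BinaryVector], y: List[int], predict: Predictor
) -> float:
    """Fraction of observations misclassified when each one is held out in turn."""
    errors = 0
    for i in range(len(X)):
        # Train on every observation except the i-th, then classify the i-th.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)
```

Any other rule with the same `(train_X, train_y, x)` signature can be passed as `predict`, so the same loop gives comparable error estimates across classifiers.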
