Classification and discrimination problems with applications, part IIa

In Part I exact results for univariate (“p = 1”) two-group (“k = 2”) classification problems were derived assuming normality and equality of the variances. In Part IIa asymptotic results for multivariate (“p > 1”) two-group classification and discrimination problems are based on the corresponding assumptions of multivariate normality and equality of the covariance matrices. The results (4.6.5), (4.6.6) and (4.6.7) are believed to be new. The asymptotic results in Section 4.6, together with results presented elsewhere in the literature, form the basis of various detailed proposals for dealing with problems from actual statistical practice. Most of these proposals are modifications or specifications of existing ones. We shall pay some attention to (I) testing whether differences exist, but we are mainly interested in (II) constructing a discriminant function, (III) assigning the individual under classification, and (IV) constructing a confidence interval for “the” posterior probability that the individual under classification belongs to Population 2. Various techniques for selecting variables in discriminant analysis play an important part in our theory. The need for such techniques follows from Section 4.10. The consequences of building in a selection technique are discussed in Section 4.12. One of our proposals motivates the theory presented in Chapter 3 and is mentioned here for that reason: employ a large part of the data, say 70%, to construct a discriminant function (via a selection of variables); when this function is applied to the rest of the data, the exact univariate theory of Part I becomes applicable. Part IIb will contain a chapter on applications.
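
The data-splitting proposal can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes Fisher's linear discriminant with a pooled covariance estimate as the discriminant function, omits the variable-selection step, and uses hypothetical function names (`fisher_direction`, `split_and_project`) and a random 70/30 split within each group. The point it shows is only that the held-out observations are reduced to univariate scores once the direction is fixed.

```python
import numpy as np


def fisher_direction(x1, x2):
    """Direction S^{-1}(m2 - m1) from two training samples (rows = individuals).

    Assumption: equal covariance matrices, estimated by the pooled
    within-group covariance. Variable selection is not performed here.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return np.linalg.solve(pooled, m2 - m1)


def split_and_project(x1, x2, train_frac=0.7, seed=0):
    """Fit the direction on a fraction of each group, score the remainder.

    Returns the direction and the univariate scores of the held-out
    individuals from group 1 and group 2.
    """
    rng = np.random.default_rng(seed)
    parts = []
    for x in (x1, x2):
        idx = rng.permutation(len(x))
        cut = int(round(train_frac * len(x)))
        parts.append((x[idx[:cut]], x[idx[cut:]]))
    (train1, hold1), (train2, hold2) = parts
    w = fisher_direction(train1, train2)
    # Each held-out individual is now described by a single number.
    return w, hold1 @ w, hold2 @ w
```

Because the direction is computed from the training part only, the held-out scores are, conditionally on that direction, two independent univariate samples. Under the multivariate normal model with equal covariance matrices they are normal with a common variance, which is the setting of Part I.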
