On Dimensionality, Sample Size, Classification Error, and Complexity of Classification Algorithm in Pattern Recognition

This paper compares four classification algorithms (discriminant functions) for classifying individuals into two multivariate populations. The discriminant functions (DFs) compared are derived according to the Bayes rule for normal populations and differ in their assumptions about the structure of the covariance matrices. Analytical formulas for the expected probability of misclassification EP_N are derived. They show that the classification error EP_N depends on the structure of the classification algorithm, the asymptotic probability of misclassification P∞, and the ratio of learning sample size N to dimensionality p: N/p for all linear DFs discussed and N²/p for quadratic DFs. Tables of the learning quantity H = EP_N/P∞ as a function of the parameters P∞, N, and p are presented for the four classification algorithms analyzed; they may be used for estimating the necessary learning sample size, determining the optimal number of features, and choosing the type of classification algorithm when the learning sample size is limited.
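As a rough illustration of the learning quantity H, the sketch below estimates it by Monte Carlo simulation for a single linear DF (the Fisher rule with a pooled sample covariance matrix). It does not reproduce the paper's analytical formulas or tables. Everything in it is an assumption made for illustration: two spherical normal populations with identity covariance and equal priors, N taken as the number of learning vectors per class, and the hypothetical names `learning_quantity`, `n_per_class`, and `delta` (the Mahalanobis distance between the population means).

```python
import math
import numpy as np


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def learning_quantity(p, n_per_class, delta, trials=300, seed=0):
    """Monte Carlo estimate of (expected error, asymptotic error, H).

    Assumed setup (not taken from the paper): populations N(mu1, I) and
    N(mu2, I) with |mu1 - mu2| = delta, equal priors, and a Fisher linear
    DF trained on n_per_class vectors from each population.
    Requires 2 * n_per_class - 2 >= p so the pooled covariance is invertible.
    """
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(p)
    mu2 = np.zeros(p)
    mu2[0] = delta

    # Asymptotic (Bayes) error for two equal-covariance normal populations.
    p_inf = norm_cdf(-delta / 2.0)

    errors = []
    for _ in range(trials):
        x1 = rng.standard_normal((n_per_class, p)) + mu1
        x2 = rng.standard_normal((n_per_class, p)) + mu2
        m1, m2 = x1.mean(axis=0), x2.mean(axis=0)

        # Pooled sample covariance matrix.
        r1, r2 = x1 - m1, x2 - m2
        s = (r1.T @ r1 + r2.T @ r2) / (2 * n_per_class - 2)

        # Rule: assign x to population 1 if w @ (x - c) > 0.
        w = np.linalg.solve(s, m1 - m2)
        c = 0.5 * (m1 + m2)
        norm_w = np.linalg.norm(w)

        # Exact conditional error of this rule under the true populations.
        e1 = norm_cdf(-(w @ (mu1 - c)) / norm_w)
        e2 = norm_cdf((w @ (mu2 - c)) / norm_w)
        errors.append(0.5 * (e1 + e2))

    ep_n = float(np.mean(errors))
    return ep_n, p_inf, ep_n / p_inf


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        ep_n, p_inf, h = learning_quantity(p=5, n_per_class=n, delta=2.0)
        print(f"N={n:4d}  N/p={n / 5:5.1f}  EP_N={ep_n:.4f}  H={h:.3f}")
```

Under these assumptions the estimated H should fall toward 1 as N/p grows, which is the qualitative behaviour the abstract describes for linear DFs; the specific values depend on the simulated setup and are not a substitute for the paper's tables.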
