Experimental study of recognition rate in statistical pattern classification based on finite size of design sample set

When two classes of patterns follow multidimensional normal distributions, the following procedure is considered. The same finite number of samples is drawn from each class, and the covariance matrices and mean vectors are estimated from them. Using the estimated parameters, the linear or quadratic discriminant function is evaluated by testing how accurately it recognizes other samples drawn from the same classes. The authors propose iterating this procedure, both to examine how the recognition ratio deteriorates because the number of learning samples is finite and to calculate the average recognition ratio. Next, the relations among the dimension of the feature patterns, the number of learning samples and the average recognition ratio are examined and compared with the approximate expressions for the recognition ratio (theoretical formulas) given by Raudys and by Fukunaga et al. The limits of applicability of these evaluation formulas are indicated. Finally, the deterioration of the recognition ratio is examined. It is shown that the error ratio is higher than that of the Bayes decision, and that the error ratio becomes closer to the -distribution as the dimension of the feature vector increases.
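The iterated design-and-test procedure can be sketched as a small Monte Carlo simulation. This is a minimal illustration under assumed conditions, not the authors' program. It assumes equal priors, class 1 distributed as N(0, I) and class 2 as N(μ, I) with μ = (δ, 0, ..., 0), a plug-in linear discriminant using the pooled sample covariance, and a plug-in quadratic discriminant using each class's own sample covariance (which requires more design samples than dimensions). All function names are hypothetical. As a reference it also computes the Bayes error Φ(−δ/2), which is exact for two equiprobable Gaussian classes sharing one covariance matrix at Mahalanobis distance δ.

```python
import math

import numpy as np


def bayes_error_equal_cov(delta):
    """Bayes error Phi(-delta/2) for two equiprobable Gaussian classes that
    share a covariance matrix and are separated by Mahalanobis distance delta."""
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


def estimate(x):
    """Sample mean vector and unbiased sample covariance matrix of the rows of x."""
    return x.mean(axis=0), np.atleast_2d(np.cov(x, rowvar=False))


def quad_score(t, mean, cov):
    """Quadratic discriminant -0.5*(t-m)^T S^-1 (t-m) - 0.5*log|S| for each row of t."""
    d = t - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d) - 0.5 * logdet


def run_trial(rng, p, n_design, n_test, delta, kind):
    """One design/test cycle; returns the fraction of test samples recognized correctly."""
    # Assumed configuration: class 1 ~ N(0, I), class 2 ~ N(mu, I), mu = (delta, 0, ..., 0).
    mu2 = np.zeros(p)
    mu2[0] = delta
    m1, s1 = estimate(rng.standard_normal((n_design, p)))
    m2, s2 = estimate(rng.standard_normal((n_design, p)) + mu2)
    # Independent test samples drawn from the same two distributions.
    t1 = rng.standard_normal((n_test, p))
    t2 = rng.standard_normal((n_test, p)) + mu2
    if kind == "linear":
        # Plug-in linear discriminant with the pooled covariance (equal sample sizes).
        w = np.linalg.solve((s1 + s2) / 2.0, m1 - m2)
        mid = (m1 + m2) / 2.0

        def score(t):
            return (t - mid) @ w
    elif kind == "quadratic":
        # Plug-in quadratic discriminant; needs n_design > p so each S_i is invertible.
        def score(t):
            return quad_score(t, m1, s1) - quad_score(t, m2, s2)
    else:
        raise ValueError("kind must be 'linear' or 'quadratic'")
    # A positive score assigns a sample to class 1, otherwise to class 2.
    correct = np.count_nonzero(score(t1) > 0) + np.count_nonzero(score(t2) <= 0)
    return correct / (2 * n_test)


def average_recognition_ratio(p, n_design, delta, kind="linear",
                              trials=200, n_test=1000, seed=0):
    """Repeat the design/test cycle and average the recognition ratios."""
    rng = np.random.default_rng(seed)
    return float(np.mean([run_trial(rng, p, n_design, n_test, delta, kind)
                          for _ in range(trials)]))


if __name__ == "__main__":
    delta, p = 2.0, 5
    print(f"Bayes recognition ratio: {1 - bayes_error_equal_cov(delta):.3f}")
    for n in (10, 20, 50, 200):
        lin = average_recognition_ratio(p, n, delta, "linear")
        quad = average_recognition_ratio(p, n, delta, "quadratic")
        print(f"N={n:4d}  linear={lin:.3f}  quadratic={quad:.3f}")
```

Comparing the printed ratios with the Bayes recognition ratio shows the deficit caused by finite design samples shrinking as N grows. The exact figures depend on the random seed and on the assumed class configuration, so they are not meant to reproduce the paper's results.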

[1] Keinosuke Fukunaga et al. Effects of Sample Size in Classifier Design, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[2] Anil K. Jain et al. Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[3] Thomas F. Krile et al. Calculation of Bayes' Recognition Error for Two Multivariate Gaussian Distributions, 1969, IEEE Trans. Comput.

[4] Jan M. Van Campenhout et al. On the Possible Orderings in the Measurement Selection Problem, 1977, IEEE Trans. Syst. Man Cybern.

[5] M. R. Mickey et al. Estimation of Error Rates in Discriminant Analysis, 1968.

[6] Gerard V. Trunk et al. A Problem of Dimensionality: A Simple Example, 1979, IEEE Trans. Pattern Anal. Mach. Intell.

[7] B. Efron. The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis, 1975.

[8] Donald H. Foley. Considerations of Sample and Feature Size, 1972, IEEE Trans. Inf. Theory.