Small sample size effects in statistical pattern recognition: recommendations for practitioners and open problems

The authors discuss the effects of sample size on feature selection and error estimation for several types of classifiers. In addition to surveying prior work in this area, they give practical advice to today's designers and users of statistical pattern recognition systems. They point out that a large number of training samples is needed when a complex classification rule with many features is used. In many pattern recognition problems, the number of potential features is very large and little is known about the characteristics of the pattern classes under consideration; it is therefore difficult to determine a priori how complex the classification rule needs to be. As a result, even when the designer believes that a large number of training samples has been collected, they may not be enough for designing and evaluating a classifier for the problem at hand. The authors further note that a small sample size can cause many problems in the design of a pattern recognition system.
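The interaction between sample size, number of features, and error estimation described above can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes two spherical unit-variance Gaussian classes that differ only in the first feature, a simple nearest-mean classifier, and hypothetical function names (`sample`, `centroid`, `error_rate`, `simulate`). It compares the apparent (resubstitution) error measured on the training samples with the error on a large independent test set, for a fixed small training set and a growing number of uninformative features.

```python
import random


def sample(rng, mean, n):
    """Draw n points from a spherical unit-variance Gaussian centred at mean."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def centroid(points):
    """Component-wise sample mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def error_rate(points_a, points_b, mean_a, mean_b):
    """Fraction of points misclassified by the nearest-mean rule."""
    # Nearest-mean is equivalent to thresholding a linear score at the midpoint.
    w = [a - b for a, b in zip(mean_a, mean_b)]
    mid = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]

    def score(x):
        return sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))

    wrong = sum(score(x) <= 0 for x in points_a)
    wrong += sum(score(x) > 0 for x in points_b)
    return wrong / (len(points_a) + len(points_b))


def simulate(n_features, n_train=10, n_test=500, trials=20, seed=0):
    """Average apparent and independent-test error over several trials.

    Assumption: only the first feature separates the classes; every
    additional feature is pure noise.
    """
    rng = random.Random(seed)
    mu_a = [0.5] + [0.0] * (n_features - 1)
    mu_b = [-0.5] + [0.0] * (n_features - 1)
    apparent = holdout = 0.0
    for _ in range(trials):
        train_a = sample(rng, mu_a, n_train)
        train_b = sample(rng, mu_b, n_train)
        m_a, m_b = centroid(train_a), centroid(train_b)
        # Apparent error: evaluate on the same samples used for training.
        apparent += error_rate(train_a, train_b, m_a, m_b)
        # Independent test error: evaluate on fresh samples.
        test_a = sample(rng, mu_a, n_test)
        test_b = sample(rng, mu_b, n_test)
        holdout += error_rate(test_a, test_b, m_a, m_b)
    return apparent / trials, holdout / trials


if __name__ == "__main__":
    for d in (1, 5, 50):
        app, hold = simulate(d)
        print(f"features={d:3d}  apparent={app:.3f}  independent test={hold:.3f}")
```

Under these assumptions, the apparent error should look increasingly optimistic as noise features are added, while the error on independent samples gets worse. This is one simple instance of the small-sample effects the abstract refers to, not a reproduction of the authors' experiments.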
