Error estimation in pattern recognition via L^{\alpha}-distance between posterior density functions

The L^{\alpha}-distance between posterior density functions (PDFs) is proposed as a separability measure to replace the probability of error as a criterion for feature extraction in pattern recognition. Upper and lower bounds on the Bayes error are derived for \alpha > 0. If \alpha = 1, the lower and upper bounds coincide; moving \alpha away from 1 in either direction loosens these bounds. For \alpha = 2, the upper bound equals the best commonly used bound and is equal to the asymptotic probability of error of the first nearest neighbor classifier. The case \alpha = 1 is used to estimate the probability of error in different problem situations, and a comparison is made with other methods. It is shown how unclassified samples may also be used to reduce the variance of the estimated error. For the family of exponential probability density functions (pdfs), the relation between the distance of a sample from the decision boundary and its contribution to the error is derived. In the nonparametric case, a consistent estimator is discussed which is computationally more efficient than estimators based on Parzen's estimation. A set of computer simulation experiments is reported to demonstrate the statistical advantages of the separability measure with \alpha = 1 when used in an error estimation scheme.
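As a rough illustration of the \alpha = 1 and \alpha = 2 cases, the following sketch works through a two-class toy problem. It is not the paper's estimator or its experiments. The sketch assumes two univariate Gaussian classes with means -1 and +1, unit variance, equal priors, and exactly known posteriors; the function names (`posterior_class1`, `l_alpha_separability`, `simulate`) are hypothetical. It relies only on two standard two-class identities: the Bayes error equals (1 - E|P(w1|x) - P(w2|x)|)/2, and the asymptotic nearest neighbor error equals (1 - E[(P(w1|x) - P(w2|x))^2])/2. Class labels are never used in the averages, which is why unclassified samples can contribute to the estimate.

```python
import math
import random


def posterior_class1(x, mu1=-1.0, mu2=1.0, sigma=1.0, prior1=0.5):
    """P(class 1 | x) for two univariate Gaussians sharing one sigma (toy assumption)."""
    def gauss(value, mu):
        # The common normalizing constant cancels in the posterior ratio.
        return math.exp(-0.5 * ((value - mu) / sigma) ** 2)

    a = prior1 * gauss(x, mu1)
    b = (1.0 - prior1) * gauss(x, mu2)
    return a / (a + b)


def l_alpha_separability(posteriors, alpha):
    """Sample mean of |P(w1|x) - P(w2|x)|^alpha; no class labels are needed."""
    return sum(abs(2.0 * p - 1.0) ** alpha for p in posteriors) / len(posteriors)


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    # Draw unlabeled samples from the equal-prior mixture of the two classes.
    xs = [rng.gauss(-1.0 if rng.random() < 0.5 else 1.0, 1.0) for _ in range(n)]
    post = [posterior_class1(x) for x in xs]

    j1 = l_alpha_separability(post, 1.0)
    j2 = l_alpha_separability(post, 2.0)

    error_alpha1 = 0.5 * (1.0 - j1)  # alpha = 1: estimate of the Bayes error
    nn_bound = 0.5 * (1.0 - j2)      # alpha = 2: asymptotic 1-NN error (upper bound)
    # Exact Bayes error for this toy problem: Phi(-1), via the complementary error function.
    exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
    return error_alpha1, nn_bound, exact


if __name__ == "__main__":
    e1, nn, exact = simulate()
    print(f"alpha=1 estimate: {e1:.4f}  alpha=2 bound: {nn:.4f}  exact: {exact:.4f}")
```

In this setup the \alpha = 1 value should land close to the exact Bayes error, while the \alpha = 2 value sits above it, consistent with the bounds described in the abstract. With estimated rather than known posteriors, the quality of the result would depend on the density estimator, which this sketch does not address.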
