Lower Bounds for Bayes Error Estimation

We give a short proof of the following result. Let $(X,Y)$ be any distribution on $N\times\{0,1\}$, and let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be an i.i.d. sample drawn from this distribution. In discrimination, the Bayes error $L^*=\inf_g P\{g(X)\ne Y\}$ is of crucial importance. Here we show that without further conditions on the distribution of $(X,Y)$, no rate-of-convergence results can be obtained. Let $\phi_n(X_1,Y_1,\ldots,X_n,Y_n)$ be an estimate of the Bayes error, and let $\{\phi_n(\cdot)\}$ be a sequence of such estimates. For any sequence $\{a_n\}$ of positive numbers converging to zero, a distribution of $(X,Y)$ may be found such that $E\{|L^*-\phi_n(X_1,Y_1,\ldots,X_n,Y_n)|\}\ge a_n$ infinitely often.
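To make the quantities in the statement concrete, the following Python sketch computes the Bayes error of a small discrete distribution and one possible estimate of it from a sample. It rests on assumptions that are not in the abstract: $X$ is taken to live on a finite set of integers $\{0,\ldots,k-1\}$, the names `p`, `eta`, `bayes_error`, `draw_sample`, and `plug_in_estimate` are hypothetical, and the plug-in rule is only one illustrative choice of $\phi_n$, not an estimator proposed in the paper. The identity $L^*=\sum_x P\{X=x\}\min(\eta(x),1-\eta(x))$ with $\eta(x)=P\{Y=1\mid X=x\}$ is the standard form of the Bayes error for discrete $X$.

```python
import random


def bayes_error(p, eta):
    """Bayes error for discrete X: sum_x p[x] * min(eta[x], 1 - eta[x]).

    p[x]   = P{X = x}          (assumed to sum to 1)
    eta[x] = P{Y = 1 | X = x}
    """
    return sum(px * min(ex, 1.0 - ex) for px, ex in zip(p, eta))


def draw_sample(p, eta, n, rng):
    """Draw n i.i.d. pairs (X_i, Y_i) from the distribution given by p and eta."""
    xs = rng.choices(range(len(p)), weights=p, k=n)
    return [(x, int(rng.random() < eta[x])) for x in xs]


def plug_in_estimate(data):
    """Illustrative phi_n (an assumption, not the paper's estimator).

    Replaces p and eta by empirical frequencies, which simplifies to
    (1/n) * sum_x min(#{i: X_i = x, Y_i = 0}, #{i: X_i = x, Y_i = 1}).
    """
    counts = {}
    for x, y in data:
        counts.setdefault(x, [0, 0])[y] += 1
    return sum(min(c0, c1) for c0, c1 in counts.values()) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)        # fixed seed so runs are repeatable
    p = [0.5, 0.3, 0.2]           # hypothetical distribution of X
    eta = [0.1, 0.8, 0.5]         # hypothetical P{Y = 1 | X = x}
    print("L* =", bayes_error(p, eta))
    for n in (10, 100, 1000, 10000):
        print(n, plug_in_estimate(draw_sample(p, eta, n, rng)))
```

On a fixed finite support such as this one, the plug-in estimate settles near $L^*$ as $n$ grows. The result above concerns the worst case over all distributions: for any estimator and any rate $\{a_n\}$, some distribution of $(X,Y)$ keeps the expected error at or above $a_n$ for infinitely many $n$.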
