Minimax Nonparametric Classification—Part I: Rates of Convergence

This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show that the minimax rate of convergence under squared $L_2$ loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well known, an upper bound on the risk for estimating f gives an upper bound on the risk for classification, but the resulting rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Györfi, and Lugosi (see A Probabilistic Theory of Pattern Recognition, New York: Springer-Verlag, 1996).
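
The "well known" link between estimating f and classifying can be sketched with the standard plug-in bound. The notation below is an assumption for illustration and is not taken from the abstract: two classes $Y \in \{0,1\}$, $f(x) = P(Y = 1 \mid X = x)$, an estimator $\hat f$ with plug-in rule $\hat g(x) = \mathbf{1}\{\hat f(x) \ge 1/2\}$, the Bayes rule $g^{*}(x) = \mathbf{1}\{f(x) \ge 1/2\}$ with error $L^{*}$, and $\mu$ the distribution of $X$.

```latex
\[
P\bigl(\hat g(X) \neq Y \mid \hat f\,\bigr) - L^{*}
  = \int_{\{\hat g \neq g^{*}\}} \lvert 2f(x) - 1 \rvert \,\mu(dx)
  \le 2 \int \lvert \hat f(x) - f(x) \rvert \,\mu(dx)
  \le 2 \Bigl( \int \bigl(\hat f(x) - f(x)\bigr)^{2} \,\mu(dx) \Bigr)^{1/2}.
\]
```

The first inequality holds because wherever the two rules disagree, $\hat f(x)$ and $f(x)$ lie on opposite sides of $1/2$, so $\lvert f(x) - 1/2 \rvert \le \lvert \hat f(x) - f(x) \rvert$; the second is the Cauchy-Schwarz inequality. Under these assumptions, a squared $L_2$ risk of order $\varepsilon_n^{2}$ for estimating f yields an excess classification risk of order at most $\varepsilon_n$.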

[1]  A. Kolmogorov,et al.  ε-entropy and ε-capacity of sets in functional spaces , 1961 .

[2]  A. Brown Theory of Approximation of Functions of a Real Variable , 1966 .

[3]  G. Lorentz Metric entropy and approximation , 1966 .

[4]  M. Birman,et al.  Piecewise-polynomial approximations of functions of the classes $W_{p}^{\alpha}$ , 1967 .

[5]  D. S. Boak,et al.  Gas phase reactions of sodium: IV. Rates of reaction of methylchlorogermanes and -stannanes , 1971 .

[6]  Hans Triebel Interpolation properties of $\epsilon$-entropy and diameters. Geometric characteristics of imbedding for function spaces of Sobolev-Besov type , 1975 .

[7]  C. J. Stone,et al.  Consistent Nonparametric Regression , 1977 .

[8]  R. Olshen,et al.  Asymptotically Efficient Solutions to the Classification Problem , 1978 .

[9]  J. Bretagnolle,et al.  Estimation des densités: risque minimax , 1978 .

[10]  M. Solomjak,et al.  Quantitative analysis in Sobolev imbedding theorems and applications to spectral theory , 1980 .

[11]  C. J. Stone,et al.  Optimal Rates of Convergence for Nonparametric Estimators , 1980 .

[12]  L. Györfi The rate of convergence of $k_n$-NN regression estimates and classification rules (Corresp.) , 1981 .

[13]  John B. Anderson Simulated error performance of multi-h phase codes , 1981, IEEE Trans. Inf. Theory.

[14]  Wlodzimierz Greblicki Asymptotic efficiency of classifying procedures using the Hermite series estimate of multivariate probability densities , 1981, IEEE Trans. Inf. Theory.

[15]  B. Carl Entropy numbers of embedding maps between Besov spaces with an application to eigenvalue problems , 1981 .

[16]  László Györfi,et al.  The Rate of Convergence of $k_n$-NN Regression Estimates and Classification Rules , 1978 .

[17]  J. Marron Optimal Rates of Convergence to Bayes Risk in Nonparametric Discrimination , 1983 .

[18]  Lucien Birgé Approximation dans les espaces métriques et théorie de l'estimation , 1983 .

[19]  Y. Yatracos Rates of Convergence of Minimum Distance Estimators and Kolmogorov's Entropy , 1985 .

[20]  L. Birge,et al.  On estimating a density using Hellinger distance and some other strange facts , 1986 .

[21]  Adam Krzyzak,et al.  The rates of convergence of kernel regression estimates and classification rules , 1986, IEEE Trans. Inf. Theory.

[22]  A. Barron Are Bayes Rules Consistent in Information , 1987 .

[23]  Thomas M. Cover,et al.  Open Problems in Communication and Computation , 2011, Springer New York.

[24]  Related Topics,et al.  Nonparametric functional estimation and related topics , 1991 .

[25]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[26]  Andrew R. Barron,et al.  Complexity Regularization with Application to Artificial Neural Networks , 1991 .

[27]  A. Barron Approximation and Estimation Bounds for Artificial Neural Networks , 1991, COLT '91.

[28]  Andrew R. Barron,et al.  Minimum complexity density estimation , 1991, IEEE Trans. Inf. Theory.

[29]  G. Kerkyacharian,et al.  Density estimation in Besov spaces , 1992 .

[30]  Andrew R. Barron,et al.  Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[31]  George G. Lorentz,et al.  Constructive Approximation , 1993, Grundlehren der mathematischen Wissenschaften.

[32]  A. Timan Theory of Approximation of Functions of a Real Variable , 1994 .

[33]  Y. Makovoz Random Approximants and Neural Networks , 1996 .

[34]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[35]  G. Lorentz,et al.  Constructive approximation : advanced problems , 1996 .

[36]  G. Lugosi,et al.  Consistency of Data-driven Histogram Methods for Density Estimation and Classification , 1996 .

[37]  C. Huber Lower Bounds for Function Estimation , 1997 .

[38]  Anthony G. Oettinger,et al.  IEEE Transactions on Information Theory , 1998 .

[39]  Yuhong Yang,et al.  Information-theoretic determination of minimax rates of convergence , 1999 .

[40]  Yuhong Yang,et al.  Minimax Nonparametric Classification — Part II : Model Selection for Adaptation , 1998 .