Minimax Nonparametric Classification—Part II: Model Selection for Adaptation

For Part I, see ibid., vol. 45, no. 7, pp. 2271-2284 (1999). We study nonparametric estimation of a conditional probability for classification based on a collection of finite-dimensional models. For the sake of flexibility, different types of models, linear or nonlinear, are allowed as long as each satisfies a dimensionality assumption. We show that with a suitable model selection criterion, the penalized maximum-likelihood estimator has a risk bounded by an index of resolvability expressing a good tradeoff among approximation error, estimation error, and model complexity. The bound requires no assumption on the target conditional probability and can be used to demonstrate the adaptivity of estimators based on model selection. Examples are given with both splines and neural nets, and problems of high-dimensional estimation are considered. The resulting adaptive estimator is shown to behave optimally or near optimally over Sobolev classes (with unknown orders of interaction and smoothness) and over classes of functions whose gradients have integrable Fourier transforms. In terms of rates of convergence, the performance is the same as if one knew in advance which of these classes contains the true conditional probability. The corresponding classifier also converges optimally or nearly optimally, simultaneously over these classes.
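
The selection criterion the abstract refers to can be pictured as minimizing a complexity-penalized negative log-likelihood over a list of candidate models. The sketch below is only an illustration under assumed choices, not the paper's estimator: the candidate models F_m are polynomial logistic models of dimension m, and the BIC-style penalty lam * m * log(n) is a hypothetical stand-in for the paper's dimension-plus-complexity penalty term.

```python
import numpy as np

def fit_logistic(X, y, steps=3000, lr=0.5):
    """Fit a logistic model P(Y=1|x) = sigmoid(X @ w) by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)  # gradient of the log-likelihood
    return w

def neg_log_likelihood(X, y, w):
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1.0 - 1e-12)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def select_model(x, y, max_dim=8, lam=0.5):
    """Minimize  -log-likelihood + lam * m * log(n)  over models F_1..F_max_dim,
    where F_m is the degree-(m-1) polynomial logistic model with dimension m.
    The penalty form here is a hypothetical stand-in, not the paper's exact term."""
    n = len(y)
    best = None
    for m in range(1, max_dim + 1):
        X = np.vander(x, m, increasing=True)  # basis 1, x, ..., x^(m-1)
        w = fit_logistic(X, y)
        crit = neg_log_likelihood(X, y, w) + lam * m * np.log(n)
        if best is None or crit < best[0]:
            best = (crit, m, w)
    return best

# Toy usage: binary labels drawn from a smooth conditional probability on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
p_true = 1.0 / (1.0 + np.exp(-(4.0 * x - 2.0)))  # true P(Y=1|x)
y = (rng.uniform(size=500) < p_true).astype(float)
crit, m_hat, w_hat = select_model(x, y)
print(f"selected model dimension: {m_hat}")
```

The penalized criterion automatically balances fit against dimension: richer models lower the negative log-likelihood but pay a larger penalty, which is the approximation/estimation tradeoff the index of resolvability expresses.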

[1]  H. Akaike, Information Theory and an Extension of the Maximum Likelihood Principle, 1973.

[2]  C. de Boor and G. J. Fix, Spline Approximation by Quasiinterpolants, 1973.

[3]  C. de Boor, A Practical Guide to Splines, 1978.

[4]  L. Schumaker, Spline Functions: Basic Theory, 1981.

[5]  L. Devroye, Automatic Pattern Recognition: A Study of the Probability of Error, IEEE Trans. Pattern Anal. Mach. Intell., 1988.

[6]  A. R. Barron, Complexity Regularization with Application to Artificial Neural Networks, 1991.

[7]  A. R. Barron, Approximation and Estimation Bounds for Artificial Neural Networks, COLT '91, 1991.

[8]  A. R. Barron and T. M. Cover, Minimum Complexity Density Estimation, IEEE Trans. Inf. Theory, 1991.

[9]  A. R. Barron, Universal Approximation Bounds for Superpositions of a Sigmoidal Function, IEEE Trans. Inf. Theory, 1993.

[10]  C. J. Stone, The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation, 1994.

[11]  G. Lugosi and K. Zeger, Nonparametric Estimation via Empirical Risk Minimization, IEEE Trans. Inf. Theory, 1995.

[12]  G. Lugosi and K. Zeger, Concept Learning Using Complexity Regularization, Proc. 1995 IEEE Int. Symp. on Information Theory, 1995.

[13]  L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Stochastic Modelling and Applied Probability, Springer, 1996.

[14]  A. Krzyzak and T. Linder, Radial Basis Function Networks and Complexity Regularization in Function Learning, IEEE Trans. Neural Networks, 1998.

[15]  L. Birgé and P. Massart, From Model Selection to Adaptive Estimation, 1997.

[16]  G. Lugosi and A. Nobel, Adaptive Model Selection Using Empirical Complexities, 1998.

[17]  Y. Yang and A. R. Barron, An Asymptotic Property of Model Selection Criteria, IEEE Trans. Inf. Theory, 1998.

[18]  A. Barron, L. Birgé, and P. Massart, Risk Bounds for Model Selection via Penalization, 1999.

[19]  Y. Yang, Minimax Nonparametric Classification—Part I: Rates of Convergence, IEEE Trans. Inf. Theory, 1999.

[20]  Y. Yang and A. R. Barron, Information-Theoretic Determination of Minimax Rates of Convergence, 1999.