Model Selection by Bootstrap Penalization for Classification

We consider the binary classification problem. Given an i.i.d. sample drawn from the distribution of an \(\mathcal{X}\times\{0,1\}\)-valued random pair, we propose to estimate the so-called Bayes classifier by minimizing the sum of the empirical classification error and a penalty term based on Efron's or i.i.d. weighted bootstrap samples of the data. We obtain exponential inequalities for such bootstrap-type penalties, which allow us to derive non-asymptotic properties of the corresponding estimators. In particular, we prove that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes. These results generalize those of Koltchinskii [12] and of Bartlett, Boucheron, and Lugosi [2] for Rademacher penalties, which can thus be seen as special cases of bootstrap-type penalties.
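The selection rule sketched in the abstract can be illustrated in code. The sketch below is a minimal, hypothetical rendering (not the paper's exact procedure): it assumes the penalty for a model class takes the form of a conditional expectation \(\mathbb{E}_W[\sup_f (P_n - P_n^W)\,\gamma_f \mid \text{data}]\), with Efron (multinomial) bootstrap weights, estimated by Monte Carlo; all function and variable names are illustrative.

```python
import numpy as np

def efron_bootstrap_penalty(losses, n_boot=200, rng=None):
    """Monte Carlo estimate of a bootstrap-type penalty for one model class.

    losses: (m, n) 0/1 array; losses[j, i] = 1 if classifier j errs on point i.
    Estimates E_W[ max_j (P_n - P_n^W) loss_j | data ] with Efron weights.
    """
    rng = np.random.default_rng(rng)
    m, n = losses.shape
    emp = losses.mean(axis=1)                  # P_n loss_j for each classifier j
    acc = 0.0
    for _ in range(n_boot):
        # Efron bootstrap weights: multinomial counts over the n points, scaled by n
        w = rng.multinomial(n, np.full(n, 1.0 / n)) / n
        acc += np.max(emp - losses @ w)        # sup over the class
    return acc / n_boot

def select_model(losses_per_class, n_boot=200, seed=0):
    """Pick the class minimizing (empirical error of its ERM) + (bootstrap penalty)."""
    rng = np.random.default_rng(seed)
    crits = []
    for losses in losses_per_class:
        erm_err = losses.mean(axis=1).min()    # empirical error of the class's ERM
        pen = efron_bootstrap_penalty(losses, n_boot=n_boot, rng=rng)
        crits.append(erm_err + pen)
    return int(np.argmin(crits)), crits
```

Replacing the multinomial weights with other exchangeable weights (e.g. i.i.d. weights normalized to sum to one) gives the i.i.d. weighted bootstrap variant; drawing Rademacher signs instead recovers the Rademacher penalty as a special case, as the abstract notes.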

[1]  A. Y. Lo et al. A large sample study of the Bayesian bootstrap. 1987.

[2]  D. Pollard. A central limit theorem for empirical processes. Journal of the Australian Mathematical Society, Series A, 1982.

[3]  P. L. Bartlett et al. Localized Rademacher Complexities. COLT, 2002.

[4]  E. Giné et al. Bootstrapping General Empirical Measures. 1990.

[5]  P. Massart et al. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. 1998.

[6]  C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, 1989.

[7]  S. Boucheron et al. Theory of classification: a survey of some recent advances. 2005.

[8]  M. Fromont. Some model selection problems: construction of adaptive tests, calibration of penalties by bootstrap methods. 2003.

[9]  P. R. Kumar et al. Learning by canonical smooth estimation. II. Learning and choice of model complexity. IEEE Trans. Autom. Control, 1996.

[10]  P. R. Kumar et al. Learning by canonical smooth estimation. I. Simultaneous estimation. IEEE Trans. Autom. Control, 1996.

[11]  V. Vapnik. Estimation of Dependences Based on Empirical Data. 2006.

[12]  V. Koltchinskii et al. Rademacher Processes and Bounding the Risk of Function Learning. arXiv:math/0405338, 2004.

[13]  L. Devroye. Bounds for the Uniform Deviation of Empirical Measures. 1982.

[14]  E. Giné et al. Some Limit Theorems for Empirical Processes. 1984.

[15]  J. Wellner et al. Exchangeably Weighted Bootstraps of the General Empirical Process. 1993.

[16]  S. Kay. Fundamentals of statistical signal processing: estimation theory. 1993.

[17]  G. Lugosi et al. Adaptive Model Selection Using Empirical Complexities. 1998.

[18]  A. Tsybakov et al. Optimal aggregation of classifiers in statistical learning. 2003.

[19]  V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. 1971.

[20]  P. Massart et al. A uniform Marcinkiewicz-Zygmund strong law of large numbers for empirical processes. 1998.

[21]  P. Massart. Some applications of concentration inequalities to statistics. 2000.

[22]  D. Haussler et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension. J. Comb. Theory, Ser. A, 1995.

[23]  K. Azuma. Weighted sums of certain dependent random variables. 1967.

[24]  B. Efron. The jackknife, the bootstrap, and other resampling plans. 1987.

[25]  J. A. Wellner et al. Weak Convergence and Empirical Processes: With Applications to Statistics. 1996.

[26]  E. Mammen et al. Smooth Discrimination Analysis. 1999.

[27]  G. Lugosi et al. Concept learning using complexity regularization. IEEE Trans. Inf. Theory, 1995.

[28]  D. Rubin. The Bayesian Bootstrap. 1981.

[29]  S. Boucheron et al. A sharp concentration inequality with applications. Random Struct. Algorithms, 1999.

[30]  G. Lugosi et al. Complexity regularization via localized random penalties. arXiv:math/0410091, 2004.

[31]  A. R. Barron et al. Minimum complexity density estimation. IEEE Trans. Inf. Theory, 1991.

[32]  C. Weng et al. On a Second-Order Asymptotic Property of the Bayesian Bootstrap Mean. 1989.

[33]  L. Devroye et al. Lower bounds in pattern recognition and learning. Pattern Recognit., 1995.

[34]  V. Koltchinskii et al. Rademacher penalties and structural risk minimization. IEEE Trans. Inf. Theory, 2001.

[35]  P. L. Bartlett et al. Model Selection and Error Estimation. Machine Learning, 2000.

[36]  N. Sauer. On the Density of Families of Sets. J. Comb. Theory, Ser. A, 1972.

[37]  G. Lugosi et al. Pattern Classification and Learning Theory. 2002.

[38]  É. Lebarbier et al. Some approaches to change-point detection over a finite horizon. 2002.

[39]  P. L. Bartlett et al. Local Complexities for Empirical Risk Minimization. COLT, 2004.