Rademacher penalties and structural risk minimization

We suggest a penalty function to be used in various problems of structural risk minimization. This penalty is data-dependent and is based on the sup-norm of the so-called Rademacher process indexed by the underlying class of functions (sets). The standard complexity penalties used in learning problems, which are based on the VC-dimensions of the classes, are conservative upper bounds (in a probabilistic sense, uniformly over the set of all underlying distributions) for the penalty we suggest. Thus, for a particular distribution of training examples, one can expect better performance from learning algorithms that use data-driven Rademacher penalties. We obtain oracle inequalities for the theoretical risk of estimators produced by structural minimization of the empirical risk with Rademacher penalties. These inequalities imply a form of optimality of the empirical risk minimizers. We also suggest an iterative approach to structural risk minimization with Rademacher penalties, in which the hierarchy of classes is not given in advance but is determined by a data-driven iterative process of risk minimization. We prove probabilistic oracle inequalities for the theoretical risk of the estimators based on this approach as well.
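To make the penalty concrete, here is a minimal sketch of structural risk minimization with a Rademacher penalty, under assumptions that are ours rather than the paper's. The Rademacher process over a class F is f ↦ n^{-1} Σ_{i=1}^n ε_i f(X_i), with ε_i i.i.d. random signs independent of the sample, and the penalty for a class is the supremum of its absolute value. The sketch assumes 0-1 loss, a nested hierarchy of histogram classifiers on [0, 1) with 2**depth dyadic cells (a hypothetical choice of classes), and a penalty equal to that supremum with no extra constants or confidence terms (the paper's penalties may carry additional factors). The helper names `bin_index`, `erm_histogram`, `rademacher_penalty`, and `srm_select` are hypothetical.

```python
import numpy as np


def bin_index(x, depth):
    """Map points in [0, 1) to cells of the dyadic partition with 2**depth equal cells."""
    m = 2 ** depth
    return np.minimum((x * m).astype(int), m - 1)


def erm_histogram(x, y, depth):
    """Empirical risk minimizer over histogram classifiers with 2**depth cells.

    Each cell may carry label 0 or 1 independently, so the minimizer of the
    empirical 0-1 risk is a majority vote inside each cell (ties go to 0).
    """
    m = 2 ** depth
    idx = bin_index(x, depth)
    ones = np.bincount(idx, weights=y, minlength=m)
    counts = np.bincount(idx, minlength=m)
    labels = (2 * ones > counts).astype(int)
    emp_risk = np.mean(labels[idx] != y)
    return labels, emp_risk


def rademacher_penalty(x, y, depth, eps):
    """sup over the class of |n^{-1} sum_i eps_i * 1{g(x_i) != y_i}|.

    The sum splits over cells, so its sup (inf) is the sum of per-cell
    maxima (minima) over the two possible labels of each cell.
    """
    n = len(x)
    m = 2 ** depth
    idx = bin_index(x, depth)
    # Label 0 in a cell errs on the points with y = 1; label 1 errs on y = 0.
    s0 = np.bincount(idx, weights=eps * (y == 1), minlength=m)
    s1 = np.bincount(idx, weights=eps * (y == 0), minlength=m)
    sup_s = np.maximum(s0, s1).sum()
    inf_s = np.minimum(s0, s1).sum()
    return max(sup_s, -inf_s) / n


def srm_select(x, y, max_depth, rng):
    """Pick the depth minimizing empirical risk + Rademacher penalty."""
    # One sign vector eps_1, ..., eps_n shared by all classes in the hierarchy.
    eps = rng.choice([-1.0, 1.0], size=len(x))
    best = None
    for depth in range(max_depth + 1):
        labels, emp_risk = erm_histogram(x, y, depth)
        score = emp_risk + rademacher_penalty(x, y, depth, eps)
        if best is None or score < best[0]:
            best = (score, depth, labels)
    return best[1], best[2]


# Usage on synthetic data (hypothetical target: label 1 on [0.25, 0.75), 10% label noise).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(size=n)
y = ((x >= 0.25) & (x < 0.75)).astype(int)
y = np.where(rng.uniform(size=n) < 0.1, 1 - y, y)
depth, labels = srm_select(x, y, max_depth=8, rng=rng)
print("selected depth:", depth, "cells:", len(labels))
```

Because the classes are nested and one sign vector is shared, the penalty grows with depth while the empirical risk shrinks, so the minimizer trades the two off. The cell-wise decomposition gives the exact supremum without enumerating all 2**(2**depth) classifiers; for richer classes the supremum would generally have to be computed by an optimization routine instead.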
