A Deep Connection Between the Vapnik–Chervonenkis Entropy and the Rademacher Complexity

In this paper, we establish a deep connection between the Vapnik–Chervonenkis (VC) entropy and the Rademacher complexity. To this end, we first refine some previously known relationships between the two notions of complexity and then derive new results, which make it possible to compute an admissible range for the Rademacher complexity given a value of the VC entropy, and vice versa. The approach adopted in this paper is new and relies on a careful analysis of the combinatorial nature of the problem. The results obtained improve the state of the art on this research topic.
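To fix ideas, let $\mathcal{F}$ be a class of $\{-1,+1\}$-valued functions and let $x_1,\dots,x_n$ be a fixed sample. The number of dichotomies induced by $\mathcal{F}$ on the sample, the empirical VC entropy, and the empirical Rademacher complexity are, respectively,
\[
  N_{\mathcal{F}}(x_1,\dots,x_n) = \bigl|\{(f(x_1),\dots,f(x_n)) : f \in \mathcal{F}\}\bigr|,
  \qquad
  H_{\mathcal{F}}(x_1,\dots,x_n) = \ln N_{\mathcal{F}}(x_1,\dots,x_n),
\]
\[
  \widehat{R}_n(\mathcal{F}) = \mathbb{E}_{\sigma}\!\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right],
\]
where $\sigma_1,\dots,\sigma_n$ are i.i.d. Rademacher signs, uniform on $\{-1,+1\}$. As a point of reference (the notation above is standard, not specific to this paper), the classical one-sided link between the two quantities follows from Massart's finite class lemma: the dichotomies form a finite subset of $\{-1,+1\}^n$ whose elements all have Euclidean norm $\sqrt{n}$, so
\[
  \widehat{R}_n(\mathcal{F}) \le \sqrt{\frac{2\, H_{\mathcal{F}}(x_1,\dots,x_n)}{n}}.
\]
The results of this paper refine relationships of this kind in both directions, yielding an admissible range for either quantity given the other.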
