Some results about the Vapnik-Chervonenkis entropy and the Rademacher complexity

This paper addresses the problem of establishing a connection between the Vapnik-Chervonenkis (VC) Entropy, a notion of complexity introduced by Vapnik in his seminal work, and the Rademacher Complexity, a more powerful notion of complexity that has featured prominently in the recent Machine Learning literature. To establish this connection, we refine some previously known relationships and derive a new result. Our proposal allows computing an admissible range for the Rademacher Complexity given a value of the VC-Entropy, and vice versa, thereby opening appealing new research perspectives in the field of assessing the complexity of a hypothesis space.
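One standard direction of this connection can be illustrated numerically: by Massart's finite-class lemma, the empirical Rademacher Complexity of a binary class is upper-bounded by sqrt(2H/n), where H = ln|F|_S| is the empirical VC-Entropy, i.e., the log of the number of dichotomies the class realizes on the sample. The following Python sketch is a hypothetical illustration of that known bound (using 1-D threshold classifiers as the class), not the paper's refined relationships: it estimates the empirical Rademacher Complexity by Monte Carlo and compares it with the Massart bound.

```python
import numpy as np

# Minimal sketch (hypothetical example, not the paper's method):
# for a finite set of dichotomies F|_S on n points with values in {-1, +1},
# the empirical VC-Entropy is H = ln|F|_S|, and Massart's finite-class
# lemma gives the standard bound  R_hat_n(F) <= sqrt(2 * H / n).

rng = np.random.default_rng(0)

n = 20  # number of sample points
# F|_S: the dichotomies realized by 1-D threshold classifiers on n ordered
# points (n + 1 distinct labelings), a class whose trace is easy to enumerate.
dichotomies = np.array([[1 if j >= k else -1 for j in range(n)]
                        for k in range(n + 1)])

H = np.log(len(dichotomies))  # empirical VC-Entropy ln|F|_S|

# Monte Carlo estimate of the empirical Rademacher Complexity:
#   R_hat_n = E_sigma[ sup_{f in F|_S} (1/n) * sum_i sigma_i f(x_i) ]
trials = 20000
sigma = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher signs
correlations = sigma @ dichotomies.T / n           # shape (trials, |F|_S|)
rademacher_hat = correlations.max(axis=1).mean()

massart_bound = np.sqrt(2.0 * H / n)
print(f"empirical VC-Entropy H          = {H:.3f}")
print(f"empirical Rademacher complexity ~ {rademacher_hat:.3f}")
print(f"Massart bound sqrt(2H/n)        = {massart_bound:.3f}")
```

The gap between the estimate and the bound hints at why an admissible range, rather than a single inequality, is informative: the VC-Entropy alone does not pin the Rademacher Complexity down to a point.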
