Compressing and Teaching for Low VC-Dimension

In this work we study the quantitative relation between the VC-dimension and two other basic parameters related to learning and teaching: the size of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d. We present significantly better upper bounds on both, as follows. We construct sample compression schemes of size exp(d) for C. This resolves a question of Littlestone and Warmuth (1986). Roughly speaking, we show that given an arbitrary set of labeled examples from an unknown concept in C, one can retain only a subset of exp(d) of them, in a way that allows one to recover the labels of all the other examples in the set, using exp(d) additional information bits. We further show that there always exists a concept c in C with a teaching set (i.e., a list of c-labeled examples that uniquely identifies c within C) of size exp(d) log log(m). This problem was studied by Kuhlmann (1999). Our construction also implies that the recursive teaching (RT) dimension of C is at most exp(d) log log(m) as well. The RT-dimension was suggested by Zilles et al. and by Doliwa et al. (2010); the same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on d is known only for the very simple case d = 1, and the problem is open even for d = 2. We also make small progress toward this seemingly modest goal.
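To make the two notions concrete, here is a minimal toy sketch (not the paper's general exp(d) construction) for the class of threshold functions c_t(x) = 1 iff x >= t, which has VC-dimension 1: a size-1 sample compression scheme, and a check that two labeled examples form a teaching set. The function names are illustrative, not from the paper.

```python
# Toy illustration for thresholds c_t(x) = 1 iff x >= t (VC-dimension 1).

def compress(sample):
    """Size-1 compression: keep one labeled example from which every
    label in the sample can be recovered."""
    positives = [x for x, y in sample if y == 1]
    if positives:
        return (min(positives), 1)       # smallest positive example
    return (max(x for x, _ in sample), 0)  # largest (all-negative) example

def reconstruct(kept):
    """Return a hypothesis consistent with the whole original sample."""
    x, y = kept
    if y == 1:
        return lambda z: int(z >= x)  # all negatives lie strictly below x
    return lambda z: int(z > x)       # sample had no positives at all

def is_teaching_set(examples, target_t, thresholds):
    """True iff `examples` are consistent with `target_t` and with no
    other threshold in the class, i.e. they uniquely identify it."""
    consistent = [t for t in thresholds
                  if all(int(x >= t) == y for x, y in examples)]
    return consistent == [target_t]

sample = [(0.5, 0), (1.2, 0), (2.0, 1), (3.7, 1)]
h = reconstruct(compress(sample))
assert all(h(x) == y for x, y in sample)

# Two adjacent examples (t-1, 0), (t, 1) teach the threshold t:
assert is_teaching_set([(2, 0), (3, 1)], 3, range(1, 6))
```

The compression step mirrors the abstract's description: only one of the labeled examples is retained, yet the labels of all the others are recoverable from it; the paper's result does this with exp(d) examples and exp(d) side-information bits for an arbitrary class of VC-dimension d.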

[1] J. von Neumann. Zur Theorie der Gesellschaftsspiele, 1928.

[2] Norbert Sauer, et al. On the Density of Families of Sets, 1972, J. Comb. Theory, Ser. A.

[3] Ayumi Shinohara, et al. Complexity of Teaching by a Restricted Number of Examples, 2009, COLT.

[4] Noga Alon, et al. Sign rank, VC dimension and spectral gaps, 2014, Electron. Colloquium Comput. Complex.

[5] Vladimir Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[6] Ayumi Shinohara, et al. Teachability in computational learning, 1990, New Generation Computing.

[7] Sally A. Goldman, et al. Teaching a Smarter Learner, 1996, J. Comput. Syst. Sci.

[8] M. Talagrand. Sharper Bounds for Gaussian and Empirical Processes, 1994.

[9] Pedro M. Domingos. The Role of Occam's Razor in Knowledge Discovery, 1999, Data Mining and Knowledge Discovery.

[10] Manfred K. Warmuth, et al. Learning integer lattices, 1990, COLT '90.

[11] Frank J. Balbach. Models for algorithmic teaching, 2007.

[12] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.

[13] Hans Ulrich Simon, et al. Recursive Teaching Dimension, Learning Complexity, and Maximum Classes, 2010, ALT.

[14] Leslie G. Valiant, et al. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[15] Eyal Kushilevitz, et al. Witness Sets for Families of Binary Vectors, 1996, J. Comb. Theory, Ser. A.

[16] Andrew Tomkins, et al. A computational model of teaching, 1992, COLT '92.

[17] Yi Li, et al. Improved bounds on the sample complexity of learning, 2000, SODA '00.

[18] Ronald L. Rivest, et al. Learning Binary Relations and Total Orders, 1993, SIAM J. Comput.

[19] Sally Floyd, et al. Space-bounded learning and the Vapnik-Chervonenkis dimension, 1989, COLT '89.

[20] Dana Angluin, et al. Learning from Different Teachers, 2004, Machine Learning.

[21] Steve Hanneke, et al. Teaching Dimension and the Complexity of Active Learning, 2007, COLT.

[22] Benjamin I. P. Rubinstein, et al. A Geometric Approach to Sample Compression, 2009, J. Mach. Learn. Res.

[23] Yoav Freund, et al. Boosting a weak learning algorithm by majority, 1995, COLT '90.

[24] Manfred K. Warmuth, et al. Sample compression, learnability, and the Vapnik-Chervonenkis dimension, 1995, Machine Learning.

[25] Christian Kuhlmann. On Teaching and Learning Intersection-Closed Concept Classes, 1999, EuroCOLT.

[26] Temple F. Smith. Occam's razor, 1980, Nature.

[27] Manfred K. Warmuth, et al. Relating Data Compression and Learnability, 2003.

[28] Ronald L. Rivest, et al. Inferring Decision Trees Using the Minimum Description Length Principle, 1989, Inf. Comput.

[29] Manfred K. Warmuth, et al. On Weak Learning, 1995, J. Comput. Syst. Sci.

[30] Boting Yang, et al. Sample Compression for Multi-label Concept Classes, 2014, COLT.

[31] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[32] Avi Wigderson, et al. Population recovery and partial identification, 2012, IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS).

[33] B. K. Natarajan. Machine Learning: A Theoretical Approach, 1992, IEEE Expert.

[34] Peter L. Bartlett, et al. Shifting: One-inclusion mistake bounds and sample compression, 2009, J. Comput. Syst. Sci.

[35] Boting Yang, et al. Algebraic methods proving Sauer's bound for teaching complexity, 2014, Theor. Comput. Sci.

[36] R. Schapire. The Strength of Weak Learnability, 1990, Machine Learning.

[37] Shay Moran, et al. Teaching and compressing for low VC-dimension, 2015, Electron. Colloquium Comput. Complex.

[38] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[39] Sandra Zilles, et al. Models of Cooperative Teaching and Learning, 2011, J. Mach. Learn. Res.

[40] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[41] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[42] Avi Wigderson, et al. Restriction access, 2012, ITCS '12.

[43] John Shawe-Taylor, et al. The Set Covering Machine, 2003, J. Mach. Learn. Res.

[44] Manfred K. Warmuth, et al. Unlabeled Compression Schemes for Maximum Classes, 2007, COLT.

[45] Aranyak Mehta, et al. Playing large games using simple strategies, 2003, EC '03.

[46] John Shawe-Taylor, et al. On exact specification by examples, 1992, COLT '92.

[47] Michael Kearns, et al. On the complexity of teaching, 1991, COLT '91.

[48] Manfred K. Warmuth. Compressing to VC Dimension Many Points, 2003, COLT.

[49] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[50] Shai Ben-David, et al. Combinatorial Variability of Vapnik-Chervonenkis Classes with Applications to Sample Compression Schemes, 1998, Discret. Appl. Math.

[51] Richard J. Lipton, et al. Simple strategies for large zero-sum games with applications to complexity theory, 1994, STOC '94.

[52] P. Assouad. Densité et dimension, 1983.

[53] Roi Livni, et al. Honest Compressions and Their Application to Compression Schemes, 2013, COLT.

[54] A. Chernikov, et al. Externally definable sets and dependent pairs, 2010, Israel Journal of Mathematics.