Teaching and compressing for low VC-dimension

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching: the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let $C$ be a binary concept class of size $m$ and VC-dimension $d$. Prior to this work, the best known upper bounds for both parameters were $\log(m)$, while the best lower bounds are linear in $d$. We present significantly better upper bounds on both, as follows. Set $k = O(d 2^d \log \log |C|)$. We show that there always exists a concept $c$ in $C$ with a teaching set (i.e. a list of $c$-labeled examples uniquely identifying $c$ in $C$) of size $k$. This problem was studied by Kuhlmann (1999). Our construction implies that the recursive teaching (RT) dimension of $C$ is at most $k$ as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on $d$ is known only for the very simple case $d=1$; the question is open even for $d=2$. We also make small progress towards this seemingly modest goal. We further construct sample compression schemes of size $k$ for $C$, with additional information of $k \log(k)$ bits. Roughly speaking, given any list of $C$-labeled examples of arbitrary length, we can retain only $k$ labeled examples in a way that allows the labels of all other examples in the list to be recovered, using an additional $k \log(k)$ bits of information. This problem was first suggested by Littlestone and Warmuth (1986).
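To make the two central notions concrete, the following is a minimal brute-force sketch, not the construction from this work. It assumes a finite class represented as a list of distinct 0/1 tuples over a domain of `domain_size` points; the function names `teaching_set` and `vc_dimension` and the example class `thresholds` are illustrative choices, not taken from the paper. The search is exponential in the domain size and is meant only for tiny examples.

```python
from itertools import combinations


def teaching_set(c, concepts, domain_size):
    """Return a smallest tuple of domain indices whose c-labels
    distinguish c from every other concept in the class."""
    others = [h for h in concepts if h != c]
    for size in range(domain_size + 1):
        for idxs in combinations(range(domain_size), size):
            # idxs teaches c if every other concept disagrees with c
            # on at least one of the chosen points.
            if all(any(h[i] != c[i] for i in idxs) for h in others):
                return idxs
    return None  # not reached when the concepts are distinct


def vc_dimension(concepts, domain_size):
    """Return the largest size of a set of points shattered by the class."""
    best = 0
    for size in range(1, domain_size + 1):
        shattered = False
        for idxs in combinations(range(domain_size), size):
            patterns = {tuple(h[i] for i in idxs) for h in concepts}
            if len(patterns) == 2 ** size:
                shattered = True
                break
        if not shattered:
            break  # no set of this size is shattered, so no larger one is
        best = size
    return best


# Hypothetical example: threshold functions on a 4-point domain.
thresholds = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (0, 0, 1, 1),
    (0, 1, 1, 1),
    (1, 1, 1, 1),
]

d = vc_dimension(thresholds, 4)
sizes = {c: len(teaching_set(c, thresholds, 4)) for c in thresholds}
print("VC-dimension:", d)
print("smallest teaching set over the class:", min(sizes.values()))
```

In this toy class the all-zeros concept is taught by a single example (the last point labeled 0), whereas an interior threshold such as `(0, 0, 1, 1)` needs two examples. The minimum over all concepts is the quantity that the existence statement above bounds by $k$.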
