Paper information - Teaching and compressing for low VC-dimension

Teaching and compressing for low VC-dimension

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching: the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let $C$ be a binary concept class of size $m$ and VC-dimension $d$. Prior to this work, the best known upper bounds for both parameters were $\log(m)$, while the best lower bounds are linear in $d$. We present significantly better upper bounds on both as follows. Set $k = O(d 2^d \log \log |C|)$. We show that there always exists a concept $c$ in $C$ with a teaching set (i.e., a list of $c$-labeled examples uniquely identifying $c$ in $C$) of size $k$. This problem was studied by Kuhlmann (1999). Our construction implies that the recursive teaching (RT) dimension of $C$ is at most $k$ as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on $d$ is known just for the very simple case $d=1$, and is open even for $d=2$. We also make small progress towards this seemingly modest goal. We further construct sample compression schemes of size $k$ for $C$, with additional information of $k \log(k)$ bits. Roughly speaking, given any list of $C$-labeled examples of arbitrary length, we can retain only $k$ labeled examples in a way that allows recovering the labels of all other examples in the list, using $k \log(k)$ additional information bits. This problem was first suggested by Littlestone and Warmuth (1986).
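To make the quantities in the abstract concrete, the following is a minimal brute-force sketch, not the construction from the paper. It assumes a finite concept class given as a list of distinct 0/1 tuples over a common indexed domain, and it computes the VC-dimension, a smallest teaching set for a given concept, and the recursive teaching dimension in the usual layered sense (repeatedly remove the concepts that are easiest to teach with respect to the remaining class). All function names and the `THRESHOLDS` example class are illustrative choices, and the search is exponential in the domain size, so it is only suitable for toy classes.

```python
from itertools import combinations


def is_teaching_set(concept, concept_class, points):
    """True if the labels of `concept` on `points` rule out every other concept."""
    return all(
        other == concept or any(other[i] != concept[i] for i in points)
        for other in concept_class
    )


def min_teaching_set(concept, concept_class):
    """Return a smallest tuple of domain indices that identifies `concept`."""
    n = len(concept)
    for size in range(n + 1):
        for points in combinations(range(n), size):
            if is_teaching_set(concept, concept_class, points):
                return points
    raise ValueError("concept class contains duplicate concepts")


def vc_dimension(concept_class):
    """Largest size of an index set on which the class realizes all label patterns."""
    n = len(concept_class[0])
    d = 0
    for size in range(1, n + 1):
        shattered = any(
            len({tuple(c[i] for i in points) for c in concept_class}) == 2 ** size
            for points in combinations(range(n), size)
        )
        if not shattered:
            break  # no larger set can be shattered either
        d = size
    return d


def recursive_teaching_dimension(concept_class):
    """Layered (recursive) teaching dimension, computed by brute force."""
    remaining = list(concept_class)
    rtd = 0
    while remaining:
        sizes = {c: len(min_teaching_set(c, remaining)) for c in remaining}
        smallest = min(sizes.values())
        rtd = max(rtd, smallest)
        # Drop the easiest concepts and recurse on the rest of the class.
        remaining = [c for c in remaining if sizes[c] > smallest]
    return rtd


# Illustrative class: thresholds on a 4-point domain (VC-dimension 1).
THRESHOLDS = [
    (1, 1, 1, 1),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
    (0, 0, 0, 0),
]

if __name__ == "__main__":
    print("VC-dimension:", vc_dimension(THRESHOLDS))
    for c in THRESHOLDS:
        print(c, "teaching set indices:", min_teaching_set(c, THRESHOLDS))
    print("RT-dimension:", recursive_teaching_dimension(THRESHOLDS))
```

In this toy class the two extreme thresholds are each identified by a single labeled example, while the interior thresholds need two, which illustrates why the abstract asks only for the existence of some concept with a small teaching set rather than a bound for every concept.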

[1] Sally Floyd, et al. Sample compression, learnability, and the Vapnik-Chervonenkis dimension, 2004, Machine Learning.

[2] Benjamin I. P. Rubinstein, et al. A Geometric Approach to Sample Compression, 2009, J. Mach. Learn. Res.

[3] Manfred K. Warmuth, et al. Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension, 1995, Machine Learning.

[4] Sally Floyd, et al. Space-bounded learning and the Vapnik-Chervonenkis dimension, 1989, COLT '89.

[5] A. Chernikov, et al. Externally definable sets and dependent pairs, 2010, 1007.4468.

[6] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.

[7] Yoav Freund, et al. Boosting a weak learning algorithm by majority, 1990, COLT '90.

[8] Avi Wigderson, et al. Population Recovery and Partial Identification, 2012, FOCS.

[9] Shay Moran, et al. Proper PAC learning is compressing, 2015, Electron. Colloquium Comput. Complex.

[10] R. Dudley. Central Limit Theorems for Empirical Measures, 1978.

[11] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[12] Andrew Tomkins, et al. A computational model of teaching, 1992, COLT '92.

[13] Dana Angluin, et al. Learning from Different Teachers, 2004, Machine Learning.

[14] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.

[15] Ronald L. Rivest, et al. Learning Binary Relations and Total Orders, 1989, COLT '89.

[16] Steve Hanneke, et al. Teaching Dimension and the Complexity of Active Learning, 2007, COLT.

[17] P. Assouad. Densité et dimension, 1983.

[18] David Haussler, et al. ɛ-nets and simplex range queries, 1987, Discret. Comput. Geom.

[19] Sandra Zilles, et al. Models of Cooperative Teaching and Learning, 2011, J. Mach. Learn. Res.

[20] Manfred K. Warmuth. Compressing to VC Dimension Many Points, 2003, COLT.

[21] Peter L. Bartlett, et al. Shifting: One-inclusion mistake bounds and sample compression, 2009, J. Comput. Syst. Sci.

[22] Avi Wigderson, et al. Restriction access, 2012, ITCS '12.

[23] John Shawe-Taylor, et al. The Set Covering Machine, 2003, J. Mach. Learn. Res.

[24] Manfred K. Warmuth, et al. Unlabeled Compression Schemes for Maximum Classes, 2007, COLT.

[25] S. Ben-David, et al. Combinatorial Variability of Vapnik-Chervonenkis Classes with Applications to Sample Compression Schemes, 1998, Discrete Applied Mathematics.

[26] Boting Yang, et al. Generalizing Labeled and Unlabeled Sample Compression to Multi-label Concept Classes, 2014, ALT.

[27] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[28] Shay Moran, et al. Sample compression schemes for VC classes, 2016, ITA.

[29] Manfred K. Warmuth, et al. Learning integer lattices, 1990, COLT '90.

[30] Roi Livni, et al. Honest Compressions and Their Application to Compression Schemes, 2013, COLT.

[31] John Shawe-Taylor, et al. On exact specification by examples, 1992, COLT '92.

[32] Ayumi Shinohara, et al. Complexity of Teaching by a Restricted Number of Examples, 2009, COLT.

[33] Frank J. Balbach. Models for algorithmic teaching, 2007.

[34] Christian Kuhlmann. On Teaching and Learning Intersection-Closed Concept Classes, 1999, EuroCOLT.

[35] Noga Alon, et al. Sign rank, VC dimension and spectral gaps, 2014, Electron. Colloquium Comput. Complex.

[36] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.

[37] Hans Ulrich Simon, et al. Recursive Teaching Dimension, Learning Complexity, and Maximum Classes, 2010, ALT.

[38] M. Kearns, et al. On the complexity of teaching, 1991, COLT '91.

[39] Manfred K. Warmuth, et al. On Weak Learning, 1995, J. Comput. Syst. Sci.

[40] Manfred K. Warmuth, et al. Relating Data Compression and Learnability, 2003.

[41] Ronald L. Rivest, et al. Inferring Decision Trees Using the Minimum Description Length Principle, 1989, Inf. Comput.

[42] Boting Yang, et al. Algebraic methods proving Sauer's bound for teaching complexity, 2014, Theor. Comput. Sci.

[43] Norbert Sauer, et al. On the Density of Families of Sets, 1972, J. Comb. Theory, Ser. A.

[44] Boting Yang, et al. Sample Compression for Multi-label Concept Classes, 2014, COLT.

[45] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[46] Vladimir Vapnik, Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[47] Ayumi Shinohara. Teachability in computational learning, 2009, New Generation Computing.

[48] Sally A. Goldman, et al. Teaching a Smarter Learner, 1996, J. Comput. Syst. Sci.

[49] Pedro M. Domingos. The Role of Occam's Razor in Knowledge Discovery, 1999, Data Mining and Knowledge Discovery.

[50] J. Neumann. Zur Theorie der Gesellschaftsspiele, 1928.

[51] Xi Chen, et al. A Note on Teaching for VC Classes, 2016, Electron. Colloquium Comput. Complex.