Sample compression schemes for VC classes

Sample compression schemes were defined by Littlestone and Warmuth (1986) as an abstraction of the structure underlying many learning algorithms. Roughly speaking, a sample compression scheme of size k means that, given an arbitrary list of labeled examples, one can retain only k of them in a way that allows the labels of all the other examples in the list to be recovered. They showed that compression implies PAC learnability for binary-labeled classes, and asked whether the converse holds. We answer their question affirmatively and show that every concept class C with VC dimension d has a sample compression scheme of size exponential in d. The proof uses an approximate minimax phenomenon for binary matrices of low VC dimension, which may be of interest in the context of game theory.
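To make the definition concrete, here is a minimal sketch (not the paper's construction) of a sample compression scheme of size 2 for the class of intervals on the real line, a class of VC dimension 2. The function names `compress` and `reconstruct` are illustrative, not a standard API.

```python
def compress(sample):
    """Retain at most 2 labeled examples that determine all other labels.

    `sample` is a list of (x, y) pairs with y in {0, 1}, assumed to be
    consistent with some interval concept: y = 1 iff a <= x <= b.
    """
    positives = [x for x, y in sample if y == 1]
    if not positives:
        return []  # the empty compression set encodes "all labels are 0"
    # The leftmost and rightmost positive points pin down an interval
    # that labels every point of the original sample correctly.
    return [(min(positives), 1), (max(positives), 1)]


def reconstruct(compressed, x):
    """Recover the label of any example x from the retained points alone."""
    if not compressed:
        return 0
    lo, hi = compressed[0][0], compressed[-1][0]
    return 1 if lo <= x <= hi else 0


# Every label in the sample is recovered from just 2 of its points.
sample = [(0.5, 0), (1.0, 1), (2.0, 1), (3.5, 1), (4.0, 0)]
kept = compress(sample)  # [(1.0, 1), (3.5, 1)]
assert all(reconstruct(kept, x) == y for x, y in sample)
```

The paper's result says that such schemes exist for every class of finite VC dimension d, with size exponential in d, rather than being tailored to one class as in this toy example.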

[1]  J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 1928.

[2]  Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.

[3]  Shay Moran, Amir Shpilka, Avi Wigderson, and Amir Yehudayoff. Teaching and compressing for low VC-dimension. Electron. Colloquium Comput. Complex., 2015.

[4]  Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 1989.

[5]  Dima Kuzmin and Manfred K. Warmuth. Unlabeled compression schemes for maximum classes. COLT, 2007.

[6]  Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. EC '03, 2003.

[7]  Nick Littlestone and Manfred K. Warmuth. Relating data compression and learnability. Unpublished manuscript, 1986.

[8]  Balas K. Natarajan. On learning sets and functions. Machine Learning, 1989.

[9]  Michael J. Kearns and Leslie G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Machine Learning: From Theory to Applications, 1993.

[10]  Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. SODA '00, 2000.

[11]  Sally Floyd. Space-bounded learning and the Vapnik-Chervonenkis dimension. COLT '89, 1989.

[12]  Robert E. Schapire. The strength of weak learnability. Machine Learning, 1990.

[13]  Boting Yang et al. Generalizing labeled and unlabeled sample compression to multi-label concept classes. ALT, 2014.

[14]  Manfred K. Warmuth. Compressing to VC dimension many points. COLT, 2003.

[15]  R. M. Dudley. Universal Donsker classes and metric entropy. Annals of Probability, 1987.

[16]  Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[17]  Andrzej Ehrenfeucht, David Haussler, Michael J. Kearns, and Leslie G. Valiant. A general lower bound on the number of examples needed for learning. COLT '88, 1988.

[18]  Vladimir N. Vapnik and Alexey Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 1971.

[19]  Michel Talagrand. Sharper bounds for Gaussian and empirical processes. Annals of Probability, 1994.

[20]  David P. Helmbold, Robert H. Sloan, and Manfred K. Warmuth. Learning integer lattices. COLT '90, 1990.

[21]  Benjamin I. P. Rubinstein, Peter L. Bartlett, and J. Hyam Rubinstein. Shifting: One-inclusion mistake bounds and sample compression. J. Comput. Syst. Sci., 2009.

[22]  Steve Hanneke. The optimal sample complexity of PAC learning. J. Mach. Learn. Res., 2015.

[23]  Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 2003.

[24]  Shai Ben-David and Ami Litman. Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes. Discret. Appl. Math., 1998.

[25]  Richard J. Lipton and Neal E. Young. Simple strategies for large zero-sum games with applications to complexity theory. STOC '94, 1994.

[26]  P. Assouad. Densité et dimension. Annales de l'Institut Fourier, 1983.

[27]  Martin Dufwenberg. Game theory. Wiley Interdisciplinary Reviews: Cognitive Science, 2011.

[28]  Roi Livni and Pierre Simon. Honest compressions and their application to compression schemes. COLT, 2013.

[29]  Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.

[30]  Shai Ben-David, Nicolò Cesa-Bianchi, David Haussler, and Philip M. Long. Characterizations of learnability for classes of {0, ..., n}-valued functions. J. Comput. Syst. Sci., 1995.

[31]  Artem Chernikov and Pierre Simon. Externally definable sets and dependent pairs II. arXiv:1202.2650, 2012.

[32]  Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[33]  Amit Daniely and Shai Shalev-Shwartz. Optimal learners for multiclass problems. COLT, 2014.

[34]  Yoav Freund. Boosting a weak learning algorithm by majority. COLT '90, 1990.

[35]  Leslie G. Valiant. A theory of the learnable. STOC '84, 1984.

[36]  Benjamin I. P. Rubinstein and J. Hyam Rubinstein. A geometric approach to sample compression. J. Mach. Learn. Res., 2009.

[37]  Sally Floyd and Manfred K. Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 1995.

[38]  Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.