Learners that Use Little Information

We study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term $d$-bit information learners: algorithms whose output conveys at most $d$ bits of information about their input. A central theme of this work is that such algorithms generalize. We focus on the learning capacity of these algorithms and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well-studied notions such as sample compression schemes, Occam's razor, PAC-Bayes, and differential privacy. We present an approach for proving upper bounds on the amount of information that algorithms reveal about their inputs, and we also give a lower bound by exhibiting a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.
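As a toy illustration (not taken from the paper), the following hypothetical Python sketch shows one way an algorithm can be made to leak little information: a one-dimensional threshold learner whose output is quantized to a grid of $2^d + 1$ points. Since the output ranges over at most $2^d + 1$ values, the mutual information between the output and the input sample is at most about $d$ bits. All names here (`erm_threshold`, `d_bit_learner`) are illustrative assumptions, not the paper's constructions.

```python
import random

def erm_threshold(sample):
    """ERM for 1-D thresholds: any point between the largest
    negatively-labelled example and the smallest positively-labelled
    example has zero empirical error; we pick the midpoint."""
    neg = [x for x, y in sample if y == 0]
    pos = [x for x, y in sample if y == 1]
    lo = max(neg, default=0.0)
    hi = min(pos, default=1.0)
    return (lo + hi) / 2.0

def d_bit_learner(sample, d=6):
    """Quantize the ERM threshold to a grid of 2**d + 1 points in [0, 1].
    The output takes at most 2**d + 1 distinct values, so it can convey
    at most roughly d bits of information about the sample."""
    grid = 2 ** d
    return round(erm_threshold(sample) * grid) / grid

# Usage: learn a threshold near 0.3 from 200 uniform examples.
random.seed(0)
true_t = 0.3
sample = [(x, int(x >= true_t)) for x in (random.random() for _ in range(200))]
h = d_bit_learner(sample, d=6)
```

With 200 examples the ERM threshold lands close to the true threshold, and quantization adds at most $2^{-d-1}$ additional error; the point is that bounding the output space trades a small amount of accuracy for an explicit cap on the information revealed.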
