Memory, Communication, and Statistical Queries

If a concept class can be represented with a certain amount of memory, can it be efficiently learned with the same amount of memory? What concepts can be efficiently learned by algorithms that extract only a few bits of information from each example? We introduce a formal framework for studying these questions, and investigate the relationship between the fundamental resources of memory or communication and the sample complexity of the learning task. We relate our memory-bounded and communication-bounded learning models to the well-studied statistical query model. This connection can be leveraged to obtain both upper and lower bounds: we show strong lower bounds on learning parity functions with bounded communication, as well as the first upper bounds on solving generic sparse linear regression problems with limited memory.
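The link between bounded communication and statistical queries rests on a simple observation. A statistical query asks for E[φ(x, y)] with φ taking values in [-1, 1], and it can be answered to tolerance τ by an algorithm that keeps only one random bit per example and, in a streaming setting, only a running count of those bits. The Python sketch below illustrates this standard simulation under stated assumptions: Boolean features and labels, i.i.d. examples, and a Hoeffding-based sample size. It is not taken from the paper, and all names (`one_bit_message`, `answer_statistical_query`, `samples_needed`, `parity_stream`, `agrees_with_coordinate_0`) are hypothetical.

```python
import math
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical encoding: an example is (x, y) with x in {0,1}^d and y in {0,1}.
Example = Tuple[Sequence[int], int]
Query = Callable[[Example], float]


def one_bit_message(phi: Query, example: Example, rng: random.Random) -> int:
    """Reduce one example to a single random bit whose mean is (1 + E[phi]) / 2."""
    value = phi(example)
    if not -1.0 <= value <= 1.0:
        raise ValueError("statistical query must take values in [-1, 1]")
    return 1 if rng.random() < (1.0 + value) / 2.0 else 0


def samples_needed(tau: float, delta: float) -> int:
    """Hoeffding bound: examples needed so the answer is within tau w.p. >= 1 - delta."""
    return math.ceil(2.0 * math.log(2.0 / delta) / tau ** 2)


def answer_statistical_query(phi: Query, stream: Iterable[Example], rng: random.Random) -> float:
    """Estimate E[phi(x, y)] from a stream while storing only two integer counters."""
    ones = 0
    count = 0
    for example in stream:
        ones += one_bit_message(phi, example, rng)
        count += 1
    if count == 0:
        raise ValueError("need at least one example")
    # E[bit] = (1 + E[phi]) / 2, so 2 * mean(bit) - 1 is an unbiased estimate of E[phi].
    return 2.0 * ones / count - 1.0


def parity_stream(n: int, d: int, subset: List[int], rng: random.Random) -> Iterator[Example]:
    """Uniform x in {0,1}^d labeled by the parity of the coordinates in `subset`."""
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(d)]
        yield x, sum(x[i] for i in subset) % 2


def agrees_with_coordinate_0(example: Example) -> float:
    """Correlational query: +1 if the label equals x[0], otherwise -1."""
    x, y = example
    return 1.0 if x[0] == y else -1.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = samples_needed(tau=0.05, delta=0.01)
    # Label equals x[0]: the query is always +1, so the estimate is exactly 1.0.
    print(answer_statistical_query(agrees_with_coordinate_0, parity_stream(n, 10, [0], rng), rng))
    # Label is x[0] XOR x[3]: the true answer is 0, so the estimate should be near 0.
    print(answer_statistical_query(agrees_with_coordinate_0, parity_stream(n, 10, [0, 3], rng), rng))
```

The second query illustrates, informally, why parities resist statistical query algorithms: a correlational query on a single coordinate has expectation zero against a parity on two or more coordinates, so the answer carries no information about which coordinates are involved.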
