Learning without Interaction Requires Separation

One of the key resources in large-scale learning systems is the number of rounds of communication between the server and the clients holding the data points. We study this resource for systems with two types of constraints on the communication from each client: local differential privacy and a bound on the number of bits communicated. In both models, the number of rounds of communication is captured by the number of rounds of interaction needed to solve the learning problem in the statistical query (SQ) model. For many learning problems, the known efficient algorithms require many rounds of interaction, yet little is known about whether this interaction is actually necessary. In the context of classification in the PAC learning model, Kasiviswanathan et al. (2008) constructed an artificial class of functions that is PAC learnable with respect to a fixed distribution but cannot be learned by an efficient non-interactive (or one-round) SQ algorithm. Here we show that a similar separation holds for learning linear separators and decision lists without assumptions on the distribution. To prove this separation, we show that non-interactive SQ algorithms can only learn function classes of low margin complexity, that is, classes of functions that can be represented as large-margin linear separators.
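To make the last statement precise, recall the standard notion of margin complexity (the formulation below follows the margin-complexity literature, e.g. [15] and [36]; the notation is ours and is not fixed by the abstract). For a class $\mathcal{C}$ of functions $f : X \to \{-1,+1\}$,

$$\mathrm{mc}(\mathcal{C}) \;=\; \inf_{\phi,\,\{w_f\}} \;\sup_{f \in \mathcal{C},\, x \in X}\; \frac{1}{f(x)\,\langle w_f, \phi(x)\rangle},$$

where the infimum ranges over embeddings $\phi : X \to \mathbb{B}^d$ of the domain into the unit ball of some $\mathbb{R}^d$, together with unit vectors $w_f \in \mathbb{R}^d$ satisfying $f(x)\,\langle w_f, \phi(x)\rangle > 0$ for all $f \in \mathcal{C}$ and $x \in X$. Low margin complexity thus says exactly that a single embedding realizes every function in $\mathcal{C}$ as a linear separator with margin at least $1/\mathrm{mc}(\mathcal{C})$.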

[1] Zhi-Quan Luo et al. Universal decentralized estimation in a bandwidth constrained sensor network. IEEE Transactions on Information Theory, 2005.

[2] John Duchi et al. Lower Bounds for Locally Private Estimation via Communication Complexity. COLT, 2019.

[3] Martin J. Wainwright et al. Universal Quantile Estimation with Feedback in the Communication-Constrained Setting. ISIT, 2006.

[4] M. Aizerman et al. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. 1964.

[5] Raef Bassily et al. Algorithmic stability for adaptive data analysis. STOC, 2015.

[6] Himanshu Tyagi et al. Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction. IEEE Transactions on Information Theory, 2018.

[7] Hans Ulrich Simon et al. A Close Look to Margin Complexity and Related Parameters. COLT, 2011.

[8] Eric Balkanski et al. The adaptive complexity of maximizing a submodular function. STOC, 2018.

[9] Martin J. Wainwright et al. Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation. NIPS, 2013.

[10] Gregory Valiant et al. Memory, Communication, and Statistical Queries. COLT, 2016.

[11] Bernhard E. Boser et al. A training algorithm for optimal margin classifiers. COLT, 1992.

[12] Vitaly Feldman et al. Evolvability from learning algorithms. STOC, 2008.

[13] Albert B. Novikoff. On convergence proofs for perceptrons. 1963.

[14] Janardhan Kulkarni et al. Collecting Telemetry Data Privately. NIPS, 2017.

[15] Shai Ben-David et al. Limitations of Learning Via Embeddings in Euclidean Half Spaces. Journal of Machine Learning Research, 2003.

[16] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[17] Feng Ruan et al. Minimax Bounds on Stochastic Batched Convex Optimization. COLT, 2018.

[18] Alejandro Ribeiro et al. Bandwidth-constrained distributed estimation for wireless sensor networks, Part I: Gaussian case. IEEE Transactions on Signal Processing, 2006.

[19] Jelena Diakonikolas et al. Lower Bounds for Parallel and Randomized Convex Optimization. COLT, 2018.

[20] Úlfar Erlingsson et al. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS, 2014.

[21] Eric Balkanski et al. An Exponential Speedup in Parallel Running Time for Submodular Maximization without Loss in Approximation. SODA, 2018.

[22] Santosh S. Vempala et al. A simple polynomial-time rescaling algorithm for solving linear programs. STOC, 2004.

[23] Martin J. Wainwright et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. NIPS, 2013.

[24] Vitaly Feldman et al. Distribution-Independent Evolvability of Linear Threshold Functions. COLT, 2011.

[25] Nathan Srebro et al. Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization. NeurIPS, 2018.

[26] Hans Ulrich Simon et al. Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces. Machine Learning, 2004.

[27] Vitaly Feldman et al. On Using Extended Statistical Queries to Avoid Membership Queries. Journal of Machine Learning Research, 2001.

[28] Harry Buhrman et al. On Computation and Communication with Small Bias. CCC, 2007.

[29] Jonathan Ullman et al. Preventing False Discovery in Interactive Data Analysis Is Hard. FOCS, 2014.

[30] Eric Balkanski et al. Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex Minimization. arXiv, 2018.

[31] Adam D. Smith et al. Is Interaction Necessary for Distributed Private Learning? IEEE Symposium on Security and Privacy, 2017.

[32] Maria-Florina Balcan et al. Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy. Algorithmica, 2013.

[33] Alan M. Frieze et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions. Algorithmica, 1996.

[34] Thomas Steinke et al. Interactive fingerprinting codes and the hardness of preventing false discovery. ITA, 2016.

[35] Santosh S. Vempala et al. Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization. SODA, 2015.

[36] Nathan Linial et al. Learning Complexity vs. Communication Complexity. CCC, 2008.

[37] R. Rivest. Learning Decision Lists. Machine Learning, 1987.

[38] Santosh S. Vempala et al. An algorithmic theory of learning: Robust concepts and random projection. Machine Learning, 1999.

[39] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[40] Shai Ben-David et al. Learning with restricted focus of attention. COLT, 1993.

[41] Eric Balkanski et al. The limitations of optimization from samples. STOC, 2015.

[42] Leslie G. Valiant. A theory of the learnable. STOC, 1984.

[43] Michael Kearns. Efficient noise-tolerant learning from statistical queries. STOC, 1993.

[44] John C. Duchi et al. Minimax rates for memory-bounded sparse linear regression. COLT, 2015.

[45] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[46] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 1965.

[47] Alexander A. Sherstov. Halfspace Matrices. CCC, 2007.

[48] Shiva Prasad Kasiviswanathan et al. What Can We Learn Privately? FOCS, 2008.

[49] Ananda Theertha Suresh et al. Distributed Mean Estimation with Limited Communication. ICML, 2016.

[50] Toniann Pitassi et al. Preserving Statistical Validity in Adaptive Data Analysis. STOC, 2014.

[51] Varun Kanade et al. Evolution with Recombination. FOCS, 2011.

[52] Alexander A. Razborov et al. Majority gates vs. general weighted threshold gates. Structure in Complexity Theory Conference, 1992.

[53] Santosh S. Vempala et al. Statistical Algorithms and a Lower Bound for Detecting Planted Cliques. Journal of the ACM, 2012.

[54] Leslie G. Valiant. Evolvability. Journal of the ACM, 2009.

[55] Alexandre V. Evfimievski et al. Limiting privacy breaches in privacy preserving data mining. PODS, 2003.