Distributed Learning, Communication Complexity and Privacy

We consider the problem of PAC-learning from distributed data and analyze the fundamental communication-complexity questions involved. We provide general upper and lower bounds on the amount of communication needed to learn well, showing that in addition to VC-dimension and covering number, quantities such as the teaching dimension and mistake bound of a class play an important role. We also present tight results for a number of common concept classes, including conjunctions, parity functions, and decision lists. For linear separators, we show that for non-concentrated distributions we can use a version of the Perceptron algorithm to learn with far less communication than the number of updates given by the usual margin bound. We also show how boosting can be performed in a generic manner in the distributed setting to achieve communication with only logarithmic dependence on 1/ε for any concept class, and we demonstrate how recent work on agnostic learning from class-conditional queries can be used to achieve low communication in agnostic settings as well. Finally, we present an analysis of privacy, considering both differential privacy and a notion of distributional privacy that is especially appealing in this context.
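To make the Perceptron result concrete, here is a minimal sketch in Python (illustrative only: the `players` data layout, the round-robin order, and the termination rule are assumptions for exposition, not the paper's exact protocol). The key point it captures is that updates happen locally for free; the only message ever communicated is the current d-dimensional weight vector as it is handed from one player to the next.

```python
import numpy as np

def local_perceptron(w, X, y, max_passes=100):
    """Run standard Perceptron updates on one player's local data
    until it is locally consistent (or a pass limit is hit)."""
    updates = 0
    for _ in range(max_passes):
        mistakes = 0
        for x, label in zip(X, y):
            if label * np.dot(w, x) <= 0:  # mistake on a local example
                w = w + label * x          # Perceptron update (no communication)
                mistakes += 1
                updates += 1
        if mistakes == 0:                  # locally consistent
            break
    return w, updates

def distributed_perceptron(players, d, max_rounds=100):
    """Round-robin sketch: `players` is a list of (X, y) local datasets.
    The only communication is the weight vector w passed between players,
    so total communication is (number of hand-offs) x d words."""
    w = np.zeros(d)
    for _ in range(max_rounds):
        round_updates = 0
        for X, y in players:               # hand w to the next player
            w, u = local_perceptron(w, X, y)
            round_updates += u
        if round_updates == 0:             # every player is consistent
            return w
    return w
```

In this accounting, the usual margin bound still limits the total number of updates, but communication is charged only per hand-off of w rather than per update, which is the sense in which such a protocol can communicate much less than the update count suggests.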
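The distributed boosting result can be sketched in the same spirit (again a hedged mock-up, not the paper's protocol: `weak_learner`, the per-round sample size, and the fixed multiplicative reweighting factors below are placeholder assumptions). Each round, the center collects only a small weighted sample from each player and broadcasts back one weak hypothesis, so total communication scales with the number of rounds, which for boosting is logarithmic in 1/ε.

```python
import numpy as np

def distributed_boosting(players, weak_learner, rounds, sample_size):
    """Center-coordinated boosting sketch. `players` is a list of
    (X, y) local datasets with labels in {-1, +1}; `weak_learner`
    fits (X, y) and returns a callable hypothesis h(X) -> {-1, +1}."""
    ensemble = []
    # each player maintains a local boosting weight per example
    weights = [np.ones(len(y), dtype=float) for _, y in players]
    for _ in range(rounds):
        # players send small samples drawn by their local weights
        Xs, ys = [], []
        for (X, y), w in zip(players, weights):
            p = w / w.sum()
            idx = np.random.choice(len(y), size=sample_size, p=p)
            Xs.append(X[idx])
            ys.append(y[idx])
        X_round = np.vstack(Xs)
        y_round = np.concatenate(ys)
        h = weak_learner(X_round, y_round)   # center fits a weak hypothesis
        ensemble.append(h)
        # center broadcasts h; players reweight locally (crude fixed
        # factors stand in for the usual alpha_t-based update)
        for (X, y), w in zip(players, weights):
            w *= np.where(h(X) == y, 0.5, 2.0)
    def final(X):
        # majority vote over the ensemble (ties map to 0 in this sketch)
        return np.sign(sum(h(X) for h in ensemble))
    return final
```

The design point is that the per-round messages (a sample of fixed size in, one hypothesis out) have size independent of ε, so the ε-dependence of the total communication comes only from the O(log(1/ε)) round count.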
