Communication Efficient Distributed Agnostic Boosting

We consider the problem of learning from distributed data in the agnostic setting, i.e., in the presence of arbitrary forms of noise. Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space that is simultaneously noise tolerant, communication efficient, and computationally efficient. This improves significantly over prior works, which were either communication efficient only in noise-free scenarios or computationally prohibitive. Empirical results on large synthetic and real-world datasets demonstrate the effectiveness and scalability of the proposed approach.
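To make the communication pattern concrete, the sketch below shows a generic coordinator/site boosting loop in which only weak-hypothesis descriptions and a few scalars cross the network each round; the raw data never leaves a site. It uses plain AdaBoost with decision stumps purely for illustration and is not the paper's noise-tolerant agnostic booster; all names here (Site, propose_stump, boost) and the candidate-merging rule are assumptions of this sketch, not the paper's protocol.

```python
# Illustrative sketch of communication-efficient distributed boosting.
# Assumptions (not from the paper): binary labels in {-1,+1}, decision stumps
# as weak learners, one coordinator and k sites, AdaBoost-style reweighting.
import numpy as np

class Site:
    """Holds one data shard and its local boosting weights."""
    def __init__(self, X, y):
        self.X, self.y = X, y
        self.w = np.ones(len(y))  # unnormalized weights; normalized globally below

    def _predict(self, stump):
        j, thr, sign = stump
        return sign * np.where(self.X[:, j] > thr, 1, -1)

    def propose_stump(self):
        """Return the locally best stump (small, constant-size message)."""
        best = None
        for j in range(self.X.shape[1]):
            for thr in np.unique(self.X[:, j]):
                for sign in (1, -1):
                    err = np.sum(self.w * (self._predict((j, thr, sign)) != self.y))
                    if best is None or err < best[0]:
                        best = (err, (j, thr, sign))
        return best[1]

    def weighted_error(self, stump):
        """Two scalars per query: local weighted error and local weight mass."""
        return np.sum(self.w * (self._predict(stump) != self.y)), np.sum(self.w)

    def update(self, stump, alpha):
        """Local reweighting after the coordinator broadcasts the chosen stump."""
        self.w *= np.exp(-alpha * self.y * self._predict(stump))


def boost(sites, rounds=20):
    """Coordinator loop: per round, only stumps and scalars cross the network."""
    def global_err(stump):
        errs, masses = zip(*(s.weighted_error(stump) for s in sites))
        return sum(errs) / sum(masses)

    ensemble = []
    for _ in range(rounds):
        proposals = [s.propose_stump() for s in sites]      # k small messages up
        stump = min(proposals, key=global_err)              # coordinator merges
        eps = np.clip(global_err(stump), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        for s in sites:
            s.update(stump, alpha)                          # broadcast, no data moves
        ensemble.append((alpha, stump))
    return ensemble
```

A new point x would be classified by the sign of the weighted vote over the returned stumps; per round the communication is O(k) hypothesis descriptions and scalars, independent of the total number of examples, which is the sense in which such a protocol is communication efficient.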
