Communication-Efficient Distributed Online Prediction using Dynamic Model Synchronizations

We present the first protocol for distributed online prediction that aims to minimize online prediction loss and network communication at the same time. Applications include social content recommendation, algorithmic trading, and other scenarios in which a configuration of local prediction models over high-frequency data streams is used to provide a real-time service. For stationary data, the proposed protocol retains the asymptotically optimal regret of previous algorithms. At the same time, it substantially reduces network communication and, in contrast to previous approaches, remains applicable when the data is non-stationary and exhibits rapid concept drift. The protocol is based on controlling the divergence of the local models in a decentralized way. Its beneficial properties are also confirmed empirically.
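The following is a minimal sketch of the dynamic-synchronization idea described above, not the authors' implementation. It assumes linear models trained by online hinge-loss gradient steps and a hypothetical divergence-threshold rule: each node locally monitors the distance of its model to the last synchronized reference model, and a model-averaging step is triggered only when some local condition is violated. All names (DynamicSyncNode, maybe_synchronize, sync_threshold) are illustrative assumptions.

    # Sketch only: dynamic model synchronization for distributed online
    # prediction under the assumptions stated above.
    import numpy as np

    class DynamicSyncNode:
        """One local learner holding a linear model w and a reference model r."""

        def __init__(self, dim, lr=0.1):
            self.w = np.zeros(dim)   # current local model
            self.r = np.zeros(dim)   # last synchronized (reference) model
            self.lr = lr

        def update(self, x, y):
            """Online hinge-loss gradient step on one example (x, y in {-1, +1})."""
            if y * self.w.dot(x) < 1.0:
                self.w += self.lr * y * x

        def divergence(self):
            """Squared distance of the local model from the reference model."""
            return float(np.sum((self.w - self.r) ** 2))

    def maybe_synchronize(nodes, sync_threshold):
        """Each node checks its own divergence locally; only if some local
        condition is violated are the models averaged and the reference updated."""
        if not any(n.divergence() > sync_threshold for n in nodes):
            return False  # no local violation, no communication this round
        avg = np.mean([n.w for n in nodes], axis=0)
        for n in nodes:
            n.w = avg.copy()
            n.r = avg.copy()
        return True

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        dim, k = 5, 4
        nodes = [DynamicSyncNode(dim) for _ in range(k)]
        true_w = rng.normal(size=dim)
        syncs = 0
        for t in range(1000):
            for n in nodes:
                x = rng.normal(size=dim)
                y = 1.0 if true_w.dot(x) >= 0 else -1.0
                n.update(x, y)
            syncs += maybe_synchronize(nodes, sync_threshold=0.5)
        print("synchronizations triggered:", syncs)

In this sketch the threshold sync_threshold trades off communication against model divergence: a larger value means fewer averaging rounds (less network traffic) at the cost of locally drifting models, which mirrors the loss/communication trade-off the abstract describes.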
