Online Learning under Delayed Feedback

Online learning with delayed feedback has received increasing attention recently owing to its many applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret multiplicatively in adversarial problems and additively in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle delays in the feedback loop. We also develop modifications of the well-known UCB algorithm for the bandit problem with delayed feedback; compared to the meta-algorithms, these have the advantage that they can be implemented with lower complexity.
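One standard way such a black-box reduction can work, for a fixed feedback delay of τ rounds, is to run τ + 1 independent copies of the non-delayed base learner in round-robin, so that each copy always receives the feedback for its previous prediction before it acts again. The sketch below illustrates this idea only; the class and method names (`act`, `feedback`) are illustrative assumptions, not the paper's interface, and `CountingLearner` is a toy stand-in for a real base algorithm.

```python
class CountingLearner:
    """Toy base learner (illustrative only): counts its own rounds
    and records the feedback it receives."""

    def __init__(self):
        self.acts = 0
        self.received = []

    def act(self):
        self.acts += 1
        return self.acts

    def feedback(self, loss):
        self.received.append(loss)


class DelayedMetaAlgorithm:
    """Hedged sketch of the parallel-instances reduction: with feedback
    delayed by tau rounds, run tau + 1 copies of a non-delayed learner
    cyclically. Round t is served by copy t mod (tau + 1), so each copy
    sees the outcome of its last prediction before its next turn."""

    def __init__(self, base_factory, tau):
        self.instances = [base_factory() for _ in range(tau + 1)]
        self.t = 0

    def act(self):
        # Round t is served by instance t mod (tau + 1).
        inst = self.instances[self.t % len(self.instances)]
        self.t += 1
        return inst.act()

    def feedback(self, round_index, loss):
        # Feedback for round s, arriving tau rounds later, is routed
        # back to the copy that made that round's prediction.
        self.instances[round_index % len(self.instances)].feedback(loss)
```

Because each copy plays only every (τ + 1)-th round, the delayed regret is bounded by τ + 1 times the base algorithm's regret over correspondingly shorter horizons, which matches the multiplicative effect of delay stated above for adversarial problems.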
