Online Learning under Delayed Feedback

Online learning with delayed feedback has received increasing attention recently owing to its many applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret multiplicatively in adversarial problems and additively in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle delays in the feedback loop. We also develop modifications of the well-known UCB algorithm for the bandit problem with delayed feedback; compared to the meta-algorithms, these have the advantage that they can be implemented with lower complexity.
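One standard way such a black-box reduction can work, for a fixed feedback delay of τ rounds, is to run τ + 1 independent copies of the non-delayed base learner in round-robin, so that each copy always receives the feedback for its previous prediction before it acts again. The sketch below illustrates this idea only; the class and method names (`act`, `feedback`) are illustrative assumptions, not the paper's interface, and `CountingLearner` is a toy stand-in for a real base algorithm.

```python
class CountingLearner:
    """Toy base learner (illustrative only): counts its own rounds
    and records the feedback it receives."""

    def __init__(self):
        self.acts = 0
        self.received = []

    def act(self):
        self.acts += 1
        return self.acts

    def feedback(self, loss):
        self.received.append(loss)


class DelayedMetaAlgorithm:
    """Hedged sketch of the parallel-instances reduction: with feedback
    delayed by tau rounds, run tau + 1 copies of a non-delayed learner
    cyclically. Round t is served by copy t mod (tau + 1), so each copy
    sees the outcome of its last prediction before its next turn."""

    def __init__(self, base_factory, tau):
        self.instances = [base_factory() for _ in range(tau + 1)]
        self.t = 0

    def act(self):
        # Round t is served by instance t mod (tau + 1).
        inst = self.instances[self.t % len(self.instances)]
        self.t += 1
        return inst.act()

    def feedback(self, round_index, loss):
        # Feedback for round s, arriving tau rounds later, is routed
        # back to the copy that made that round's prediction.
        self.instances[round_index % len(self.instances)].feedback(loss)
```

Because each copy plays only every (τ + 1)-th round, the delayed regret is bounded by τ + 1 times the base algorithm's regret over correspondingly shorter horizons, which matches the multiplicative effect of delay stated above for adversarial problems.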
