Online Learning under Delayed Feedback
[1] E. C. Titchmarsh,et al. The Theory of the Riemann Zeta-Function , 1952 .
[2] J. Doob. Stochastic processes , 1953 .
[3] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[4] Erik Ordentlich,et al. On delayed prediction of individual sequences , 2002, IEEE Trans. Inf. Theory.
[5] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[6] Chris Mesterharm,et al. On-line Learning with Delayed Label Feedback , 2005, ALT.
[7] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[8] I. Pinelis. On inequalities for sums of bounded random variables , 2006, math/0603030.
[9] Haym Hirsh,et al. Improving on-line learning , 2007 .
[10] John Langford,et al. Slow Learners are Fast , 2009, NIPS.
[11] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[12] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[13] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[14] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st Conference on Decision and Control (CDC).
[15] Andreas Krause,et al. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization , 2012, ICML.
[16] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.