Online Markov Decision Processes Under Bandit Feedback
Csaba Szepesvári | Gergely Neu | András György | András Antos
[1] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[2] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[3] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[4] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2009, Math. Oper. Res.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Xi-Ren Cao, et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annu. Rev. Control.
[7] Ambuj Tewari, et al. Online Learning: Stochastic and Constrained Adversaries, 2011, ArXiv.
[8] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.
[9] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[10] Claudio Gentile, et al. Proceedings of the 20th annual conference on Learning theory, 2007.
[11] Csaba Szepesvari, et al. Markov Decision Processes under Bandit Feedback, 2015.
[12] Varun Grover, et al. Active learning in heteroscedastic noise, 2010, Theor. Comput. Sci.
[13] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[14] Ilse C. F. Ipsen, et al. Ergodicity Coefficients Defined by Vector Norms, 2011, SIAM J. Matrix Anal. Appl.
[15] Alessandro Lazaric, et al. Learning with stochastic inputs and adversarial outputs, 2012, J. Comput. Syst. Sci.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[18] Shie Mannor, et al. Online learning in Markov decision processes with arbitrarily changing rewards and transitions, 2009, 2009 International Conference on Game Theory for Networks.
[19] Tamás Linder, et al. The On-Line Shortest Path Problem Under Partial Monitoring, 2007, J. Mach. Learn. Res.
[20] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[21] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[22] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[23] Shie Mannor, et al. Arbitrarily modulated Markov decision processes, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.