Markov Decision Processes under Bandit Feedback
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[3] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[4] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[6] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[7] András György, et al. The On-Line Shortest Path Problem Under Partial Monitoring, 2007.
[8] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[9] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[10] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[11] Shie Mannor, et al. Arbitrarily modulated Markov decision processes, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[12] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2009, Math. Oper. Res.
[13] Shie Mannor, et al. Online learning in Markov decision processes with arbitrarily changing rewards and transitions, 2009, 2009 International Conference on Game Theory for Networks.
[14] Xi-Ren Cao, et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annual Reviews in Control.
[15] Csaba Szepesvári, et al. The Online Loop-free Stochastic Shortest-Path Problem, 2010, Annual Conference Computational Learning Theory.
[16] Varun Grover, et al. Active learning in heteroscedastic noise, 2010, Theor. Comput. Sci.
[17] Ambuj Tewari, et al. Online Learning: Stochastic and Constrained Adversaries, 2011, ArXiv.
[18] Ilse C. F. Ipsen, et al. Ergodicity Coefficients Defined by Vector Norms, 2011, SIAM J. Matrix Anal. Appl.
[19] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[20] Alessandro Lazaric, et al. Learning with stochastic inputs and adversarial outputs, 2012, J. Comput. Syst. Sci.
[21] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.