Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes
[1] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[2] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[3] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[4] Andrew G. Barto, et al. Local Bandit Approximation for Optimal Learning Problems, 1996, NIPS.
[5] H. Martín, et al. Ex〈α〉: An effective algorithm for continuous actions Reinforcement Learning problems, 2009.
[6] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[7] Michail G. Lagoudakis, et al. Binary action search for learning continuous-action control policies, 2009, ICML '09.
[8] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[9] David S. Touretzky, et al. Proceedings of the 1993 Connectionist Models Summer School, 2014.
[10] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[11] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[12] M.A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[13] Michael L. Littman, et al. Multi-resolution Exploration in Continuous Spaces, 2008, NIPS.
[14] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[15] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[16] Shimon Whiteson, et al. The Reinforcement Learning Competitions, 2010.
[17] Michael L. Littman, et al. Sample-Based Planning for Continuous Action Markov Decision Processes, 2011, ICAPS.
[18] Lihong Li, et al. Workshop summary: Results of the 2009 reinforcement learning competition, 2009, ICML '09.
[19] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[20] H. Jaap van den Herik, et al. Parallel Monte-Carlo Tree Search, 2008, Computers and Games.
[21] Guy Van den Broeck, et al. Automatic discretization of actions and states in Monte-Carlo tree search, 2011.
[22] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[23] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[24] Richard S. Sutton, et al. Sample-based learning and search with permanent and transient memories, 2008, ICML '08.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.