Sample-Based Planning for Continuous Action Markov Decision Processes
Michael L. Littman | Ari Weinstein | Christopher R. Mansley
[1] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[2] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[5] Leslie Pack Kaelbling, et al. Associative Reinforcement Learning: Functions in k-DNF, 1994, Machine Learning.
[6] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[7] William D. Smart, et al. Receding Horizon Differential Dynamic Programming, 2007, NIPS.
[8] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[9] Lihong Li, et al. Online exploration in least-squares policy iteration, 2009, AAMAS.
[10] Michail G. Lagoudakis, et al. Binary action search for learning continuous-action control policies, 2009, ICML '09.
[11] Csaba Szepesvári, et al. X-armed Bandits, 2022.