Multi-resolution Exploration in Continuous Spaces

The essence of exploration is acting to reduce uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE's broad utility by using it to speed up learning in prototypical model-based and value-based reinforcement-learning methods. Empirical results show that MRE improves upon state-of-the-art exploration approaches.
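The abstract describes MRE only at a high level. The following is a minimal sketch of the general idea of a hierarchical partition whose resolution adapts to where samples fall, built on stated assumptions. The kd-tree split rule (halve the widest dimension once a cell holds more than `max_points` samples), the density-based `knownness` score with its `target_density` parameter, and the `optimistic_target` backup are all hypothetical illustrations, not the exact formulas used in the paper. Low knownness marks a region that would benefit from more samples, and a value-based learner can push such regions toward an optimistic value `v_max`.

```python
from typing import List, Optional, Sequence


class Cell:
    """One node of a hypothetical kd-tree covering the box [low, high]."""

    def __init__(self, low: Sequence[float], high: Sequence[float], depth: int = 0):
        self.low, self.high, self.depth = list(low), list(high), depth
        self.points: List[List[float]] = []   # samples stored in a leaf
        self.dim: Optional[int] = None        # split dimension (internal nodes)
        self.cut: Optional[float] = None      # split value (internal nodes)
        self.children: List["Cell"] = []      # [left, right] after a split

    def volume(self) -> float:
        v = 1.0
        for lo, hi in zip(self.low, self.high):
            v *= hi - lo
        return v

    def child_for(self, s: Sequence[float]) -> "Cell":
        return self.children[0] if s[self.dim] < self.cut else self.children[1]


class MultiResolutionTree:
    """Hypothetical: refines the partition where samples accumulate; a leaf
    holding more than `max_points` samples is halved along its widest dimension."""

    def __init__(self, low, high, max_points=4, max_depth=12, target_density=1.0):
        self.root = Cell(low, high)
        self.max_points = max_points
        self.max_depth = max_depth            # guards against duplicate samples
        self.target_density = target_density  # samples per unit volume deemed "known"

    def leaf(self, s: Sequence[float]) -> Cell:
        node = self.root
        while node.children:
            node = node.child_for(s)
        return node

    def add(self, s: Sequence[float]) -> None:
        node = self.leaf(s)
        node.points.append(list(s))
        self._maybe_split(node)

    def _maybe_split(self, node: Cell) -> None:
        if len(node.points) <= self.max_points or node.depth >= self.max_depth:
            return
        widths = [hi - lo for lo, hi in zip(node.low, node.high)]
        node.dim = max(range(len(widths)), key=widths.__getitem__)
        node.cut = (node.low[node.dim] + node.high[node.dim]) / 2
        left_high, right_low = list(node.high), list(node.low)
        left_high[node.dim] = right_low[node.dim] = node.cut
        node.children = [Cell(node.low, left_high, node.depth + 1),
                         Cell(right_low, node.high, node.depth + 1)]
        for p in node.points:                 # push samples down one level
            node.child_for(p).points.append(p)
        node.points = []
        for child in node.children:           # keep refining crowded halves
            self._maybe_split(child)

    def knownness(self, s: Sequence[float]) -> float:
        """Hypothetical score in [0, 1]: sample density of s's leaf / target."""
        leaf = self.leaf(s)
        density = len(leaf.points) / leaf.volume()
        return min(1.0, density / self.target_density)


def optimistic_target(r, next_value, kappa, gamma, v_max):
    """Hypothetical backup: blend toward v_max in proportion to (1 - knownness)."""
    return kappa * (r + gamma * next_value) + (1.0 - kappa) * v_max


tree = MultiResolutionTree([0.0], [1.0], max_points=2, target_density=4.0)
for x in (0.1, 0.15, 0.2):
    tree.add([x])
print(tree.knownness([0.18]), tree.knownness([0.9]))  # 1.0 0.0
```

In this toy one-dimensional run, the three clustered samples force the tree to split down to cells of width 0.125 near the cluster, so that region is rated fully known, while the untouched half `[0.5, 1]` stays a single coarse cell with knownness 0. Splitting only where data accumulates is what lets the resolution vary across the state space.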

Michael L. Littman | Ali Nouri