Robot Learning in Partially Observable, Noisy, Continuous Worlds

Partially observable Markov decision processes (POMDPs) pose special difficulties for learning robot control policies because perceptually aliased states must be disambiguated. Short-term memories of recent actions and/or percepts provide the context the robot needs to perform this disambiguation. We introduce Variable-Resolution Percept Discretization (VRPD), an extension to Utile Suffix Memory (USM), an algorithm designed to solve discrete POMDPs. This extension allows USM to function effectively in noisy, continuous worlds. We describe the extension in detail and then demonstrate experimentally the improvements it brings to USM in continuous POMDPs.