Issues in Putting Reinforcement Learning Onto Robots

There has recently been a good deal of interest in robot learning. Reinforcement Learning (RL) is a trial-and-error approach to learning that has become popular with roboticists, despite the fact that RL methods are very slow and scale badly with the size of the state and action spaces, making them difficult to put onto real robots. This paper describes work I have been doing on understanding why RL methods are so slow and on how they might be sped up. A reinforcement learning algorithm loosely based on the theory of hypothesis testing is presented, along with preliminary results from applying this algorithm to a set of bandit problems.
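The abstract does not spell out the algorithm, so the sketch below is only an illustration of the general idea of treating a bandit problem as hypothesis testing: each arm is kept alive only while the hypothesis "this arm could be the best" survives a confidence test. It shows successive elimination with Hoeffding-style confidence bounds, one standard method in this family; the names `successive_elimination`, `arms`, `horizon`, and `delta` are my own and are not from the paper.

```python
import math
import random

def successive_elimination(arms, horizon, delta=0.05):
    """Successive elimination on a K-armed bandit (illustrative sketch).

    `arms` is a list of callables, each returning a stochastic reward in
    [0, 1]. An arm is eliminated once its empirical mean is provably
    (at confidence 1 - delta) below the current best arm's, so sampling
    concentrates on the arms that might still be optimal.
    """
    active = list(range(len(arms)))
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    pulls = 0

    while pulls < horizon and len(active) > 1:
        # Pull every surviving arm once and update its running mean.
        for a in active:
            r = arms[a]()
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
            pulls += 1

        # Hoeffding-style confidence radius: an arm whose upper bound
        # falls below the best arm's lower bound fails the hypothesis
        # "this arm could be best" and is dropped.
        def radius(a):
            return math.sqrt(
                math.log(2 * len(arms) * counts[a] / delta) / (2 * counts[a])
            )

        best = max(active, key=lambda a: means[a])
        active = [a for a in active
                  if means[a] + radius(a) >= means[best] - radius(best)]

    return max(active, key=lambda a: means[a])

# Example: three Bernoulli arms with success probabilities 0.2, 0.5, 0.6.
arms = [lambda p=p: float(random.random() < p) for p in (0.2, 0.5, 0.6)]
print(successive_elimination(arms, horizon=3000))  # usually prints 2
```

The hypothesis-testing view makes the speed question concrete: the number of pulls an arm receives before elimination depends on how many samples the confidence test needs to separate its mean from the best arm's, which is where the slowness the paper investigates shows up.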
