Paper information - Vision-based reinforcement learning for purposive behavior acquisition

Vision-based reinforcement learning for purposive behavior acquisition

This paper presents a vision-based reinforcement learning method by which a robot learns to shoot a ball into a goal, and discusses several issues that arise in applying reinforcement learning to a real robot with a vision sensor. First, a "state-action deviation" problem is identified as a form of perceptual aliasing that arises when the state and action spaces are constructed to reflect the outputs of physical sensors and actuators, respectively. To cope with this, the action set is constructed so that one action consists of a series of the same action primitive, executed successively until the current state changes. Next, to reduce the learning time, a mechanism called Learning from Easy Missions (LEM), a technique similar to "shaping" in animal learning, is implemented. LEM reduces the learning time from exponential order in the size of the state space to approximately linear order. The results of computer simulations and real robot experiments are given.
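The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ToyRobot` class, its 300 mm step, the 1000 mm state bins, and the function names `run_until_state_change` and `lem_start_order` are all hypothetical and chosen only for illustration. The first function repeats one action primitive until the coarse, discretized state changes, which is the idea behind the state-change-based action set. The second orders start states from easy (near the goal) to hard, assuming that distance to the goal is a reasonable proxy for mission difficulty.

```python
from typing import Callable, Hashable, List, Tuple


class ToyRobot:
    """Hypothetical 1-D robot: fine-grained motion, coarse observed state."""

    BIN_MM = 1000  # assumed width of one discretized state, in millimeters

    def __init__(self) -> None:
        self.pos_mm = 0

    def step(self, primitive: str) -> None:
        # One primitive moves the robot far less than one state bin, so a
        # single primitive often leaves the observed state unchanged.
        self.pos_mm += {"forward": 300, "backward": -300}[primitive]

    def observe(self) -> int:
        # Coarse state, standing in for a quantized image feature.
        return self.pos_mm // self.BIN_MM


def run_until_state_change(
    robot: ToyRobot, primitive: str, max_steps: int = 50
) -> Tuple[int, int]:
    """Repeat one primitive until the observed state changes.

    Returns (new_state, number_of_primitives_executed). If the state never
    changes within max_steps, the unchanged state is returned.
    """
    start = robot.observe()
    for n in range(1, max_steps + 1):
        robot.step(primitive)
        current = robot.observe()
        if current != start:
            return current, n
    return start, max_steps


def lem_start_order(
    states: List[Hashable], distance_to_goal: Callable[[Hashable], float]
) -> List[Hashable]:
    """Order start states from easy (near the goal) to hard (far away)."""
    return sorted(states, key=distance_to_goal)


if __name__ == "__main__":
    robot = ToyRobot()
    # 300, 600, 900 mm stay in state 0; 1200 mm falls in state 1.
    print(run_until_state_change(robot, "forward"))  # (1, 4)
    # With the goal at state 0, nearer start states are trained first.
    print(lem_start_order([5, 1, 3], lambda s: abs(s - 0)))  # [1, 3, 5]
```

In a learner built on this sketch, the value update would be applied once per state transition rather than once per primitive, so that identical state-action pairs are not credited with different outcomes merely because the coarse state had not yet changed.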
