Continuous valued Q-learning method able to incrementally refine state space

Conventional reinforcement learning methods are difficult to apply to real robot tasks because they must represent values over infinitely many state-action pairs. To represent an action-value function continuously, a function approximation method is usually applied. In our previous work (2000), we pointed out that this type of learning method potentially suffers from a discontinuity problem: the optimal action for a given state can change discontinuously. In this paper, we propose a method for estimating where a discontinuity of the optimal action takes place and for refining the state space incrementally. We call this method a continuous valued Q-learning method. To show the validity of our method, we apply it to a simulated robot.
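The core idea can be illustrated with a minimal sketch. The following code is a hypothetical 1-D example, not the paper's implementation: Q-values are stored only at discrete state representatives, the value of a continuous state is a linear interpolation of its two neighbours, and the TD update is distributed back to those neighbours by the same interpolation weights. All names and constants (`N_STATES`, `ALPHA`, etc.) are assumptions for illustration.

```python
import numpy as np

# Continuous-valued Q-learning sketch on a 1-D state space in [0, 1].
# Representative states sit at 0.0, 0.1, ..., 1.0; actions are discrete.

N_STATES = 11
N_ACTIONS = 3
ALPHA, GAMMA = 0.1, 0.9
Q = np.zeros((N_STATES, N_ACTIONS))

def weights(s):
    """Left representative index and interpolation weights for state s in [0, 1]."""
    x = s * (N_STATES - 1)
    i = min(int(x), N_STATES - 2)  # left neighbour index
    w = x - i                      # weight on the right neighbour
    return i, 1.0 - w, w

def q_value(s, a):
    """Q(s, a) for a continuous state: interpolate between neighbours."""
    i, wl, wr = weights(s)
    return wl * Q[i, a] + wr * Q[i + 1, a]

def update(s, a, r, s_next):
    """One Q-learning step; the TD error is split by interpolation weight."""
    target = r + GAMMA * max(q_value(s_next, b) for b in range(N_ACTIONS))
    td = target - q_value(s, a)
    i, wl, wr = weights(s)
    Q[i, a] += ALPHA * wl * td
    Q[i + 1, a] += ALPHA * wr * td
```

In this picture, a point where the greedy action of the interpolated Q-function switches between two neighbouring representatives is a candidate location for a discontinuity of the optimal action, and hence for inserting a new representative state, which is the incremental refinement the paper proposes.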

[1]  Minoru Asada,et al.  Action-based sensor space categorization for robot learning , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.

[2]  Minoru Asada,et al.  Continuous valued Q-learning for vision-guided behavior acquisition , 1999, Proceedings. 1999 IEEE/SICE/RSJ. International Conference on Multisensor Fusion and Integration for Intelligent Systems. MFI'99 (Cat. No.99TH8480).

[3]  Shinichi Nakasuka,et al.  Simultaneous learning of situation classification based on rewards and behavior selection based on the situation , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.

[4]  Andrew W. Moore,et al.  Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[5]  James S. Albus,et al.  A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.

[6]  Fuminori Saito,et al.  Learning architecture for real robotic systems-extension of connectionist Q-learning for continuous robot control domain , 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.

[7]  Minoru Asada,et al.  Enhanced continuous valued Q-learning for real autonomous robots , 2000, Adv. Robotics.

[8]  Richard S. Sutton,et al.  Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.

[9]  Hiroshi Ishiguro,et al.  Robot oriented state space construction , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.

[10]  Shin Ishii,et al.  Reinforcement Learning Based on On-Line EM Algorithm , 1998, NIPS.

[11]  Minoru Asada,et al.  Reasonable performance in less learning time by real robot based on incremental state space segmentation , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.