A method for adjusting the number of states in Q-learning with adaptive state-space segmentation

This paper presents a method for partitioning a continuous state space so that an agent can acquire autonomous behavior. The basic idea of the partitioning technique is derived from QLASS (Q-learning with adaptive state segmentation), a simple and effective approach. Under QLASS, the discrete state space is constructed as a Voronoi diagram generated by a finite set of points called generators, which makes the state space intuitively easy to understand. However, QLASS tends to generate too many segments during learning, so an agent using QLASS cannot learn appropriate actions efficiently. To overcome this problem, we propose a method for adjusting the number of states that restricts or boosts further partitioning based on the eligibility and temperature parameter of each segment. Experimental results show that this adjustment method partitions the state space suitably, adapting not only to the characteristics of the environment but also to its dynamic changes.
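
The abstract gives no code, but the underlying data structure is concrete enough to sketch: each discrete state is the Voronoi cell of its nearest generator point, and each segment carries its own Q-values and temperature. The Python sketch below illustrates that structure under stated assumptions; the class name, the annealing factor, and the split heuristic are hypothetical placeholders, not the authors' algorithm, which additionally uses per-segment eligibilities to decide when partitioning is restricted or boosted.

```python
import numpy as np

class VoronoiQLearner:
    """Sketch of a Q-learner over a Voronoi partition of a continuous space."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.generators = []        # points whose Voronoi cells are the states
        self.q = []                 # one row of Q-values per segment
        self.temperature = []       # per-segment softmax temperature

    def add_segment(self, point):
        """Create a new segment (Voronoi cell) around the given point."""
        self.generators.append(np.asarray(point, dtype=float))
        self.q.append(np.zeros(self.n_actions))
        self.temperature.append(1.0)  # fresh segments explore broadly

    def state_of(self, obs):
        """Discretize: return the index of the generator nearest to obs."""
        obs = np.asarray(obs, dtype=float)
        if not self.generators:
            self.add_segment(obs)
        dists = [np.linalg.norm(obs - g) for g in self.generators]
        return int(np.argmin(dists))

    def select_action(self, s):
        """Boltzmann action selection using the segment's own temperature."""
        prefs = self.q[s] / max(self.temperature[s], 1e-6)
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return int(np.random.choice(self.n_actions, p=probs))

    def update(self, s, a, reward, s_next, obs):
        """Tabular Q-learning step, then an assumed split/anneal rule."""
        td_error = reward + self.gamma * self.q[s_next].max() - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        # Assumed annealing: cool the visited segment's temperature.
        self.temperature[s] *= 0.99
        # Assumed split rule: refine a segment that has cooled down but
        # still predicts poorly. The paper's actual adjustment method
        # combines eligibilities and temperatures for this decision.
        if abs(td_error) > 0.5 and self.temperature[s] < 0.5:
            self.add_segment(obs)
```

One appeal of this representation is locality: adding a generator refines exactly one region of the space while leaving the value function over every other cell intact, which is what makes an online rule for restricting or boosting splits practical.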
