The results of imposing limitations on the number of states and of promoting the splitting of states in Q-learning, a common reinforcement learning method, are presented; here the learning agent autonomously segments the states of its environment. In situations where the designer of an agent cannot explicitly provide the boundaries of states in the environment in which the agent acts, the agent must learn while autonomously determining the internal discrete states needed to take appropriate actions. A simple method of segmenting states based on a reinforcement signal, Q-learning with adaptive state segmentation (QLASS), has been proposed for this purpose. However, the original method suffers from the problem that the number of states grows excessively large as learning proceeds. A method is therefore proposed that defines temperature and eligibility attributes for each of the agent's internal discrete states, and that limits the number of internal discrete states, adds new states, and promotes random actions depending on the values of these attributes. The results of applying the proposed method to a number of tasks, including tasks that incorporate a dynamic environment, are compared to QLASS using only the reinforcement signal, and a similar level of learning is found to be achieved with fewer states. Furthermore, tasks can be completed in a small number of steps even when only a small number of trials are used for learning. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 90(9): 75–86, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20383
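To make the mechanism concrete, the following is a minimal, hypothetical Python sketch of Q-learning over adaptively segmented states in the spirit of QLASS, extended with the per-state temperature and eligibility attributes and the state-count cap described above. All class names, thresholds, and update rules here (Boltzmann action selection driven by the per-state temperature, splitting on accumulated TD error, halving a 1-D sensor interval) are illustrative assumptions, not the paper's actual formulation.

```python
import math
import random

class AdaptiveState:
    """One internal discrete state covering an interval of a 1-D sensor axis."""
    def __init__(self, low, high, n_actions):
        self.low, self.high = low, high        # sensor interval this state covers
        self.q = [0.0] * n_actions             # Q-values for each action
        self.temperature = 1.0                 # per-state temperature (hypothetical)
        self.eligibility = 0.0                 # per-state eligibility (hypothetical)
        self.td_error_sum = 0.0                # accumulated |TD error|, drives splitting

class AdaptiveQLearner:
    def __init__(self, n_actions, max_states=20, alpha=0.1, gamma=0.9,
                 split_threshold=5.0, temp_decay=0.99, elig_decay=0.9, reheat=0.01):
        self.states = [AdaptiveState(0.0, 1.0, n_actions)]
        self.n_actions = n_actions
        self.max_states = max_states           # hard cap on the number of states
        self.alpha, self.gamma = alpha, gamma
        self.split_threshold = split_threshold
        self.temp_decay = temp_decay
        self.elig_decay = elig_decay
        self.reheat = reheat

    def state_for(self, x):
        """Map a raw sensor value to the internal state whose interval contains it."""
        for s in self.states:
            if s.low <= x < s.high:
                return s
        return self.states[-1]                 # x == 1.0 falls into the last interval

    def choose_action(self, s):
        """Boltzmann (softmax) action selection; hotter states act more randomly."""
        t = max(s.temperature, 1e-3)
        weights = [math.exp(q / t) for q in s.q]
        r = random.random() * sum(weights)
        for a, w in enumerate(weights):
            r -= w
            if r <= 0:
                return a
        return self.n_actions - 1

    def update(self, x, a, reward, x_next):
        """One Q-learning backup plus attribute bookkeeping and state splitting."""
        s, s_next = self.state_for(x), self.state_for(x_next)
        td_error = reward + self.gamma * max(s_next.q) - s.q[a]
        s.q[a] += self.alpha * td_error

        # Attribute updates (illustrative): every state's eligibility decays,
        # and rarely visited states slowly reheat so the agent re-explores
        # them, which helps when the environment changes over time.
        for st in self.states:
            st.eligibility *= self.elig_decay
            st.temperature = min(1.0, st.temperature
                                 + self.reheat * (1.0 - st.eligibility))
        s.eligibility = 1.0                    # visiting a state marks it as fresh
        s.temperature *= self.temp_decay       # ... and cools it toward greedy action

        # Split a state when its accumulated TD error suggests the interval
        # mixes situations needing different actions, unless the cap is hit.
        s.td_error_sum += abs(td_error)
        if s.td_error_sum > self.split_threshold and len(self.states) < self.max_states:
            self.split(s)

    def split(self, s):
        """Halve a state's interval; both halves inherit the parent's Q-values."""
        mid = (s.low + s.high) / 2.0
        child = AdaptiveState(mid, s.high, self.n_actions)
        child.q = list(s.q)
        child.temperature = 1.0                # a fresh state explores again
        s.high = mid
        s.td_error_sum = 0.0
        self.states.append(child)
        self.states.sort(key=lambda st: st.low)
```

A freshly split state starts with a high temperature, so the agent acts more randomly inside it until its Q-values settle, which is one plausible way to realize the "promote random actions" behavior described above while the state-count cap keeps the segmentation from growing without bound.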
[1] Hiroshi Ishiguro et al., "Robot oriented state space construction," Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96), 1996.
[2] Christopher J. C. H. Watkins and Peter Dayan, "Technical Note: Q-Learning," Machine Learning, 1992.
[3] Minoru Asada et al., "Action-based sensor space categorization for robot learning," Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96), 1996.
[4] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[5] Takashi Omori et al., "Adaptive internal state space construction method for reinforcement learning of a real-world agent," Neural Networks, 1999.
[6] Shinzo Kitamura et al., "Q-Learning with adaptive state segmentation (QLASS)," Proc. 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA '97), 1997.
[7] Stuart I. Reynolds et al., "Adaptive Resolution Model-Free Reinforcement Learning: Decision Boundary Partitioning," Proc. International Conference on Machine Learning (ICML), 2000.
[8] Andrew W. Moore et al., "Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems," Proc. International Joint Conference on Artificial Intelligence (IJCAI), 1999.
[9] Jon Louis Bentley et al., "Multidimensional binary search trees used for associative searching," Communications of the ACM, 1975.