Unified Criterion of State Generalization for Reactive Autonomous Agents

The problem of autonomous state generalization is a key issue in behavior learning for reactive agents, and many approaches have been proposed in recent years. However, these existing methods differ widely in their criteria for state generalization, that is, in how they define the similarity or distance between different sensor inputs, and it remains unclear how such differences in criteria affect the overall learning process. In this paper, we first classify and examine the conventional heuristic criteria of state generalization, and then propose a general framework that unifies them. This unified criterion is based on minimizing a weighted sum of entropies over the multiple behavior outcomes of an agent. An experimental study in the latter part of the paper suggests that this criterion enables a reactive agent to construct or reconstruct its state space more efficiently and flexibly.
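The entropy-minimization idea above can be sketched in code. The following is a minimal illustration, not the paper's actual algorithm: the function names, the representation of a candidate state partition as per-behavior outcome-count vectors, and the uniform weights are all assumptions made for the example. It shows only the core scoring step, namely that a partition whose states predict behavior outcomes more reliably receives a lower weighted entropy sum.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of an outcome-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def generalization_score(states, weights):
    """Weighted sum of outcome entropies over all generalized states.

    `states` maps each candidate state to a list of outcome-count
    vectors, one vector per behavior; `weights` assigns one weight
    per behavior. Lower scores mean the partition's states predict
    behavior outcomes more reliably.
    """
    score = 0.0
    for outcome_counts_per_behavior in states.values():
        for w, counts in zip(weights, outcome_counts_per_behavior):
            score += w * entropy(counts)
    return score

# Two candidate partitions of the same sensor inputs: the finer one
# separates inputs whose outcomes differ, so its entropy sum is lower.
coarse = {"s0": [[5, 5]]}                # mixed outcomes -> high entropy
fine = {"s0": [[5, 0]], "s1": [[0, 5]]}  # pure outcomes -> zero entropy
assert generalization_score(fine, [1.0]) < generalization_score(coarse, [1.0])
```

Under this sketch, state-space construction would amount to searching over candidate partitions and preferring those with lower scores; the weights let different behaviors contribute unequally to the criterion.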
