Simultaneous learning of situation classification based on rewards and behavior selection based on the situation

This paper describes a system with which a cognitive agent simultaneously learns how to abstract its state space and a policy for behavior selection. We call the system the situation transition network system (STNS). On the basis of rewards from the environment, the system extracts situations from the continuous state space and maintains them dynamically; in this way, it learns an appropriate abstraction even in a dynamic environment. At the same time, the system records the results of transitions between situations and constructs a network of situations, which is used for partial planning. At each point in the learning process, the system selects a behavior according to the current partial plan. Because planning is performed over the network of abstracted situations, an agent with STNS does not have to deliberate over details, and because the plans are partial, the agent can plan even in the early stages of learning. Owing to this learning carried out simultaneously with task execution, the agent can adapt to the task at hand. Results of computer simulations are given.
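To make the interplay of the two learning processes concrete, the following is a minimal Python sketch of an STNS-style learner. The abstract does not give implementation details, so everything here is an assumption made for illustration: situations are represented by prototype vectors in the continuous state space, a new situation is created when no prototype lies within a fixed radius, rewards are tracked by a running average per situation, and partial planning is a greedy few-step lookahead on the transition network. The class name STNSSketch and all of its parameters are hypothetical choices, not the authors' algorithm.

    import numpy as np
    from collections import defaultdict

    class STNSSketch:
        """A toy STNS-style learner: reward-driven situation abstraction plus
        a transition network used for short-horizon (partial) planning."""

        def __init__(self, radius=0.5, actions=("left", "right", "forward")):
            self.radius = radius        # distance threshold for spawning a new situation
            self.actions = actions
            self.prototypes = []        # one prototype vector per abstracted situation
            self.reward = []            # running reward estimate for each situation
            self.counts = []
            # transitions[(situation, action)] -> {next_situation: observation count}
            self.transitions = defaultdict(lambda: defaultdict(int))

        def classify(self, state):
            """Map a continuous state to a situation index, creating one if none is close."""
            state = np.asarray(state, dtype=float)
            if self.prototypes:
                dists = [np.linalg.norm(state - p) for p in self.prototypes]
                i = int(np.argmin(dists))
                if dists[i] < self.radius:
                    # Incrementally move the prototype toward the new observation.
                    self.counts[i] += 1
                    self.prototypes[i] += (state - self.prototypes[i]) / self.counts[i]
                    return i
            self.prototypes.append(state.copy())
            self.reward.append(0.0)
            self.counts.append(1)
            return len(self.prototypes) - 1

        def record(self, s, action, s_next, r):
            """Store one observed situation transition and update the reward estimate."""
            self.transitions[(s, action)][s_next] += 1
            self.reward[s_next] += 0.1 * (r - self.reward[s_next])

        def plan_action(self, s, horizon=3):
            """Partial planning: look a few steps ahead on the situation network
            and return the first action of the most promising short path."""
            values = {a: self._lookahead(s, a, horizon) for a in self.actions}
            return max(values, key=values.get)

        def _lookahead(self, s, a, horizon):
            succ = self.transitions[(s, a)]
            if not succ or horizon == 0:
                return self.reward[s]
            total = sum(succ.values())
            return sum(
                (n / total) * max(self.reward[t],
                                  max(self._lookahead(t, b, horizon - 1)
                                      for b in self.actions))
                for t, n in succ.items())

    # Hypothetical usage with random 2-D states and a stand-in reward signal:
    stns = STNSSketch()
    state = np.zeros(2)
    s = stns.classify(state)
    for _ in range(200):
        a = stns.plan_action(s)
        next_state = state + np.random.uniform(-0.3, 0.3, size=2)  # dummy environment (ignores the action)
        r = 1.0 if np.linalg.norm(next_state) > 1.0 else 0.0       # dummy reward
        s_next = stns.classify(next_state)
        stns.record(s, a, s_next, r)
        state, s = next_state, s_next

The point of the sketch is the interleaving: classify (abstraction), record (network construction), and plan_action (partial planning) are all invoked inside the same execution loop, so the abstraction and the policy are refined together rather than in separate phases, which is what lets the agent act from partial plans early in learning.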
