Self-organizing cognitive agents and reinforcement learning in multi-agent environment

This paper presents a self-organizing cognitive architecture, known as TD-FALCON, that learns to function through its interaction with the environment. TD-FALCON learns the value functions of the state-action space estimated through a temporal difference (TD) method. The learned value functions are then used to determine the optimal actions based on an action selection policy. We present a specific instance of TD-FALCON based on an e-greedy action policy and a Q-learning value estimation formula. Experiments based on a minefield navigation task and a minefield pursuit task show that TD-FALCON systems are able to adapt and function well in a multi-agent environment without an explicit mechanism for collaboration.

[1]  Stephen Grossberg,et al.  A massively parallel architecture for a self-organizing neural pattern recognition machine , 1988, Comput. Vis. Graph. Image Process..

[2]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[3]  Ron Sun,et al.  From implicit skills to explicit knowledge: a bottom-up model of skill learning , 2001, Cogn. Sci..

[4]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[5]  Barbara Dunin-Keplicz,et al.  Proceedings of the 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology , 2005 .

[6]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[7]  Ah-Hwee Tan,et al.  Adaptive resonance associative map , 1995, Neural Networks.

[8]  Ah-Hwee Tan,et al.  FALCON: a fusion architecture for learning, cognition, and navigation , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[9]  Stephen Grossberg,et al.  Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps , 1992, IEEE Trans. Neural Networks.

[10]  Andrés Pérez Uribe,et al.  Structure-Adaptable Digital Neural Networks , 1999 .

[11]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[12]  M. Benda,et al.  On Optimal Cooperation of Knowledge Sources , 1985 .

[13]  Andres Perez-Uribe,et al.  Structure-Adaptable Digital Neural Networks , 1999 .

[14]  D. Gordon A Cognitive Model of Learning to Navigate , 1997 .

[15]  Stephen Grossberg,et al.  Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system , 1991, Neural Networks.