Self-Organizing Neural Architectures and Cooperative Learning in a Multiagent Environment

The temporal-difference fusion architecture for learning, cognition, and navigation (TD-FALCON) is a generalization of adaptive resonance theory (a class of self-organizing neural networks) that incorporates temporal-difference (TD) methods for real-time reinforcement learning. In this paper, we investigate how a team of TD-FALCON networks may cooperate to learn and function in a dynamic multiagent environment, based on a minefield navigation task and a predator/prey pursuit task. Experiments on the navigation task demonstrate that TD-FALCON agent teams are able to adapt and function well in a multiagent environment without an explicit mechanism of collaboration. In comparison, traditional Q-learning agents using gradient-descent-based feedforward neural networks, trained with the standard backpropagation and the resilient-propagation (RPROP) algorithms, perform significantly worse. For the predator/prey pursuit task, we experiment with various cooperative strategies and find that a combination of a high-level compressed state representation and a hybrid reward function produces the best results. Using the same cooperative strategy, the TD-FALCON team also outperforms the RPROP-based reinforcement learners in terms of both task completion rate and learning efficiency.
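
The abstract gives no algorithmic detail, so as a rough illustration only, the following Python sketch shows the tabular Q-learning update with epsilon-greedy action selection that underlies both TD-FALCON's TD component and the baseline Q-learning agents. All names and hyperparameter values here (choose_action, td_update, ALPHA, GAMMA, EPSILON) are illustrative assumptions, not the paper's implementation.

    import random
    from collections import defaultdict

    # Illustrative hyperparameters; the paper's actual settings are not
    # stated in the abstract.
    ALPHA = 0.5    # learning rate
    GAMMA = 0.9    # reward discount factor
    EPSILON = 0.1  # exploration rate for epsilon-greedy selection

    # Q maps (state, action) pairs to value estimates; unseen pairs default to 0.
    Q = defaultdict(float)

    def choose_action(state, actions):
        """Pick a random action with probability EPSILON, else the greedy one."""
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def td_update(state, action, reward, next_state, actions):
        """One Q-learning step:
        Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(Q[(next_state, a)] for a in actions)
        td_error = reward + GAMMA * best_next - Q[(state, action)]
        Q[(state, action)] += ALPHA * td_error

Per the abstract, TD-FALCON stores its value estimates in the category nodes of a self-organizing (ART-based) network rather than a table, while the baseline agents approximate Q with a feedforward network trained by backpropagation or RPROP; the TD error itself is computed in the same bootstrapped fashion as above.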
