Self-Organizing Neural Architectures and Cooperative Learning in a Multiagent Environment

The temporal-difference fusion architecture for learning, cognition, and navigation (TD-FALCON) is a generalization of adaptive resonance theory (a class of self-organizing neural networks) that incorporates temporal-difference (TD) methods for real-time reinforcement learning. In this paper, we investigate how a team of TD-FALCON networks may cooperate to learn and function in a dynamic multiagent environment, based on a minefield navigation task and a predator/prey pursuit task. Experiments on the navigation task demonstrate that TD-FALCON agent teams are able to adapt and function well in a multiagent environment without an explicit mechanism of collaboration. In comparison, traditional Q-learning agents using gradient-descent-based feedforward neural networks, trained with the standard backpropagation and the resilient-propagation (RPROP) algorithms, perform significantly worse. For the predator/prey pursuit task, we experiment with various cooperative strategies and find that a combination of a high-level compressed state representation and a hybrid reward function produces the best results. Using the same cooperative strategy, the TD-FALCON team also outperforms the RPROP-based reinforcement learners in terms of both task completion rate and learning efficiency.
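
The abstract gives no algorithmic detail, so as a rough illustration only, the following Python sketch shows the tabular Q-learning update with epsilon-greedy action selection that underlies both TD-FALCON's TD component and the baseline Q-learning agents. All names and hyperparameter values here (choose_action, td_update, ALPHA, GAMMA, EPSILON) are illustrative assumptions, not the paper's implementation.

    import random
    from collections import defaultdict

    # Illustrative hyperparameters; the paper's actual settings are not
    # stated in the abstract.
    ALPHA = 0.5    # learning rate
    GAMMA = 0.9    # reward discount factor
    EPSILON = 0.1  # exploration rate for epsilon-greedy selection

    # Q maps (state, action) pairs to value estimates; unseen pairs default to 0.
    Q = defaultdict(float)

    def choose_action(state, actions):
        """Pick a random action with probability EPSILON, else the greedy one."""
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def td_update(state, action, reward, next_state, actions):
        """One Q-learning step:
        Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(Q[(next_state, a)] for a in actions)
        td_error = reward + GAMMA * best_next - Q[(state, action)]
        Q[(state, action)] += ALPHA * td_error

Per the abstract, TD-FALCON stores its value estimates in the category nodes of a self-organizing (ART-based) network rather than a table, while the baseline agents approximate Q with a feedforward network trained by backpropagation or RPROP; the TD error itself is computed in the same bootstrapped fashion as above.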
