A multi-agent system integrating reinforcement learning, bidding and genetic algorithms

This paper presents a multi-agent reinforcement learning bidding approach (MARLBS) that integrates reinforcement learning, bidding, and genetic algorithms. The general idea of the system is as follows: a team consists of a number of individual agents, each of which has two modules, a Q module and a CQ module. At each step, the Q module selects the action to perform, while the CQ module determines whether the agent should continue in control or relinquish it. Once an agent relinquishes control, a new agent is selected by a bidding algorithm. We applied GA-based MARLBS to the game of Backgammon. The experimental results show that MARLBS achieves a superior level of game-playing performance, outperforming PubEval, while using zero built-in knowledge.
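The control scheme described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the class and function names, the tabular value estimates, the epsilon-greedy Q module, the sign-based CQ rule, and the wealth-weighted bid are all assumptions introduced for clarity, and the genetic-algorithm component is omitted.

```python
import random


class Agent:
    """One member of the team, with a Q module (action selection) and a
    CQ module (continue/relinquish decision). Plain dicts stand in for
    learned value functions; a real system would train these."""

    def __init__(self, name, actions, epsilon=0.1):
        self.name = name
        self.actions = actions
        self.epsilon = epsilon
        self.q = {}        # Q module: (state, action) -> estimated value
        self.cq = {}       # CQ module: state -> value of staying in control
        self.wealth = 1.0  # resource used to weight bids (illustrative)

    def choose_action(self, state):
        # Q module: epsilon-greedy over the learned action values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def wants_control(self, state):
        # CQ module: continue while the learned value of remaining in
        # control is non-negative (a simple illustrative rule).
        return self.cq.get(state, 0.0) >= 0.0

    def bid(self, state):
        # Bid based on the best action value available, slightly
        # weighted by the agent's wealth to break ties.
        best_q = max(self.q.get((state, a), 0.0) for a in self.actions)
        return max(0.0, best_q) + 1e-6 * self.wealth


def run_episode(team, env_step, initial_state, max_steps=50):
    """Control loop: the agent in control acts via its Q module until its
    CQ module relinquishes; the next controller is the highest bidder."""
    state = initial_state
    controller = max(team, key=lambda ag: ag.bid(state))
    trace = []
    for _ in range(max_steps):
        action = controller.choose_action(state)
        trace.append((controller.name, state, action))
        state, done = env_step(state, action)
        if done:
            break
        if not controller.wants_control(state):
            # Control is relinquished: hold a bidding round.
            controller = max(team, key=lambda ag: ag.bid(state))
    return trace
```

With a toy environment that terminates after a few steps, `run_episode` records which agent was in control at each step, so one can watch control pass between agents as their CQ estimates change.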
