A multi-agent system integrating reinforcement learning, bidding and genetic algorithms

This paper presents a multi-agent reinforcement learning bidding approach (MARLBS) that integrates reinforcement learning, bidding, and genetic algorithms. The general idea of the system is as follows: a team consists of a number of individual agents, each of which has two modules, a Q module and a CQ module. At each step, the Q module selects the action to perform, while the CQ module determines whether the agent should continue in control or relinquish it. Once an agent relinquishes control, a new agent is selected by a bidding algorithm. We applied GA-based MARLBS to the game of Backgammon. The experimental results show that MARLBS achieves a superior level of game-playing performance, outperforming PubEval, while using zero built-in knowledge.
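The control scheme described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the class and function names, the tabular value estimates, the epsilon-greedy Q module, the sign-based CQ rule, and the wealth-weighted bid are all assumptions introduced for clarity, and the genetic-algorithm component is omitted.

```python
import random


class Agent:
    """One member of the team, with a Q module (action selection) and a
    CQ module (continue/relinquish decision). Plain dicts stand in for
    learned value functions; a real system would train these."""

    def __init__(self, name, actions, epsilon=0.1):
        self.name = name
        self.actions = actions
        self.epsilon = epsilon
        self.q = {}        # Q module: (state, action) -> estimated value
        self.cq = {}       # CQ module: state -> value of staying in control
        self.wealth = 1.0  # resource used to weight bids (illustrative)

    def choose_action(self, state):
        # Q module: epsilon-greedy over the learned action values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def wants_control(self, state):
        # CQ module: continue while the learned value of remaining in
        # control is non-negative (a simple illustrative rule).
        return self.cq.get(state, 0.0) >= 0.0

    def bid(self, state):
        # Bid based on the best action value available, slightly
        # weighted by the agent's wealth to break ties.
        best_q = max(self.q.get((state, a), 0.0) for a in self.actions)
        return max(0.0, best_q) + 1e-6 * self.wealth


def run_episode(team, env_step, initial_state, max_steps=50):
    """Control loop: the agent in control acts via its Q module until its
    CQ module relinquishes; the next controller is the highest bidder."""
    state = initial_state
    controller = max(team, key=lambda ag: ag.bid(state))
    trace = []
    for _ in range(max_steps):
        action = controller.choose_action(state)
        trace.append((controller.name, state, action))
        state, done = env_step(state, action)
        if done:
            break
        if not controller.wants_control(state):
            # Control is relinquished: hold a bidding round.
            controller = max(team, key=lambda ag: ag.bid(state))
    return trace
```

With a toy environment that terminates after a few steps, `run_episode` records which agent was in control at each step, so one can watch control pass between agents as their CQ estimates change.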
