Paper information - A selection-mutation model for q-learning in multi-agent systems

Although well understood in the single-agent framework, the use of traditional reinforcement learning (RL) algorithms in multi-agent systems (MAS) is not always justified. The feedback an agent experiences in a MAS is usually influenced by the other agents present in the system. Multi-agent environments are therefore non-stationary, and the convergence and optimality guarantees of RL algorithms are lost. To better understand the dynamics of traditional RL algorithms, we analyze the learning process in terms of evolutionary dynamics. More specifically, we show how the Replicator Dynamics (RD) can be used as a model for Q-learning in games. The dynamical equations of Q-learning are derived and illustrated by some well-chosen experiments. Both reveal an interesting connection between the exploitation-exploration scheme from RL and the selection-mutation mechanisms from evolutionary game theory.
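The abstract does not reproduce the dynamical equations, so the following is only a minimal sketch of the form in which the selection-mutation dynamics of Boltzmann Q-learning are commonly stated: a replicator (selection) term scaled by alpha / tau plus an entropy-like (mutation) term scaled by alpha. The function name `selection_mutation_step`, the parameters `alpha` (learning rate), `tau` (temperature) and `dt` (Euler step size), and the `payoff` layout are all assumptions made for illustration, not taken from this page.

```python
import math


def selection_mutation_step(x, y, payoff, alpha, tau, dt):
    """One Euler step for the row player's mixed strategy x (hypothetical helper).

    x      -- row player's action probabilities (all strictly positive)
    y      -- opponent's action probabilities
    payoff -- payoff[i][j] is the row player's reward for playing i against j
    alpha  -- learning rate (assumed name)
    tau    -- Boltzmann temperature (assumed name)
    dt     -- integration step size
    """
    n = len(x)
    # Expected payoff of each pure action against the opponent's mix y.
    fitness = [sum(payoff[i][j] * y[j] for j in range(len(y))) for i in range(n)]
    # Average payoff of the current mixed strategy.
    average = sum(x[i] * fitness[i] for i in range(n))

    new_x = []
    for i in range(n):
        # Selection (exploitation): replicator term scaled by alpha / tau.
        selection = (alpha / tau) * x[i] * (fitness[i] - average)
        # Mutation (exploration): entropy-like term pulling x toward uniform.
        mutation = alpha * x[i] * sum(
            x[j] * math.log(x[j] / x[i]) for j in range(n)
        )
        new_x.append(x[i] + dt * (selection + mutation))

    # Both terms sum to zero over i, so this only guards against rounding drift.
    total = sum(new_x)
    return [v / total for v in new_x]
```

Under these assumptions, iterating the step against a fixed opponent strategy drives `x` toward the Boltzmann distribution over the expected payoffs at temperature `tau`: a low `tau` lets the selection term dominate, while a high `tau` lets the mutation term keep the strategy close to uniform.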
