Learning Multiagent Coordination: A Method Based on Adaptive-Play Q-Learning

Current multiagent learning algorithms are largely limited in that they cannot handle the multiplicity of Nash equilibria and therefore do not converge to Pareto-optimal solutions. To alleviate this, we propose a learning mechanism that extends Q-learning to non-cooperative stochastic games. This mechanism converges to Pareto-optimal equilibria in self-play. We present experimental results showing the convergence of this learning mechanism. We then extend our approach to the case of non-stationary agents, another important aspect of multiagent systems. Finally, we address the question of non-stationarity in multiagent environments in its generality and outline research avenues that could improve our preliminary results on adaptation.
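
For intuition about the self-play setting the abstract describes, here is a minimal, hypothetical sketch (not the paper's adaptive-play algorithm) of two independent Q-learners playing a 2x2 coordination game with two pure Nash equilibria, one of which is Pareto-dominant. The payoff matrix and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import random

# A 2x2 coordination game with two pure Nash equilibria, (0,0) and (1,1);
# (1,1) is Pareto-dominant. Payoffs are illustrative assumptions.
PAYOFF = {
    (0, 0): (1.0, 1.0),   # Nash equilibrium, Pareto-dominated
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (2.0, 2.0),   # Pareto-optimal Nash equilibrium
}

ACTIONS = (0, 1)
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 5000  # assumed hyperparameters

def epsilon_greedy(q, eps):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[a])

def self_play():
    # One Q-table per agent over its own actions
    # (a single-state repeated game, so no next-state term).
    q1 = {a: 0.0 for a in ACTIONS}
    q2 = {a: 0.0 for a in ACTIONS}
    for _ in range(EPISODES):
        a1 = epsilon_greedy(q1, EPSILON)
        a2 = epsilon_greedy(q2, EPSILON)
        r1, r2 = PAYOFF[(a1, a2)]
        # Standard tabular Q-learning update toward the observed reward.
        q1[a1] += ALPHA * (r1 - q1[a1])
        q2[a2] += ALPHA * (r2 - q2[a2])
    return q1, q2

if __name__ == "__main__":
    q1, q2 = self_play()
    print("Agent 1 greedy action:", max(ACTIONS, key=lambda a: q1[a]))
    print("Agent 2 greedy action:", max(ACTIONS, key=lambda a: q2[a]))
```

In this toy game both greedy policies typically settle on the Pareto-optimal equilibrium (1,1); the difficulty the abstract points to arises in general-sum stochastic games, where naive independent learners can instead lock onto a Pareto-dominated equilibrium.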
