Coordination of independent learners in cooperative Markov games.

In fully cooperative multi-agent systems, independent agents that learn by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. Studying these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties these approaches encounter, we focus on two main skills: optimistic agents, which manage coordination in deterministic environments, and the detection of a game's stochasticity. Indeed, the key difficulty in stochastic environments is distinguishing between the various causes of noise. We therefore introduce the SOoN algorithm, standing for "Swing between Optimistic or Neutral", in which independent learners automatically adapt to the stochasticity of the environment. Empirical results on various cooperative Markov games show, in particular, that SOoN overcomes the main factors of non-coordination and is robust to the exploration of other agents.
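The contrast between optimistic and neutral independent learners can be illustrated with a minimal Python sketch. It assumes the climbing game of Claus and Boutilier as a test bed, purely random exploration while learning, and a hypothetical stochastic variant in which the joint action (1, 1) pays 14 or 0 with equal probability. The optimistic rule is a max-based update in the spirit of Lauer and Riedmiller's distributed Q-learning, and the neutral rule is a plain sample average. Neither is SOoN itself, whose swing rule is not reproduced here; the helper names (`learn`, `optimistic_update`, `neutral_update`, `stochastic_reward`) are illustrative only.

```python
import random

# Climbing game: rows are agent 1's actions, columns agent 2's actions.
# Both agents receive the same (common) reward. Optimum: (0, 0) -> 11.
CLIMBING = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def deterministic_reward(a1, a2, rng):
    return CLIMBING[a1][a2]


def stochastic_reward(a1, a2, rng):
    # Hypothetical stochastic variant: (1, 1) pays 14 or 0 with equal
    # probability, so its expected value (7) is unchanged.
    if (a1, a2) == (1, 1):
        return 14 if rng.random() < 0.5 else 0
    return CLIMBING[a1][a2]


def optimistic_update(q, counts, a, r):
    # Optimistic learner: keep the best reward ever observed for action a,
    # attributing low rewards to the partner's exploration.
    counts[a] += 1
    q[a] = float(r) if counts[a] == 1 else max(q[a], r)


def neutral_update(q, counts, a, r):
    # Neutral learner: sample average (step size 1/n), i.e. the expected
    # reward of action a under the partner's exploration.
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]


def learn(reward, update, steps=20000, seed=0):
    """Two independent learners on a single-state repeated game.

    Each agent keeps values over its OWN actions only and never sees the
    partner's action. Both explore uniformly at random while learning;
    the greedy joint action is returned afterwards.
    """
    rng = random.Random(seed)
    n = len(CLIMBING)
    q = [[0.0] * n for _ in range(2)]
    counts = [[0] * n for _ in range(2)]
    for _ in range(steps):
        a1, a2 = rng.randrange(n), rng.randrange(n)
        r = reward(a1, a2, rng)
        update(q[0], counts[0], a1, r)
        update(q[1], counts[1], a2, r)
    return tuple(max(range(n), key=lambda a: qi[a]) for qi in q)


if __name__ == "__main__":
    print(learn(deterministic_reward, optimistic_update))  # (0, 0): optimal
    print(learn(deterministic_reward, neutral_update))     # (2, 2): miscoordination
    print(learn(stochastic_reward, optimistic_update))     # (1, 1): fooled by noise
```

In the deterministic game, the optimistic learners coordinate on the optimal joint action (0, 0), while the neutral learners settle on (2, 2) because the -30 penalties drag down the averages of actions 0 and 1. In the stochastic variant, a single payoff of 14 is enough for the optimistic learners to prefer (1, 1), whose expected reward is only 7: taking the maximum cannot tell environment noise apart from the partner's exploration, which is why detecting the stochasticity of the game matters.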
