The world of independent learners is not Markovian

In multi-agent systems, the presence of learning agents can make the environment non-Markovian from an agent's perspective, thus violating the property that traditional single-agent learning methods rely upon. This paper formalizes some known intuitions about concurrently learning agents by providing formal conditions under which the environment is non-Markovian from an independent (non-communicative) learner's perspective. New concepts are introduced, such as divergent learning paths and the observability of the effects of others' actions. A case study is presented to illustrate the formal concepts. These findings are significant because they help explain the failures and successes of existing learning algorithms and suggest directions for future work.
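The underlying intuition can be illustrated with a minimal sketch, which is not the paper's case study. It assumes a hypothetical 2x2 cooperative matrix game with shared payoff `R` and an assumed schedule `p2_schedule` for how a second learner's policy drifts over time. From the first agent's perspective the state and action stay the same, yet the expected reward changes because it depends on the other agent's learning history, which the independent learner does not observe.

```python
# Hypothetical 2x2 cooperative game: both agents receive R[a1][a2].
# The payoff values are an assumption chosen for illustration only.
R = [[1.0, 0.0],
     [0.0, 1.0]]

def expected_reward(a1, p2):
    """Expected reward agent 1 sees for action a1 when agent 2
    plays action 0 with probability p2 (and action 1 otherwise)."""
    return p2 * R[a1][0] + (1.0 - p2) * R[a1][1]

def perceived_rewards(a1, p2_schedule):
    """Expected reward for the same action of agent 1 at successive
    points along agent 2's (assumed) learning path."""
    return [expected_reward(a1, p2) for p2 in p2_schedule]

# Assumed learning path of agent 2: drifts from mostly action 0
# to mostly action 1 as it updates its own policy.
p2_schedule = [0.9, 0.5, 0.1]

early, middle, late = perceived_rewards(0, p2_schedule)
print(early, middle, late)
```

Agent 1 takes action 0 every time, but the value of that action falls as agent 2's policy shifts. A single-agent learner that treats its own observation and action as sufficient to predict the reward is therefore relying on an assumption that does not hold here.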
