Paper information - The world of independent learners is not Markovian

In multi-agent systems, the presence of learning agents can make the environment non-Markovian from an agent's perspective, violating the property that traditional single-agent learning methods rely on. This paper formalizes some known intuitions about concurrently learning agents by providing formal conditions under which the environment is non-Markovian from the perspective of an independent (non-communicative) learner. It introduces new concepts such as divergent learning paths and the observability of the effects of others' actions, and presents a case study to illustrate them. These findings are significant because they help explain both the failures and the successes of existing learning algorithms and suggest directions for future work.
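The intuition in the abstract can be illustrated with a small sketch. It is not the paper's case study: the 2x2 payoff matrix, the learning rate, the exploration rate, the two histories, and all function names are assumptions chosen for illustration. The game has a single state, yet the reward agent 1 can expect for the same action depends on what agent 2 has learned so far, which agent 1 cannot observe.

```python
# Hypothetical shared payoff for a single-state 2x2 game: PAYOFF[a1][a2].
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]
ALPHA = 0.5    # learning rate (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)


def policy(q, epsilon=EPSILON):
    """Epsilon-greedy action probabilities for a stateless Q-learner."""
    n = len(q)
    best = max(range(n), key=lambda a: q[a])  # ties go to the lowest index
    return [epsilon / n + (1.0 - epsilon) * (a == best) for a in range(n)]


def q_update(q, action, reward, alpha=ALPHA):
    """One stateless Q-learning step; returns a new table."""
    new_q = list(q)
    new_q[action] += alpha * (reward - new_q[action])
    return new_q


def replay(history):
    """Agent 2's Q-values after a history of joint actions (a1, a2)."""
    q2 = [0.0, 0.0]
    for a1, a2 in history:
        q2 = q_update(q2, a2, PAYOFF[a1][a2])
    return q2


def expected_reward(a1, q2):
    """Reward agent 1 can expect for a1, given agent 2's hidden Q-values."""
    return sum(p * PAYOFF[a1][a2] for a2, p in enumerate(policy(q2)))


# Two hypothetical histories that end in the same (only) environment state.
history_a = [(0, 0), (0, 0)]
history_b = [(1, 1), (1, 1)]

if __name__ == "__main__":
    for name, history in (("A", history_a), ("B", history_b)):
        q2 = replay(history)
        print(name, q2, expected_reward(0, q2))
```

Under these assumptions, the two histories leave agent 2 with different greedy actions, so agent 1 faces different expected rewards for action 0 in what looks to it like the same state. The current state and action alone therefore do not determine agent 1's reward distribution.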
