A survey and critique of multiagent deep reinforcement learning

Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) settings. Initial results report successes in complex multiagent domains, although several challenges remain to be addressed. The primary goal of this article is to provide a clear overview of the current multiagent deep reinforcement learning (MDRL) literature. We complement this overview with a broader analysis: (i) we revisit key components originally presented in MAL and RL and highlight how they have been adapted to MDRL settings; (ii) we provide general guidelines for new practitioners in the area, describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research; and (iii) we take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant existing literature (e.g., in RL and MAL) in a joint effort to promote fruitful work in the multiagent community.

[1]  Kagan Tumer,et al.  General principles of learning-based multi-agent systems , 1999, AGENTS '99.

[2]  Stephen J. Guy,et al.  Stochastic Tree Search with Useful Cycles for patrolling problems , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Hitoshi Iba Emergent Cooperation for Multiple Agents Using Genetic Programming , 1996, PPSN.

[4]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.

[5]  Karl Tuyls,et al.  Evolutionary Dynamics of Multi-Agent Learning: A Survey , 2015, J. Artif. Intell. Res..

[6]  Julian Togelius,et al.  Pommerman: A Multi-Agent Playground , 2018, AIIDE Workshops.

[7]  S. Levine,et al.  Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games ? , 2018 .

[8]  Chao Gao,et al.  On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman , 2019, AIIDE.

[9]  Peter Stone,et al.  Multiagent learning is not the answer. It is the question , 2007, Artif. Intell..

[10]  H. Francis Song,et al.  Machine Theory of Mind , 2018, ICML.

[11]  Tom Schaul,et al.  Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[12]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[13]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[14]  Matteo Hessel,et al.  Deep Reinforcement Learning and the Deadly Triad , 2018, ArXiv.

[15]  Keith B. Hall,et al.  Correlated Q-Learning , 2003, ICML.

[16]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[17]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .

[18]  Razvan Pascanu,et al.  Policy Distillation , 2015, ICLR.

[19]  Yang Yu,et al.  Towards Sample Efficient Reinforcement Learning , 2018, IJCAI.

[20]  Nicolas Le Roux,et al.  A Geometric Perspective on Optimal Representations for Reinforcement Learning , 2019, NeurIPS.

[21]  Kee-Eung Kim,et al.  Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.

[22]  Dorian Kodelja,et al.  Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[23]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[24]  Peter McCracken,et al.  Safe Strategies for Agent Modelling in Games , 2004, AAAI Technical Report.

[25]  Nikos A. Vlassis,et al.  Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[26]  Patrick M. Pilarski,et al.  Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.

[27]  Sridhar Mahadevan,et al.  Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..

[28]  Daniel Kudenko,et al.  Learning in multi-agent systems , 2001, The Knowledge Engineering Review.

[29]  OpitzDavid,et al.  Popular ensemble methods , 1999 .

[30]  Shimon Whiteson,et al.  The StarCraft Multi-Agent Challenge , 2019, AAMAS.

[31]  Herke van Hoof,et al.  Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.

[32]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[33]  Shie Mannor,et al.  Reinforcement learning in the presence of rare events , 2008, ICML '08.

[34]  Colin Camerer,et al.  Behavioural Game Theory : Thinking , Learning and Teaching , 2004 .

[35]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[36]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[37]  Rahul Savani,et al.  Lenient Multi-Agent Deep Reinforcement Learning , 2017, AAMAS.

[38]  Sarit Kraus,et al.  Collaborative Plans for Complex Group Action , 1996, Artif. Intell..

[39]  Peter Vrancx,et al.  Game Theory and Multi-agent Reinforcement Learning , 2012, Reinforcement Learning.

[40]  J. Neumann,et al.  Theory of games and economic behavior , 1945, 100 Years of Math Milestones.

[41]  Michael H. Bowling,et al.  Actor-Critic Policy Optimization in Partially Observable Multiagent Environments , 2018, NeurIPS.

[42]  Jeff S. Shamma,et al.  Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria , 2005, IEEE Transactions on Automatic Control.

[43]  Joel Z. Leibo,et al.  Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research , 2019, ArXiv.

[44]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[45]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[46]  Ming Zhou,et al.  Mean Field Multi-Agent Reinforcement Learning , 2018, ICML.

[47]  Katja Hofmann,et al.  The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors , 2019, ArXiv.

[48]  Olivier Simonin,et al.  Cooperative Multi-agent Policy Gradient , 2018, ECML/PKDD.

[49]  L. Buşoniu Evolutionary function approximation for reinforcement learning , 2006 .

[50]  R. Rosenthal The file drawer problem and tolerance for null results , 1979 .

[51]  Michael L. Littman,et al.  Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration , 2010, ICML.

[52]  Wei Zhao,et al.  Deep Reinforcement Learning for Sponsored Search Real-time Bidding , 2018, KDD.

[53]  Zachary Chase Lipton,et al.  Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear , 2016, 1611.01211.

[54]  Neil Burch,et al.  Heads-up limit hold’em poker is solved , 2015, Science.

[55]  Charles W. Anderson,et al.  Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).

[56]  Andrew G. Barto,et al.  Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.

[57]  Andrew G. Barto,et al.  Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.

[58]  D. Sculley,et al.  Winner's Curse? On Pace, Progress, and Empirical Rigor , 2018, ICLR.

[59]  Hao Liu,et al.  Action-dependent Control Variates for Policy Optimization via Stein Identity , 2018, ICLR.

[60]  John C. Harsanyi,et al.  Games with Incomplete Information Played by "Bayesian" Players, I-III: Part I. The Basic Model& , 2004, Manag. Sci..

[61]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[62]  Kevin Waugh,et al.  DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.

[63]  Craig Boutilier,et al.  Non-delusional Q-learning and value-iteration , 2018, NeurIPS.

[64]  Yi Wu,et al.  Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient , 2019, AAAI.

[65]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[66]  Yan Zheng,et al.  Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments , 2018, PRICAI.

[67]  Csaba Szepesvári,et al.  Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[68]  Carla P. Gomes On the intersection of AI and OR , 2001, Knowl. Eng. Rev..

[69]  Katja Hofmann,et al.  The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.

[70]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[71]  Maria L. Gini,et al.  Safely Using Predictions in General-Sum Normal Form Games , 2017, AAMAS.

[72]  Sergey Levine,et al.  Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.

[73]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[74]  Paul Erdös,et al.  On a Combinatorial Game , 1973, J. Comb. Theory A.

[75]  Maruan Al-Shedivat,et al.  Learning Policy Representations in Multiagent Systems , 2018, ICML.

[76]  Joel Z. Leibo,et al.  Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.

[77]  Jun Morimoto,et al.  Robust Reinforcement Learning , 2005, Neural Computation.

[78]  Marc G. Bellemare,et al.  Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.

[79]  Simon M. Lucas,et al.  A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[80]  Manuela M. Veloso,et al.  Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[81]  Matthew E. Taylor,et al.  Identifying and Tracking Switching, Non-Stationary Opponents: A Bayesian Approach , 2016, AAAI Workshop: Multiagent Interaction without Prior Coordination.

[82]  Jianfeng Gao,et al.  Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear , 2016, ArXiv.

[83]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[84]  David Silver,et al.  Deep Reinforcement Learning from Self-Play in Imperfect-Information Games , 2016, ArXiv.

[85]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[86]  Mykel J. Kochenderfer,et al.  Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.

[87]  R. Bellman A Markovian Decision Process , 1957 .

[88]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[89]  Jürgen Schmidhuber,et al.  LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[90]  Yoav Shoham,et al.  If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..

[91]  Shauharda Khadka,et al.  Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination , 2019, ICML.

[92]  Maria L. Gini,et al.  Monte Carlo Tree Search with Branch and Bound for Multi-Robot Task Allocation , 2016 .

[93]  Marco Wiering,et al.  Reinforcement Learning , 2014, Adaptation, Learning, and Optimization.

[94]  John J. Grefenstette,et al.  Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..

[95]  Frans A. Oliehoek,et al.  A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.

[96]  Shimon Whiteson,et al.  The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning , 2019, AAMAS.

[97]  Marc Peter Deisenroth,et al.  Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.

[98]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[99]  Y. Mansour,et al.  4 Learning , Regret minimization , and Equilibria , 2006 .

[100]  Julian Togelius,et al.  Playing Atari with Six Neurons , 2018, AAMAS.

[101]  Jonathan P. How,et al.  Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.

[102]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[103]  John N. Tsitsiklis,et al.  Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.

[104]  Larry D. Pyeatt,et al.  Decision Tree Function Approximation in Reinforcement Learning , 1999 .

[105]  Nikos A. Vlassis,et al.  Sparse cooperative Q-learning , 2004, ICML.

[106]  Joel H. Spencer,et al.  Randomization, Derandomization and Antirandomization: Three Games , 1994, Theor. Comput. Sci..

[107]  Danna Zhou,et al.  d. , 1934, Microbial pathogenesis.

[108]  Zachary C. Lipton,et al.  Troubling Trends in Machine Learning Scholarship , 2018, ACM Queue.

[109]  Spyridon Samothrakis,et al.  On Monte Carlo Tree Search and Reinforcement Learning , 2017, J. Artif. Intell. Res..

[110]  Noam Brown,et al.  Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.

[111]  Sepp Hochreiter,et al.  RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.

[112]  Pablo Hernandez-Leal,et al.  Learning against sequential opponents in repeated stochastic games , 2017 .

[113]  Milos Hauskrecht,et al.  Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..

[114]  Frans A. Oliehoek,et al.  Coordinated Deep Reinforcement Learners for Traffic Light Control , 2016 .

[115]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[116]  Sergey Levine,et al.  Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[117]  Pablo Hernandez-Leal,et al.  A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity , 2017, ArXiv.

[118]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[119]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.

[120]  Neil Immerman,et al.  The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.

[121]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[122]  J. Neumann,et al.  Theory of Games and Economic Behavior. , 1945 .

[123]  Tommi S. Jaakkola,et al.  Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.

[124]  Sergey Levine,et al.  The Mirage of Action-Dependent Baselines in Reinforcement Learning , 2018, ICML.

[125]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[126]  Giovanni Montana,et al.  Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication , 2019, Mach. Learn..

[127]  Frans A. Oliehoek,et al.  Interactive Learning and Decision Making: Foundations, Insights & Challenges , 2018, IJCAI.

[128]  G. Tesauro,et al.  Analyzing Complex Strategic Interactions in Multi-Agent Systems , 2002 .

[129]  Larry Rudolph,et al.  Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms? , 2018, ArXiv.

[130]  Thomas Bäck,et al.  Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .

[131]  Shimon Whiteson,et al.  OFFER: Off-Environment Reinforcement Learning , 2017, AAAI.

[132]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[133]  Kagan Tumer,et al.  Unifying temporal and structural credit assignment problems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[135]  Alessandro Lazaric,et al.  Learning to cooperate in multi-agent social dilemmas , 2006, AAMAS '06.

[136]  Jakub W. Pachocki,et al.  Emergent Complexity via Multi-Agent Competition , 2017, ICLR.

[137]  Sergey Levine,et al.  Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.

[138]  Victor R. Lesser,et al.  Multi-Agent Learning with Policy Prediction , 2010, AAAI.

[139]  Marlos C. Machado,et al.  Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..

[140]  Julian Togelius,et al.  Deep Reinforcement Learning for General Video Game AI , 2018, 2018 IEEE Conference on Computational Intelligence and Games (CIG).

[141]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[142]  M. Dufwenberg Game theory. , 2011, Wiley interdisciplinary reviews. Cognitive science.

[143]  Dan Ventura,et al.  Predicting and Preventing Coordination Problems in Cooperative Q-learning Systems , 2007, IJCAI.

[144]  Simon M. Lucas,et al.  Coevolving Game-Playing Agents: Measuring Performance and Intransitivities , 2013, IEEE Transactions on Evolutionary Computation.

[145]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[146]  Pablo Hernandez-Leal,et al.  Towards a Fast Detection of Opponents in Repeated Stochastic Games , 2017, AAMAS Workshops.

[147]  Tom Schaul,et al.  Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.

[148]  Yan Zheng,et al.  A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents , 2018, NeurIPS.

[149]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[150]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[151]  Michael H. Bowling,et al.  Convergence and No-Regret in Multiagent Learning , 2004, NIPS.

[152]  Tom Schaul,et al.  Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.

[153]  Chao Gao,et al.  Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition , 2019, ArXiv.

[154]  Karl Johan Åström,et al.  Optimal control of Markov processes with incomplete state information , 1965 .

[155]  Vincent Conitzer,et al.  AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[156]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[157]  David Isele,et al.  Selective Experience Replay for Lifelong Learning , 2018, AAAI.

[158]  S. C. Suddarth,et al.  Rule-Injection Hints as a Means of Improving Network Performance and Learning Time , 1990, EURASIP Workshop.

[159]  Michael H. Bowling,et al.  Finding Optimal Abstract Strategies in Extensive-Form Games , 2012, AAAI.

[160]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[161]  Leemon C. Baird,et al.  Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[162]  David Andre,et al.  Generalized Prioritized Sweeping , 1997, NIPS.

[163]  Shane Legg,et al.  Modeling Friends and Foes , 2018, ArXiv.

[164]  Peter Vrancx,et al.  Learning multi-agent state space representations , 2010, AAMAS.

[165]  N. Le Fort-Piat,et al.  The world of independent learners is not markovian , 2011, Int. J. Knowl. Based Intell. Eng. Syst..

[166]  Stewart W. Wilson,et al.  A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .

[167]  Sarit Kraus,et al.  Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination , 2010, AAAI.

[168]  Martin Lauer,et al.  An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[169]  Michael L. Littman,et al.  Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[170]  Robert H. Crites,et al.  Multiagent reinforcement learning in the Iterated Prisoner's Dilemma. , 1996, Bio Systems.

[171]  Bhiksha Raj,et al.  On the Origin of Deep Learning , 2017, ArXiv.

[172]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[173]  Sean Luke,et al.  Lenient Learning in Independent-Learner Stochastic Cooperative Games , 2016, J. Mach. Learn. Res..

[174]  Michael L. Littman,et al.  An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..

[175]  Gerhard Weiss,et al.  Multiagent Learning: Basics, Challenges, and Prospects , 2012, AI Mag..

[176]  Daniele Molinari,et al.  Microscopic Traffic Simulation by Cooperative Multi-agent Deep Reinforcement Learning , 2019, AAMAS.

[177]  Frans A. Oliehoek,et al.  Scalable Planning and Learning for Multiagent POMDPs , 2014, AAAI.

[178]  Peter Henderson,et al.  An Introduction to Deep Reinforcement Learning , 2018, Found. Trends Mach. Learn..

[179]  Tom Schaul,et al.  Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.

[180]  Shie Mannor,et al.  Graying the black box: Understanding DQNs , 2016, ICML.

[181]  Chongjie Zhang,et al.  Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games , 2019, AAMAS.

[182]  A. Elo The rating of chessplayers, past and present , 1978 .

[183]  Chris Dyer,et al.  On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.

[184]  Benjamin Rosman,et al.  Bayesian policy reuse , 2015, Machine Learning.

[185]  Pieter Abbeel,et al.  Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.

[186]  Peter Secretan Learning , 1965, Mental Health.

[187]  Peter Vrancx,et al.  Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets , 2017, AAAI.

[188]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[189]  Satinder Singh Transfer of Learning by Composing Solutions of Elemental Sequential Tasks , 1992, Mach. Learn..

[190]  Karl Tuyls,et al.  FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes , 2011, Interactive Decision Theory and Game Theory.

[191]  Leslie Pack Kaelbling,et al.  Influence-Based Abstraction for Multiagent Systems , 2012, AAAI.

[192]  Heikki Huttunen,et al.  HARK Side of Deep Learning - From Grad Student Descent to Automated Machine Learning , 2019, ArXiv.

[193]  H. Francis Song,et al.  The Hanabi Challenge: A New Frontier for AI Research , 2019, Artif. Intell..

[194]  Michael H. Bowling,et al.  Regret Minimization in Games with Incomplete Information , 2007, NIPS.

[195]  Yoav Shoham,et al.  Learning against opponents with bounded memory , 2005, IJCAI.

[196]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[197]  Yusen Zhan,et al.  Efficiently detecting switches against non-stationary opponents , 2017, Autonomous Agents and Multi-Agent Systems.

[198]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[199]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[200]  Bart Verheij,et al.  How much does it help to know what she knows you know? An agent-based simulation study , 2013, Artif. Intell..

[201]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[202]  Saeid Nahavandi,et al.  Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications , 2018, IEEE Transactions on Cybernetics.

[203]  Long Ji Lin,et al.  Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.

[204]  Doina Precup,et al.  Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.

[205]  Yan Zheng,et al.  Towards Efficient Detection and Optimal Response against Sophisticated Opponents , 2018, IJCAI.

[206]  W. Hamilton,et al.  The Evolution of Cooperation , 1984 .

[207]  Karl Tuyls,et al.  Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective , 2008, J. Mach. Learn. Res..

[208]  Shimon Whiteson,et al.  A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.

[209]  Shimon Whiteson,et al.  Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[210]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[211]  Geoffrey J. Gordon,et al.  Approximate solutions to markov decision processes , 1999 .

[212]  P. J. Gmytrasiewicz,et al.  A Framework for Sequential Planning in Multi-Agent Settings , 2005, AI&M.

[213]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[214]  Guillaume J. Laurent,et al.  Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems , 2012, The Knowledge Engineering Review.

[215]  Anil A. Bharath,et al.  Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.

[216]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[217]  Pieter Abbeel,et al.  Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.

[218]  Yoav Shoham,et al.  A general criterion and an algorithmic framework for learning in multi-agent systems , 2007, Machine Learning.

[219]  Razvan Pascanu,et al.  Adapting Auxiliary Losses Using Gradient Similarity , 2018, ArXiv.

[220]  Kevin Waugh,et al.  Accelerating Best Response Calculation in Large Extensive Games , 2011, IJCAI.

[221]  Sarit Kraus,et al.  Teamwork with Limited Knowledge of Teammates , 2013, AAAI.

[222]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[223]  유재철,et al.  Randomization , 2020, Randomization, Bootstrap and Monte Carlo Methods in Biology.

[224]  Jun Wang,et al.  Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.

[225]  Karl Tuyls,et al.  alpha-Rank: Multi-Agent Evaluation by Evolution , 2019 .

[226]  Sam Devlin,et al.  Potential-based difference rewards for multiagent reinforcement learning , 2014, AAMAS.

[227]  L. Shapley,et al.  Fictitious Play Property for Games with Identical Interests , 1996 .

[228]  Michail G. Lagoudakis,et al.  Coordinated Reinforcement Learning , 2002, ICML.

[229]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[230]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[231]  Miguel A. Costa-Gomes,et al.  Cognition and Behavior in Normal-Form Games: An Experimental Study , 1998 .

[232]  David Carmel,et al.  Incorporating Opponent Models into Adversary Search , 1996, AAAI/IAAI, Vol. 1.

[233]  Andrew W. Moore,et al.  Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[234]  Yishay Mansour,et al.  Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..

[235]  Claudia V. Goldman,et al.  Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res..

[236]  Yishay Mansour,et al.  Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[237]  Peter Stone,et al.  Deterministic Implementations for Reproducibility in Deep Reinforcement Learning , 2018, ArXiv.

[238]  Sayan Mukherjee,et al.  Bayesian group factor analysis with structured sparsity , 2016, J. Mach. Learn. Res..

[239]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[240]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[241]  V. Borkar Asynchronous Stochastic Approximations , 1998 .

[242]  Pieter Abbeel,et al.  Accelerated Methods for Deep Reinforcement Learning , 2018, ArXiv.

[243]  Larry Bull,et al.  Evolutionary Computing in Multi-agent Environments: Operators , 1998, Evolutionary Programming.

[244]  Michael A. Goodrich,et al.  Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining , 2003, ICML.

[245]  Robert Babuska,et al.  Experience Selection in Deep Reinforcement Learning for Control , 2018, J. Mach. Learn. Res..

[246]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[247]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[248]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[249]  Dengyong Zhou,et al.  Action-depedent Control Variates for Policy Optimization via Stein's Identity , 2017 .

[250]  Sean Luke,et al.  Lenience towards Teammates Helps in Cooperative Multiagent Learning , 2005 .

[251]  Kagan Tumer,et al.  Distributed agent-based air traffic flow management , 2007, AAMAS '07.

[252]  Matthew E. Taylor,et al.  Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL , 2018, ArXiv.

[253]  Colin Camerer,et al.  A Cognitive Hierarchy Model of Games , 2004 .

[254]  Olivier Pietquin,et al.  Actor-Critic Fictitious Play in Simultaneous Move Multistage Games , 2018, AISTATS.

[255]  Martin A. Riedmiller,et al.  Reinforcement learning in feedback control , 2011, Machine Learning.

[256]  Hado van Hasselt,et al.  Double Q-learning , 2010, NIPS.

[257]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[258]  Youngchul Sung,et al.  Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning , 2019, AAAI.

[259]  Joel Z. Leibo,et al.  Malthusian Reinforcement Learning , 2018, AAMAS.

[260]  A. Cassandra,et al.  Exact and approximate algorithms for partially observable markov decision processes , 1998 .

[261]  Joelle Pineau,et al.  On the Pitfalls of Measuring Emergent Communication , 2019, AAMAS.

[262]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[263]  Michael H. Bowling,et al.  Computing Robust Counter-Strategies , 2007, NIPS.

[264]  Olivier Simonin,et al.  Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).

[265]  Richard S. Sutton,et al.  Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .

[266]  Felipe Leno da Silva,et al.  A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems , 2019, J. Artif. Intell. Res..

[267]  Colin Camerer,et al.  Behavioral Game Theory: Thinking, Learning and Teaching , 2001 .

[268]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[269]  Shimon Whiteson,et al.  Learning with Opponent-Learning Awareness , 2017, AAMAS.

[270]  Vivek S. Borkar,et al.  Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation , 2006, Oper. Res..

[271]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[272]  Kenneth O. Stanley,et al.  Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.

[273]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[274]  J. Nash Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.

[275]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[276]  Kenneth O. Stanley,et al.  Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.

[278]  M. Stanković Multi-agent reinforcement learning , 2016 .

[279]  Richard E. Turner,et al.  Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.

[280]  Marwan Mattar,et al.  Unity: A General Platform for Intelligent Agents , 2018, ArXiv.

[281]  Peter Stone,et al.  Multiagent learning in the presence of memory-bounded agents , 2013, Autonomous Agents and Multi-Agent Systems.

[282]  Guy Lever,et al.  Emergent Coordination Through Competition , 2019, ICLR.

[283]  Tom Minka,et al.  TrueSkill™: A Bayesian Skill Rating System , 2006, NIPS.

[284]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[285]  Abhinav Gupta,et al.  Robust Adversarial Reinforcement Learning , 2017, ICML.

[286]  Carlos Guestrin,et al.  Multiagent Planning with Factored MDPs , 2001, NIPS.

[287]  Guy Lever,et al.  Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.

[288]  Subramanian Ramamoorthy,et al.  A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems , 2013, AAMAS.

[289]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[290]  K. Tuyls,et al.  Lenient Frequency Adjusted Q-learning , 2010 .

[291]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[292]  Stephen Tyree,et al.  Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU , 2016, ICLR.

[293]  Edwin D. de Jong,et al.  The parallel Nash Memory for asymmetric games , 2006, GECCO.

[294]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[295]  J. Harsanyi Games with Incomplete Information Played by 'Bayesian' Players, Part III. The Basic Probability Distribution of the Game , 1968 .

[296]  Christos H. Papadimitriou,et al.  α-Rank: Multi-Agent Evaluation by Evolution , 2019, Scientific Reports.

[297]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[298]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[299]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[300]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[301]  Sam Devlin,et al.  The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition , 2019, ArXiv.

[302]  Thore Graepel,et al.  The Mechanics of n-Player Differentiable Games , 2018, ICML.

[303]  Kamyar Azizzadenesheli Maybe a few considerations in Reinforcement Learning Research , 2019 .

[304]  Matthew E. Taylor,et al.  Agent Modeling as Auxiliary Task for Deep Reinforcement Learning , 2019, AIIDE.

[305]  Drew Wicke,et al.  Multiagent Soft Q-Learning , 2018, AAAI Spring Symposia.

[306]  Michael H. Bowling,et al.  Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.

[307]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .

[308]  Adnan Darwiche,et al.  Human-level intelligence or animal-like abilities? , 2017, Commun. ACM.

[309]  Takashi Kamihigashi,et al.  Necessary and Sufficient Conditions for a Solution of the Bellman Equation to be the Value Function: A General Principle , 2015 .

[310]  Larry Bull,et al.  Evolution in Multi-agent Systems: Evolving Communicating Classifier Systems for Gait in a Quadrupedal Robot , 1995, ICGA.

[311]  Goran Strbac,et al.  Recurrent Deep Multiagent Q-Learning for Autonomous Brokers in Smart Grid , 2018, IJCAI.

[312]  Lianlong Wu,et al.  Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence , 2019, AAAI.

[313]  Nando de Freitas,et al.  Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.

[314]  Richard K. Belew,et al.  New Methods for Competitive Coevolution , 1997, Evolutionary Computation.

[315]  Trevor Darrell,et al.  Loss is its own Reward: Self-Supervision for Reinforcement Learning , 2016, ICLR.

[316]  Sham M. Kakade,et al.  A Natural Policy Gradient , 2001, NIPS.

[317]  Jordan L. Boyd-Graber,et al.  Opponent Modeling in Deep Reinforcement Learning , 2016, ICML.

[318]  Rob Fergus,et al.  Modeling Others using Oneself in Multi-Agent Reinforcement Learning , 2018, ICML.

[319]  Bikramjit Banerjee,et al.  Adaptive policy gradient in multiagent learning , 2003, AAMAS '03.

[320]  Csaba Szepesvári,et al.  A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.

[322]  Michael L. Littman,et al.  Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.

[323]  Shobha Venkataraman,et al.  Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..

[324]  Andrew G. Barto,et al.  Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.

[325]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[326]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[327]  Alexander Peysakhovich,et al.  Maintaining cooperation in complex social dilemmas using deep reinforcement learning , 2017, ArXiv.

[328]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[329]  Gerald Tesauro,et al.  Extending Q-Learning to General Adaptive Multi-Agent Systems , 2003, NIPS.

[330]  Michael A. Goodrich,et al.  Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning , 2011, Machine Learning.

[331]  Michael McCloskey,et al.  Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .

[332]  Zhang-Wei Hong,et al.  A Deep Policy Inference Q-Network for Multi-Agent Systems , 2017, AAMAS.

[333]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[334]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[335]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[336]  J. Zico Kolter,et al.  What game are we playing? End-to-end learning in normal and extensive form games , 2018, IJCAI.

[337]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[338]  Filippos Christianos,et al.  Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning , 2019, ArXiv.

[339]  Rahul Savani,et al.  Negative Update Intervals in Deep Multi-Agent Reinforcement Learning , 2018, AAMAS.

[340]  Sean P. Meyn,et al.  An analysis of reinforcement learning with function approximation , 2008, ICML '08.

[341]  Murray L Weidenbaum,et al.  Learning to compete , 1986 .

[342]  Kagan Tumer,et al.  Optimal Payoff Functions for Members of Collectives , 2001, Adv. Complex Syst..

[343]  Rémi Munos,et al.  Neural Replicator Dynamics , 2019, ArXiv.

[344]  Peter Stone,et al.  Implicit Negotiation in Repeated Games , 2001, ATAL.

[345]  Andrew W. Moore,et al.  Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.

[346]  Kenneth O. Stanley,et al.  Exploiting Open-Endedness to Solve Problems Through the Search for Novelty , 2008, ALIFE.

[347]  Andrew G. Barto,et al.  Intrinsic Motivation and Reinforcement Learning , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[348]  Michela Paganini,et al.  The Scientific Method in the Science of Machine Learning , 2019, ArXiv.

[349]  Jeffrey S. Rosenschein,et al.  Best-response multiagent learning in non-stationary environments , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[350]  Joshua B. Tenenbaum,et al.  Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning , 2017, ArXiv.

[351]  Peng Peng,et al.  Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, ArXiv.

[352]  Gerald Tesauro,et al.  Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference , 2018, ICLR.

[353]  John N. Tsitsiklis,et al.  Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..

[354]  Thomas Hofmann,et al.  TrueSkill™: A Bayesian Skill Rating System , 2007 .

[355]  Kagan Tumer,et al.  Analyzing and visualizing multiagent rewards in dynamic and stochastic domains , 2008, Autonomous Agents and Multi-Agent Systems.

[357]  Michael L. Littman,et al.  Cyclic Equilibria in Markov Games , 2005, NIPS.

[358]  David Silver,et al.  A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning , 2017, NIPS.

[359]  Weitang Liu,et al.  Surprising Negative Results for Generative Adversarial Tree Search , 2018, ArXiv.

[360]  Michael H. Bowling,et al.  Coordination and Adaptation in Impromptu Teams , 2005, AAAI.

[361]  Edmund H. Durfee,et al.  Rational Coordination in Multi-Agent Environments , 2000, Autonomous Agents and Multi-Agent Systems.

[362]  Yuxi Li,et al.  Deep Reinforcement Learning: An Overview , 2017, ArXiv.

[363]  Yuval Tassa,et al.  Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[364]  Marc G. Bellemare,et al.  A Comparative Analysis of Expected and Distributional Reinforcement Learning , 2019, AAAI.

[365]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[366]  Milind Tambe,et al.  Towards Flexible Teamwork , 1997, J. Artif. Intell. Res..

[367]  E. Kalai,et al.  Rational Learning Leads to Nash Equilibrium , 1993 .

[368]  Todd W. Neller,et al.  An Introduction to Counterfactual Regret Minimization , 2013 .

[369]  Igor Mordatch,et al.  Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents , 2019, ArXiv.

[370]  Y. Mansour,et al.  Algorithmic Game Theory: Learning, Regret Minimization, and Equilibria , 2007 .

[371]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[372]  Peter Stone,et al.  Autonomous agents modelling other agents: A comprehensive survey and open problems , 2017, Artif. Intell..

[373]  Karl Tuyls,et al.  Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent , 2019, IJCAI.

[374]  Yoshua Bengio,et al.  An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.

[375]  Timothy Patten,et al.  Dec-MCTS: Decentralized planning for multi-robot active perception , 2019, Int. J. Robotics Res..

[376]  Shimon Whiteson,et al.  Approximate solutions for factored Dec-POMDPs with many agents , 2013, AAMAS.

[377]  David Silver,et al.  Fictitious Self-Play in Extensive-Form Games , 2015, ICML.

[378]  John Foley,et al.  Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments , 2019, ArXiv.