A survey and critique of multiagent deep reinforcement learning
[1] Kagan Tumer,et al. General principles of learning-based multi-agent systems , 1999, AGENTS '99.
[2] Stephen J. Guy,et al. Stochastic Tree Search with Useful Cycles for patrolling problems , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[3] Hitoshi Iba. Emergent Cooperation for Multiple Agents Using Genetic Programming , 1996, PPSN.
[4] Guy Lever,et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.
[5] Karl Tuyls,et al. Evolutionary Dynamics of Multi-Agent Learning: A Survey , 2015, J. Artif. Intell. Res..
[6] Julian Togelius,et al. Pommerman: A Multi-Agent Playground , 2018, AIIDE Workshops.
[7] S. Levine,et al. Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? , 2018 .
[8] Chao Gao,et al. On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman , 2019, AIIDE.
[9] Peter Stone,et al. Multiagent learning is not the answer. It is the question , 2007, Artif. Intell..
[10] H. Francis Song,et al. Machine Theory of Mind , 2018, ICML.
[11] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[12] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[13] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[14] Matteo Hessel,et al. Deep Reinforcement Learning and the Deadly Triad , 2018, ArXiv.
[15] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.
[16] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[17] Bart De Schutter,et al. Multi-agent Reinforcement Learning: An Overview , 2010 .
[18] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[19] Yang Yu,et al. Towards Sample Efficient Reinforcement Learning , 2018, IJCAI.
[20] Nicolas Le Roux,et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning , 2019, NeurIPS.
[21] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[22] Dorian Kodelja,et al. Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.
[23] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[24] Peter McCracken,et al. Safe Strategies for Agent Modelling in Games , 2004, AAAI Technical Report.
[25] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..
[26] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[27] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[28] Daniel Kudenko,et al. Learning in multi-agent systems , 2001, The Knowledge Engineering Review.
[29] OpitzDavid,et al. Popular ensemble methods , 1999 .
[30] Shimon Whiteson,et al. The StarCraft Multi-Agent Challenge , 2019, AAMAS.
[31] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[32] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[33] Shie Mannor,et al. Reinforcement learning in the presence of rare events , 2008, ICML '08.
[34] Colin Camerer,et al. Behavioural Game Theory : Thinking , Learning and Teaching , 2004 .
[35] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[36] Pieter Abbeel,et al. Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.
[37] Rahul Savani,et al. Lenient Multi-Agent Deep Reinforcement Learning , 2017, AAMAS.
[38] Sarit Kraus,et al. Collaborative Plans for Complex Group Action , 1996, Artif. Intell..
[39] Peter Vrancx,et al. Game Theory and Multi-agent Reinforcement Learning , 2012, Reinforcement Learning.
[40] J. Neumann,et al. Theory of games and economic behavior , 1945, 100 Years of Math Milestones.
[41] Michael H. Bowling,et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments , 2018, NeurIPS.
[42] Jeff S. Shamma,et al. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria , 2005, IEEE Transactions on Automatic Control.
[43] Joel Z. Leibo,et al. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research , 2019, ArXiv.
[44] Rob Fergus,et al. Learning Multiagent Communication with Backpropagation , 2016, NIPS.
[45] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[46] Ming Zhou,et al. Mean Field Multi-Agent Reinforcement Learning , 2018, ICML.
[47] Katja Hofmann,et al. The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors , 2019, ArXiv.
[48] Olivier Simonin,et al. Cooperative Multi-agent Policy Gradient , 2018, ECML/PKDD.
[49] L. Buşoniu. Evolutionary function approximation for reinforcement learning , 2006 .
[50] R. Rosenthal. The file drawer problem and tolerance for null results , 1979 .
[51] Michael L. Littman,et al. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration , 2010, ICML.
[52] Wei Zhao,et al. Deep Reinforcement Learning for Sponsored Search Real-time Bidding , 2018, KDD.
[53] Zachary Chase Lipton,et al. Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear , 2016, ArXiv.
[54] Neil Burch,et al. Heads-up limit hold’em poker is solved , 2015, Science.
[55] Charles W. Anderson,et al. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).
[56] Andrew G. Barto,et al. Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.
[57] Andrew G. Barto,et al. Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.
[58] D. Sculley,et al. Winner's Curse? On Pace, Progress, and Empirical Rigor , 2018, ICLR.
[59] Hao Liu,et al. Action-dependent Control Variates for Policy Optimization via Stein Identity , 2018, ICLR.
[60] John C. Harsanyi. Games with Incomplete Information Played by "Bayesian" Players, I-III: Part I. The Basic Model , 2004, Manag. Sci..
[61] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[62] Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.
[63] Craig Boutilier,et al. Non-delusional Q-learning and value-iteration , 2018, NeurIPS.
[64] Yi Wu,et al. Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient , 2019, AAAI.
[65] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[66] Yan Zheng,et al. Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments , 2018, PRICAI.
[67] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[68] Carla P. Gomes. On the intersection of AI and OR , 2001, Knowl. Eng. Rev..
[69] Katja Hofmann,et al. The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.
[70] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[71] Maria L. Gini,et al. Safely Using Predictions in General-Sum Normal Form Games , 2017, AAMAS.
[72] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[73] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[74] Paul Erdös,et al. On a Combinatorial Game , 1973, J. Comb. Theory A.
[75] Maruan Al-Shedivat,et al. Learning Policy Representations in Multiagent Systems , 2018, ICML.
[76] Joel Z. Leibo,et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.
[77] Jun Morimoto,et al. Robust Reinforcement Learning , 2005, Neural Computation.
[78] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[79] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[80] Manuela M. Veloso,et al. Multiagent learning using a variable learning rate , 2002, Artif. Intell..
[81] Matthew E. Taylor,et al. Identifying and Tracking Switching, Non-Stationary Opponents: A Bayesian Approach , 2016, AAAI Workshop: Multiagent Interaction without Prior Coordination.
[82] Jianfeng Gao,et al. Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear , 2016, ArXiv.
[83] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[84] David Silver,et al. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games , 2016, ArXiv.
[85] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[86] Mykel J. Kochenderfer,et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.
[87] R. Bellman. A Markovian Decision Process , 1957 .
[88] Alexander Peysakhovich,et al. Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.
[89] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[90] Yoav Shoham,et al. If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..
[91] Shauharda Khadka,et al. Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination , 2019, ICML.
[92] Maria L. Gini,et al. Monte Carlo Tree Search with Branch and Bound for Multi-Robot Task Allocation , 2016 .
[93] Marco Wiering,et al. Reinforcement Learning , 2014, Adaptation, Learning, and Optimization.
[94] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[95] Frans A. Oliehoek,et al. A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.
[96] Shimon Whiteson,et al. The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning , 2019, AAMAS.
[97] Marc Peter Deisenroth,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[98] Shimon Whiteson,et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.
[99] Y. Mansour. Learning, Regret Minimization, and Equilibria , 2006 .
[100] Julian Togelius,et al. Playing Atari with Six Neurons , 2018, AAMAS.
[101] Jonathan P. How,et al. Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.
[102] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[103] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[104] Larry D. Pyeatt,et al. Decision Tree Function Approximation in Reinforcement Learning , 1999 .
[105] Nikos A. Vlassis,et al. Sparse cooperative Q-learning , 2004, ICML.
[106] Joel H. Spencer,et al. Randomization, Derandomization and Antirandomization: Three Games , 1994, Theor. Comput. Sci..
[108] Zachary C. Lipton,et al. Troubling Trends in Machine Learning Scholarship , 2018, ACM Queue.
[109] Spyridon Samothrakis,et al. On Monte Carlo Tree Search and Reinforcement Learning , 2017, J. Artif. Intell. Res..
[110] Noam Brown,et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.
[111] Sepp Hochreiter,et al. RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.
[112] Pablo Hernandez-Leal,et al. Learning against sequential opponents in repeated stochastic games , 2017 .
[113] Milos Hauskrecht,et al. Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..
[114] Frans A. Oliehoek,et al. Coordinated Deep Reinforcement Learners for Traffic Light Control , 2016 .
[115] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[116] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[117] Pablo Hernandez-Leal,et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity , 2017, ArXiv.
[118] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[119] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[120] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[121] Sean Luke,et al. Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.
[122] J. Neumann,et al. Theory of Games and Economic Behavior. , 1945 .
[123] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[124] Sergey Levine,et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning , 2018, ICML.
[125] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[126] Giovanni Montana,et al. Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication , 2019, Mach. Learn..
[127] Frans A. Oliehoek,et al. Interactive Learning and Decision Making: Foundations, Insights & Challenges , 2018, IJCAI.
[128] G. Tesauro,et al. Analyzing Complex Strategic Interactions in Multi-Agent Systems , 2002 .
[129] Larry Rudolph,et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms? , 2018, ArXiv.
[130] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .
[131] Shimon Whiteson,et al. OFFER: Off-Environment Reinforcement Learning , 2017, AAAI.
[132] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[133] Kagan Tumer,et al. Unifying temporal and structural credit assignment problems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..
[135] Alessandro Lazaric,et al. Learning to cooperate in multi-agent social dilemmas , 2006, AAMAS '06.
[136] Jakub W. Pachocki,et al. Emergent Complexity via Multi-Agent Competition , 2017, ICLR.
[137] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[138] Victor R. Lesser,et al. Multi-Agent Learning with Policy Prediction , 2010, AAAI.
[139] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[140] Julian Togelius,et al. Deep Reinforcement Learning for General Video Game AI , 2018, 2018 IEEE Conference on Computational Intelligence and Games (CIG).
[141] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[142] M. Dufwenberg. Game theory , 2011, Wiley Interdisciplinary Reviews: Cognitive Science.
[143] Dan Ventura,et al. Predicting and Preventing Coordination Problems in Cooperative Q-learning Systems , 2007, IJCAI.
[144] Simon M. Lucas,et al. Coevolving Game-Playing Agents: Measuring Performance and Intransitivities , 2013, IEEE Transactions on Evolutionary Computation.
[145] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[146] Pablo Hernandez-Leal,et al. Towards a Fast Detection of Opponents in Repeated Stochastic Games , 2017, AAMAS Workshops.
[147] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[148] Yan Zheng,et al. A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents , 2018, NeurIPS.
[149] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[150] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[151] Michael H. Bowling,et al. Convergence and No-Regret in Multiagent Learning , 2004, NIPS.
[152] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[153] Chao Gao,et al. Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition , 2019, ArXiv.
[154] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[155] Vincent Conitzer,et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.
[156] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[157] David Isele,et al. Selective Experience Replay for Lifelong Learning , 2018, AAAI.
[158] S. C. Suddarth,et al. Rule-Injection Hints as a Means of Improving Network Performance and Learning Time , 1990, EURASIP Workshop.
[159] Michael H. Bowling,et al. Finding Optimal Abstract Strategies in Extensive-Form Games , 2012, AAAI.
[160] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[161] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[162] David Andre,et al. Generalized Prioritized Sweeping , 1997, NIPS.
[163] Shane Legg,et al. Modeling Friends and Foes , 2018, ArXiv.
[164] Peter Vrancx,et al. Learning multi-agent state space representations , 2010, AAMAS.
[165] N. Le Fort-Piat,et al. The world of independent learners is not markovian , 2011, Int. J. Knowl. Based Intell. Eng. Syst..
[166] Stewart W. Wilson,et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .
[167] Sarit Kraus,et al. Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination , 2010, AAAI.
[168] Martin Lauer,et al. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.
[169] Michael L. Littman,et al. Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.
[170] Robert H. Crites,et al. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma. , 1996, Bio Systems.
[171] Bhiksha Raj,et al. On the Origin of Deep Learning , 2017, ArXiv.
[172] Rich Caruana,et al. Multitask Learning , 1997, Machine-mediated learning.
[173] Sean Luke,et al. Lenient Learning in Independent-Learner Stochastic Cooperative Games , 2016, J. Mach. Learn. Res..
[174] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[175] Gerhard Weiss,et al. Multiagent Learning: Basics, Challenges, and Prospects , 2012, AI Mag..
[176] Daniele Molinari,et al. Microscopic Traffic Simulation by Cooperative Multi-agent Deep Reinforcement Learning , 2019, AAMAS.
[177] Frans A. Oliehoek,et al. Scalable Planning and Learning for Multiagent POMDPs , 2014, AAAI.
[178] Peter Henderson,et al. An Introduction to Deep Reinforcement Learning , 2018, Found. Trends Mach. Learn..
[179] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[180] Shie Mannor,et al. Graying the black box: Understanding DQNs , 2016, ICML.
[181] Chongjie Zhang,et al. Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games , 2019, AAMAS.
[182] A. Elo. The rating of chessplayers, past and present , 1978 .
[183] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[184] Benjamin Rosman,et al. Bayesian policy reuse , 2015, Machine Learning.
[185] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[187] Peter Vrancx,et al. Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets , 2017, AAAI.
[188] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[189] Satinder Singh. Transfer of Learning by Composing Solutions of Elemental Sequential Tasks , 1992, Mach. Learn..
[190] Karl Tuyls,et al. FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes , 2011, Interactive Decision Theory and Game Theory.
[191] Leslie Pack Kaelbling,et al. Influence-Based Abstraction for Multiagent Systems , 2012, AAAI.
[192] Heikki Huttunen,et al. HARK Side of Deep Learning - From Grad Student Descent to Automated Machine Learning , 2019, ArXiv.
[193] H. Francis Song,et al. The Hanabi Challenge: A New Frontier for AI Research , 2019, Artif. Intell..
[194] Michael H. Bowling,et al. Regret Minimization in Games with Incomplete Information , 2007, NIPS.
[195] Yoav Shoham,et al. Learning against opponents with bounded memory , 2005, IJCAI.
[196] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[197] Yusen Zhan,et al. Efficiently detecting switches against non-stationary opponents , 2017, Autonomous Agents and Multi-Agent Systems.
[198] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[199] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[200] Bart Verheij,et al. How much does it help to know what she knows you know? An agent-based simulation study , 2013, Artif. Intell..
[201] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[202] Saeid Nahavandi,et al. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications , 2018, IEEE Transactions on Cybernetics.
[203] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.
[204] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[205] Yan Zheng,et al. Towards Efficient Detection and Optimal Response against Sophisticated Opponents , 2018, IJCAI.
[206] W. Hamilton,et al. The Evolution of Cooperation , 1984 .
[207] Karl Tuyls,et al. Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective , 2008, J. Mach. Learn. Res..
[208] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[209] Shimon Whiteson,et al. Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[210] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[211] Geoffrey J. Gordon. Approximate solutions to Markov decision processes , 1999 .
[212] P. J. Gmytrasiewicz,et al. A Framework for Sequential Planning in Multi-Agent Settings , 2005, AI&M.
[213] Max Jaderberg,et al. Population Based Training of Neural Networks , 2017, ArXiv.
[214] Guillaume J. Laurent,et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems , 2012, The Knowledge Engineering Review.
[215] Anil A. Bharath,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[216] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[217] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[218] Yoav Shoham,et al. A general criterion and an algorithmic framework for learning in multi-agent systems , 2007, Machine Learning.
[219] Razvan Pascanu,et al. Adapting Auxiliary Losses Using Gradient Similarity , 2018, ArXiv.
[220] Kevin Waugh,et al. Accelerating Best Response Calculation in Large Extensive Games , 2011, IJCAI.
[221] Sarit Kraus,et al. Teamwork with Limited Knowledge of Teammates , 2013, AAAI.
[222] Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
[224] Jun Wang,et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.
[225] Karl Tuyls,et al. alpha-Rank: Multi-Agent Evaluation by Evolution , 2019 .
[226] Sam Devlin,et al. Potential-based difference rewards for multiagent reinforcement learning , 2014, AAMAS.
[227] L. Shapley,et al. Fictitious Play Property for Games with Identical Interests , 1996 .
[228] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[230] Shimon Whiteson,et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.
[231] Miguel A. Costa-Gomes,et al. Cognition and Behavior in Normal-Form Games: An Experimental Study , 1998 .
[232] David Carmel,et al. Incorporating Opponent Models into Adversary Search , 1996, AAAI/IAAI, Vol. 1.
[233] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[234] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[235] Claudia V. Goldman,et al. Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res..
[236] Yishay Mansour,et al. Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.
[237] Peter Stone,et al. Deterministic Implementations for Reproducibility in Deep Reinforcement Learning , 2018, ArXiv.
[238] Sayan Mukherjee,et al. Bayesian group factor analysis with structured sparsity , 2016, J. Mach. Learn. Res..
[239] Thomas G. Dietterich. Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.
[240] Manuela M. Veloso,et al. Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.
[241] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[242] Pieter Abbeel,et al. Accelerated Methods for Deep Reinforcement Learning , 2018, ArXiv.
[243] Larry Bull,et al. Evolutionary Computing in Multi-agent Environments: Operators , 1998, Evolutionary Programming.
[244] Michael A. Goodrich,et al. Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining , 2003, ICML.
[245] Robert Babuska,et al. Experience Selection in Deep Reinforcement Learning for Control , 2018, J. Mach. Learn. Res..
[246] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[247] Rich Caruana,et al. Model compression , 2006, KDD '06.
[248] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[249] Dengyong Zhou,et al. Action-dependent Control Variates for Policy Optimization via Stein's Identity , 2017 .
[250] Sean Luke,et al. Lenience towards Teammates Helps in Cooperative Multiagent Learning , 2005 .
[251] Kagan Tumer,et al. Distributed agent-based air traffic flow management , 2007, AAMAS '07.
[252] Matthew E. Taylor,et al. Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL , 2018, ArXiv.
[253] Colin Camerer,et al. A Cognitive Hierarchy Model of Games , 2004 .
[254] Olivier Pietquin,et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games , 2018, AISTATS.
[255] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[256] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[257] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[258] Youngchul Sung,et al. Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning , 2019, AAAI.
[259] Joel Z. Leibo,et al. Malthusian Reinforcement Learning , 2018, AAMAS.
[260] A. Cassandra. Exact and approximate algorithms for partially observable Markov decision processes , 1998 .
[261] Joelle Pineau,et al. On the Pitfalls of Measuring Emergent Communication , 2019, AAMAS.
[262] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[263] Michael H. Bowling,et al. Computing Robust Counter-Strategies , 2007, NIPS.
[264] Olivier Simonin,et al. Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).
[265] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[266] Felipe Leno da Silva,et al. A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems , 2019, J. Artif. Intell. Res..
[267] Colin Camerer,et al. Behavioral Game Theory: Thinking, Learning and Teaching , 2001 .
[268] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[269] Shimon Whiteson,et al. Learning with Opponent-Learning Awareness , 2017, AAMAS.
[270] Vivek S. Borkar,et al. Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation , 2006, Oper. Res..
[271] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[272] Kenneth O. Stanley,et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.
[273] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[274] J. Nash. Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.
[275] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[276] Kenneth O. Stanley,et al. Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.
[278] M. Stanković. Multi-agent reinforcement learning , 2016 .
[279] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[280] Marwan Mattar,et al. Unity: A General Platform for Intelligent Agents , 2018, ArXiv.
[281] Peter Stone,et al. Multiagent learning in the presence of memory-bounded agents , 2013, Autonomous Agents and Multi-Agent Systems.
[282] Guy Lever,et al. Emergent Coordination Through Competition , 2019, ICLR.
[283] Tom Minka,et al. TrueSkillTM: A Bayesian Skill Rating System , 2006, NIPS.
[284] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[285] Abhinav Gupta,et al. Robust Adversarial Reinforcement Learning , 2017, ICML.
[286] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.
[287] Guy Lever,et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.
[288] Subramanian Ramamoorthy,et al. A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems , 2013, AAMAS.
[289] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[290] K. Tuyls,et al. Lenient Frequency Adjusted Q-learning , 2010 .
[291] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[292] Stephen Tyree,et al. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU , 2016, ICLR.
[293] Edwin D. de Jong,et al. The parallel Nash Memory for asymmetric games , 2006, GECCO.
[294] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[295] J. Harsanyi. Games with Incomplete Information Played by 'Bayesian' Players, Part III. The Basic Probability Distribution of the Game , 1968 .
[296] Christos H. Papadimitriou,et al. α-Rank: Multi-Agent Evaluation by Evolution , 2019, Scientific Reports.
[297] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[298] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.
[299] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[300] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[301] Sam Devlin,et al. The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition , 2019, ArXiv.
[302] Thore Graepel,et al. The Mechanics of n-Player Differentiable Games , 2018, ICML.
[303] Kamyar Azizzadenesheli. Maybe a few considerations in Reinforcement Learning Research , 2019 .
[304] Matthew E. Taylor,et al. Agent Modeling as Auxiliary Task for Deep Reinforcement Learning , 2019, AIIDE.
[305] Drew Wicke,et al. Multiagent Soft Q-Learning , 2018, AAAI Spring Symposia.
[306] Michael H. Bowling,et al. Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.
[308] Adnan Darwiche,et al. Human-level intelligence or animal-like abilities? , 2017, Commun. ACM.
[309] Takashi Kamihigashi,et al. Necessary and Sufficient Conditions for a Solution of the Bellman Equation to be the Value Function: A General Principle , 2015 .
[310] Larry Bull,et al. Evolution in Multi-agent Systems: Evolving Communicating Classifier Systems for Gait in a Quadrupedal Robot , 1995, ICGA.
[311] Goran Strbac,et al. Recurrent Deep Multiagent Q-Learning for Autonomous Brokers in Smart Grid , 2018, IJCAI.
[312] Lianlong Wu,et al. Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence , 2019, AAAI.
[313] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[314] Richard K. Belew,et al. New Methods for Competitive Coevolution , 1997, Evolutionary Computation.
[315] Trevor Darrell,et al. Loss is its own Reward: Self-Supervision for Reinforcement Learning , 2016, ICLR.
[316] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[317] Jordan L. Boyd-Graber,et al. Opponent Modeling in Deep Reinforcement Learning , 2016, ICML.
[318] Rob Fergus,et al. Modeling Others using Oneself in Multi-Agent Reinforcement Learning , 2018, ICML.
[319] Bikramjit Banerjee,et al. Adaptive policy gradient in multiagent learning , 2003, AAMAS '03.
[320] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[322] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.
[323] Shobha Venkataraman,et al. Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..
[324] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[325] Tom Schaul,et al. StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.
[326] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[327] Alexander Peysakhovich,et al. Maintaining cooperation in complex social dilemmas using deep reinforcement learning , 2017, ArXiv.
[328] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[329] Gerald Tesauro,et al. Extending Q-Learning to General Adaptive Multi-Agent Systems , 2003, NIPS.
[330] Michael A. Goodrich,et al. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning , 2011, Machine Learning.
[331] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[332] Zhang-Wei Hong,et al. A Deep Policy Inference Q-Network for Multi-Agent Systems , 2017, AAMAS.
[333] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[334] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[335] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[336] J. Zico Kolter,et al. What game are we playing? End-to-end learning in normal and extensive form games , 2018, IJCAI.
[337] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[338] Filippos Christianos,et al. Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning , 2019, ArXiv.
[339] Rahul Savani,et al. Negative Update Intervals in Deep Multi-Agent Reinforcement Learning , 2018, AAMAS.
[340] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[341] Murray L Weidenbaum,et al. Learning to compete , 1986 .
[342] Kagan Tumer,et al. Optimal Payoff Functions for Members of Collectives , 2001, Adv. Complex Syst..
[343] Rémi Munos,et al. Neural Replicator Dynamics , 2019, ArXiv.
[344] Peter Stone,et al. Implicit Negotiation in Repeated Games , 2001, ATAL.
[345] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[346] Kenneth O. Stanley,et al. Exploiting Open-Endedness to Solve Problems Through the Search for Novelty , 2008, ALIFE.
[347] Andrew G. Barto,et al. Intrinsic Motivation and Reinforcement Learning , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.
[348] Michela Paganini,et al. The Scientific Method in the Science of Machine Learning , 2019, ArXiv.
[349] Jeffrey S. Rosenschein,et al. Best-response multiagent learning in non-stationary environments , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..
[350] Joshua B. Tenenbaum,et al. Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning , 2017, ArXiv.
[351] Peng Peng,et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, ArXiv.
[352] Gerald Tesauro,et al. Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference , 2018, ICLR.
[353] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[354] Thomas Hofmann,et al. TrueSkill™: A Bayesian Skill Rating System , 2007 .
[355] Kagan Tumer,et al. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains , 2008, Autonomous Agents and Multi-Agent Systems.
[357] Michael L. Littman,et al. Cyclic Equilibria in Markov Games , 2005, NIPS.
[358] David Silver,et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning , 2017, NIPS.
[359] Weitang Liu,et al. Surprising Negative Results for Generative Adversarial Tree Search , 2018, ArXiv.
[360] Michael H. Bowling,et al. Coordination and Adaptation in Impromptu Teams , 2005, AAAI.
[361] Edmund H. Durfee,et al. Rational Coordination in Multi-Agent Environments , 2000, Autonomous Agents and Multi-Agent Systems.
[362] Yuxi Li,et al. Deep Reinforcement Learning: An Overview , 2017, ArXiv.
[363] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[364] Marc G. Bellemare,et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning , 2019, AAAI.
[365] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[366] Milind Tambe,et al. Towards Flexible Teamwork , 1997, J. Artif. Intell. Res..
[367] E. Kalai,et al. Rational Learning Leads to Nash Equilibrium , 1993 .
[368] Todd W. Neller,et al. An Introduction to Counterfactual Regret Minimization , 2013 .
[369] Igor Mordatch,et al. Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents , 2019, ArXiv.
[370] Y. Mansour,et al. Algorithmic Game Theory: Learning, Regret Minimization, and Equilibria , 2007 .
[371] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[372] Peter Stone,et al. Autonomous agents modelling other agents: A comprehensive survey and open problems , 2017, Artif. Intell..
[373] Karl Tuyls,et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent , 2019, IJCAI.
[374] Yoshua Bengio,et al. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.
[375] Timothy Patten,et al. Dec-MCTS: Decentralized planning for multi-robot active perception , 2019, Int. J. Robotics Res..
[376] Shimon Whiteson,et al. Approximate solutions for factored Dec-POMDPs with many agents , 2013, AAMAS.
[377] David Silver,et al. Fictitious Self-Play in Extensive-Form Games , 2015, ICML.
[378] John Foley,et al. Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments , 2019, ArXiv.