Multi-Agent Advisor Q-Learning

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but challenges such as high sample complexity and slow convergence to stable policies must still be overcome before widespread deployment is possible. However, many real-world environments already deploy suboptimal or heuristic approaches for generating policies. A natural question is how best to use such approaches as advisors to improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning-based algorithms: ADMIRAL Decision Making (ADMIRAL-DM), which improves learning by appropriately incorporating advice from an advisor, and ADMIRAL Advisor Evaluation (ADMIRAL-AE), which evaluates the effectiveness of a given advisor. We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, compare favourably to related baselines, scale to large state-action spaces, and are robust to poor advice from advisors.
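
To make the idea of advisor-guided Q-learning concrete, the sketch below shows one simple way an external advisor's action recommendations could be folded into a tabular Q-learning loop. This is a minimal illustration under stated assumptions, not the paper's ADMIRAL-DM or ADMIRAL-AE algorithms: the `advisor_guided_q_learning` function, the decaying `advisor_prob` schedule, and the Gym-style single-agent environment API are all illustrative choices made here for clarity.

```python
import random
from collections import defaultdict

def advisor_guided_q_learning(env, advisor, episodes=500, alpha=0.1,
                              gamma=0.99, epsilon=0.1, advisor_prob=0.5,
                              decay=0.995):
    """Tabular Q-learning that sometimes follows an external advisor.

    Assumptions (not from the paper): `env` exposes a classic Gym-style
    reset()/step() API with hashable discrete states and a discrete
    `action_space`; `advisor(state)` returns a recommended action.
    `advisor_prob` decays each episode so the learner relies on the
    advisor less as its own value estimates improve.
    """
    q = defaultdict(float)              # maps (state, action) -> value estimate
    actions = list(range(env.action_space.n))

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            r = random.random()
            if r < advisor_prob:
                action = advisor(state)                 # follow the advisor
            elif r < advisor_prob + epsilon:
                action = random.choice(actions)         # explore independently
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit

            next_state, reward, done, _ = env.step(action)

            # Standard one-step Q-learning backup toward the greedy target.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (
                reward + gamma * (0.0 if done else best_next) - q[(state, action)]
            )
            state = next_state

        advisor_prob *= decay           # trust the advisor less over time
    return q
```

Under these assumptions, the advisor accelerates early exploration while the decaying schedule lets the learner eventually outgrow poor advice, which is in the spirit of the robustness property claimed above.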
