CIExplore: Curiosity and Influence-based Exploration in Multi-Agent Cooperative Scenarios with Sparse Rewards

Learning under sparse rewards is a well-known challenge in reinforcement learning (RL). In the single-agent setting, this challenge can be addressed by introducing exploration bonuses driven by intrinsic motivation, which encourage the agent to visit unseen states. However, naively applying these methods to cooperative multi-agent reinforcement learning (MARL) with sparse rewards leads to unavoidable problems, such as a distorted understanding of the environment and a lack of collaboration among agents. Motivated by this, we propose the Curiosity and Influence-based Exploration (CIExplore) method, which combines a new form of intrinsic reward with an internal counterfactual advantage function. Concretely, the intrinsic reward is a combination of a joint curiosity reward and an influence reward. The former is the variance of the outputs of an ensemble of prediction models that take the joint observations and actions of all agents as input and predict the joint observations at the next timestep. The latter quantifies the influence of one agent's behavior on the other agents' state-value functions. Because the joint curiosity reward is shared by all agents, we compute an internal counterfactual advantage function to address the resulting intrinsic credit-assignment problem. We demonstrate the efficacy of CIExplore in multi-agent grid-world environments and show that it is compatible with both on-policy and off-policy MARL algorithms and scales to more complex settings with more agents or greater environment randomness.
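
As a rough illustration of the joint curiosity reward, the sketch below computes the disagreement (variance) of an ensemble of forward models over the predicted next joint observation. This is a minimal sketch, assuming PyTorch and simple fully connected forward models; the class name ForwardModel, the hidden size, and the ensemble size are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """One member of the prediction ensemble: maps the concatenated joint
    observation and joint action to a predicted next joint observation."""

    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, joint_obs_dim),
        )

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


def joint_curiosity_reward(ensemble, joint_obs, joint_act):
    """Shared intrinsic reward: variance of the ensemble's predictions of the
    next joint observation, averaged over observation dimensions. High
    disagreement among the models signals joint observation-action pairs
    the ensemble has rarely been trained on."""
    with torch.no_grad():
        preds = torch.stack([m(joint_obs, joint_act) for m in ensemble])
    return preds.var(dim=0).mean(dim=-1)  # one scalar reward per batch entry


# Usage with hypothetical dimensions: 3 agents, per-agent obs dim 4 (joint 12),
# per-agent one-hot actions of size 5 (joint 15), batch of 8 transitions.
ensemble = [ForwardModel(joint_obs_dim=12, joint_act_dim=15) for _ in range(5)]
r_curiosity = joint_curiosity_reward(ensemble, torch.randn(8, 12), torch.randn(8, 15))
```

In a full training loop, each ensemble member would typically also be trained to minimize its prediction error on observed transitions, and the resulting shared reward would be apportioned among agents by the internal counterfactual advantage function described above.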
