Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Conventional Reinforcement Learning (RL) algorithms typically train a single agent to solve the task on its own. As a result, the agent can explore only a limited part of the state-action space, and the learned behavior is highly correlated with the agent's previous experience, making training prone to converging to poor local minima. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE utilizes a team of heterogeneous agents that explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings, and experimental results show that DiCE achieves substantial improvements over the baselines on the MuJoCo locomotion tasks.
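
To make the two ingredients concrete, the sketch below illustrates the general pattern the abstract describes: a team of policies writes rollouts into one shared buffer, and each agent's loss carries a term rewarding disagreement with its teammates. This is a minimal illustration, not the authors' implementation: the abstract does not specify the objective, so the behavior-cloning task loss, the pairwise-distance diversity measure, and names such as DIVERSITY_COEF and diversity_bonus are all hypothetical stand-ins.

```python
# Sketch of diversity-regularized collaborative exploration (NOT the
# DiCE objective itself, which the abstract does not spell out).
# Assumption: each agent optimizes a task loss on the team's shared
# experience, plus a bonus for acting differently from its teammates.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, TEAM_SIZE = 8, 2, 4
DIVERSITY_COEF = 0.1  # hypothetical weight on the diversity term


def make_policy() -> nn.Module:
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                         nn.Linear(64, ACT_DIM))


team = [make_policy() for _ in range(TEAM_SIZE)]
optimizers = [torch.optim.Adam(p.parameters(), lr=3e-4) for p in team]

# Shared buffer: every agent's rollouts land here, so each learner
# trains on the union of the team's experience (non-local data).
shared_buffer = []


def collect(agent: nn.Module, n_steps: int = 32) -> None:
    """Stand-in for rollouts; real code would step a MuJoCo env."""
    obs = torch.randn(n_steps, OBS_DIM)
    with torch.no_grad():
        act = agent(obs)
    shared_buffer.extend(zip(obs, act))


def diversity_bonus(i: int, obs: torch.Tensor) -> torch.Tensor:
    """Mean pairwise distance between agent i's actions and its
    teammates' actions on the same states (one plausible measure)."""
    mine = team[i](obs)
    dists = [(mine - team[j](obs).detach()).pow(2).mean()
             for j in range(TEAM_SIZE) if j != i]
    return torch.stack(dists).mean()


for agent in team:
    collect(agent)

obs_batch = torch.stack([o for o, _ in shared_buffer])
act_batch = torch.stack([a for _, a in shared_buffer])

for i, (agent, opt) in enumerate(zip(team, optimizers)):
    # Placeholder task loss: regression toward the shared buffer.
    # A real agent would plug in a PPO- or SAC-style loss here.
    task_loss = (agent(obs_batch) - act_batch).pow(2).mean()
    loss = task_loss - DIVERSITY_COEF * diversity_bonus(i, obs_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Subtracting the bonus means that minimizing the loss pushes each agent's actions away from its teammates' on shared states, so the team covers more of the state-action space than any single learner would.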
