Diversity is All You Need: Learning Skills without a Reward Function

Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information-theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. In these environments, some of the learned skills correspond to solving the task, and each skill that solves the task does so in a distinct manner. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.
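For concreteness, the information-theoretic objective referenced above can be sketched in the standard DIAYN notation, where a skill z is drawn from a prior p(z), the policy \pi(a | s, z) is conditioned on the skill, and a learned discriminator q_\phi(z | s) tries to infer the skill from the visited state:

    \mathcal{F}(\theta) = I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S)
                        = \mathcal{H}[A \mid S, Z] + \mathcal{H}[Z] - \mathcal{H}[Z \mid S]
                        \ge \mathcal{H}[A \mid S, Z] + \mathbb{E}_{z \sim p(z),\, s \sim \pi}\big[\log q_\phi(z \mid s) - \log p(z)\big]

The inequality is the usual variational lower bound obtained by replacing the intractable posterior p(z | s) with the discriminator q_\phi. Maximizing the bound then amounts to running a maximum entropy RL algorithm on the pseudo-reward r_z(s) = log q_\phi(z | s) - log p(z), so each skill is pushed toward states from which the discriminator can recognize it. A minimal sketch of that reward in Python, assuming a uniform skill prior and a hypothetical discriminator that outputs unnormalized logits over skills for the current state:

    import numpy as np

    def diayn_reward(discriminator_logits, z, num_skills):
        """Pseudo-reward log q_phi(z | s) - log p(z) for skill z at one state.

        discriminator_logits: unnormalized scores over all skills at the
        current state, produced by a learned discriminator network (the
        name and interface here are assumptions for illustration).
        Assumes a uniform prior p(z) = 1 / num_skills.
        """
        # Normalize the logits into log-probabilities: log q_phi(z | s).
        log_q = discriminator_logits - np.logaddexp.reduce(discriminator_logits)
        log_p = -np.log(num_skills)
        return log_q[z] - log_p

Because the reward depends only on the discriminator, skills that visit easily distinguishable states receive high reward, while the entropy term H[A | S, Z] keeps each individual skill as random as possible.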
