Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

Intrinsically motivated spontaneous exploration is a key enabler of autonomous lifelong learning in human children. It allows them to discover and acquire large repertoires of skills through the self-generation, self-selection, self-ordering, and self-experimentation of learning goals. We present the formal framework of unsupervised multi-goal reinforcement learning, as well as an algorithmic approach called intrinsically motivated goal exploration processes (IMGEP), to enable similar properties of autonomous learning in machines. The IMGEP algorithmic architecture relies on several principles: 1) self-generation of goals as parameterized reinforcement learning problems; 2) selection of goals based on intrinsic rewards; 3) exploration with parameterized time-bounded policies and fast incremental goal-parameterized policy search; 4) systematic reuse of the information acquired when targeting one goal to improve on other goals. We present a particularly efficient form of IMGEP that uses a modular representation of goal spaces together with intrinsic rewards based on learning progress. We show how IMGEPs automatically generate a learning curriculum in an experimental setup where a real humanoid robot can explore multiple goal spaces with several hundred continuous dimensions. Although no particular target goal is provided to the system beforehand, this curriculum allows the discovery of skills of increasing complexity that act as stepping stones for learning more complex skills (such as nested tool use). We show that learning across several spaces of diverse problems can be more efficient for acquiring complex skills than trying to learn those complex skills directly. We illustrate the computational efficiency of IMGEPs: these robotic experiments use simple memory-based low-level policy representations and a simple search algorithm, enabling the whole system to learn online and incrementally on a Raspberry Pi 3.
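
To make the four principles concrete, here is a minimal Python sketch of an IMGEP-style exploration loop. It is illustrative only, not the authors' implementation: the `GoalSpace` class, the `env.rollout` and `env.random_policy_params` interfaces (with `rollout` assumed to return one outcome vector per goal space), and all constants are hypothetical assumptions chosen to show how goal self-generation, progress-based goal selection, memory-based incremental policy search, and cross-goal information reuse fit together.

```python
# A minimal, hypothetical sketch of an IMGEP loop; names and interfaces
# (GoalSpace, env.rollout, env.random_policy_params) are assumptions.
import random
import numpy as np

class GoalSpace:
    """One modular goal space: samples goals and tracks learning progress."""
    def __init__(self, dim):
        self.dim = dim
        self.memory = []            # list of (outcome, policy_params) pairs
        self.progress_history = []  # recent improvements, for intrinsic reward

    def sample_goal(self):
        # Principle 1: self-generate a goal as a point in this space.
        return np.random.uniform(-1.0, 1.0, self.dim)

    def nearest(self, goal):
        # Memory-based search: find the stored attempt whose outcome
        # landed closest to the requested goal.
        outcomes = np.array([o for o, _ in self.memory])
        idx = int(np.argmin(np.linalg.norm(outcomes - goal, axis=1)))
        return self.memory[idx]

    def learning_progress(self):
        # Intrinsic reward: mean recent improvement toward sampled goals.
        recent = self.progress_history[-20:]
        return float(np.mean(recent)) if recent else 0.0

def imgep_loop(spaces, env, n_iterations, n_bootstrap=50, eps=0.2):
    for it in range(n_iterations):
        if it < n_bootstrap:
            # Random motor babbling to seed the memories.
            params = env.random_policy_params()
            outcomes = env.rollout(params)  # one outcome per goal space
        else:
            # Principle 2: pick a goal space with an epsilon-greedy bandit
            # over per-space learning progress (the intrinsic reward).
            if random.random() < eps:
                space_id = random.randrange(len(spaces))
            else:
                space_id = int(np.argmax([s.learning_progress() for s in spaces]))
            space = spaces[space_id]
            goal = space.sample_goal()
            # Principle 3: fast incremental policy search, here a simple
            # perturbation of the nearest stored policy.
            near_outcome, near_params = space.nearest(goal)
            params = near_params + np.random.normal(0.0, 0.05, near_params.shape)
            outcomes = env.rollout(params)
            # Record improvement toward the goal as a progress signal.
            old_d = np.linalg.norm(near_outcome - goal)
            new_d = np.linalg.norm(outcomes[space_id] - goal)
            space.progress_history.append(old_d - new_d)
        # Principle 4: store the outcome in *every* goal space, so an
        # attempt at one goal can improve competence on all the others.
        for sid, s in enumerate(spaces):
            s.memory.append((outcomes[sid], params))
```

In this sketch, the automatic curriculum emerges from the bandit over goal spaces: spaces where competence is currently improving fastest are sampled most often, so the system naturally shifts from simple skills to harder ones as progress on the former plateaus.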
