Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real-world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. The approach has two developmental stages: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then, in a goal exploration stage, goals are sampled in this learned latent space. We present experiments in which a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained with engineered representations.
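The two-stage loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy `rollout` environment, the linear encoder (PCA via SVD, standing in for a deep representation learner such as a VAE), and the nearest-neighbor goal-reaching strategy are all simplifying assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy environment: policy parameters -> raw "sensor" observation.
# Stands in for the simulated arm; a nonlinear map into a redundant space.
W_env = rng.normal(size=(3, 2))

def rollout(theta):
    x = np.tanh(theta @ W_env)           # 2-D effect of executing the policy
    return np.concatenate([x, x ** 2])   # redundant raw observation (4-D)

# --- Stage 1: perceptual learning from passive raw observations.
# A linear encoder (PCA) stands in for the deep representation learner.
obs = np.stack([rollout(rng.normal(size=3)) for _ in range(500)])
mean = obs.mean(axis=0)
_, _, Vt = np.linalg.svd(obs - mean, full_matrices=False)

def encode(o):
    return (o - mean) @ Vt[:2].T         # 2-D learned goal (latent) space

# --- Stage 2: goal exploration by sampling goals in the latent space.
latents = encode(obs)
lo, hi = latents.min(axis=0), latents.max(axis=0)
policies = [rng.normal(size=3)]
outcomes = [encode(rollout(policies[0]))]

for _ in range(200):
    goal = rng.uniform(lo, hi)                        # self-generated goal
    dists = [np.linalg.norm(o - goal) for o in outcomes]
    theta = policies[int(np.argmin(dists))]           # nearest known policy
    theta = theta + 0.1 * rng.normal(size=3)          # perturb and retry
    policies.append(theta)
    outcomes.append(encode(rollout(theta)))           # record achieved effect

print(len(policies), "policies in the repertoire")
```

The point of the sketch is the division of labor: the encoder is fit once on passively collected observations, and the exploration loop only ever manipulates goals and outcomes in the learned latent coordinates, never in an engineered feature space.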