Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real-world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. The approach has two developmental stages: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then, in a goal exploration stage, goals are sampled in this learned latent space. We present experiments in which a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained with engineered representations.
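The two-stage loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy `rollout` environment, the linear encoder (PCA via SVD, standing in for a deep representation learner such as a VAE), and the nearest-neighbor goal-reaching strategy are all simplifying assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy environment: policy parameters -> raw "sensor" observation.
# Stands in for the simulated arm; a nonlinear map into a redundant space.
W_env = rng.normal(size=(3, 2))

def rollout(theta):
    x = np.tanh(theta @ W_env)           # 2-D effect of executing the policy
    return np.concatenate([x, x ** 2])   # redundant raw observation (4-D)

# --- Stage 1: perceptual learning from passive raw observations.
# A linear encoder (PCA) stands in for the deep representation learner.
obs = np.stack([rollout(rng.normal(size=3)) for _ in range(500)])
mean = obs.mean(axis=0)
_, _, Vt = np.linalg.svd(obs - mean, full_matrices=False)

def encode(o):
    return (o - mean) @ Vt[:2].T         # 2-D learned goal (latent) space

# --- Stage 2: goal exploration by sampling goals in the latent space.
latents = encode(obs)
lo, hi = latents.min(axis=0), latents.max(axis=0)
policies = [rng.normal(size=3)]
outcomes = [encode(rollout(policies[0]))]

for _ in range(200):
    goal = rng.uniform(lo, hi)                        # self-generated goal
    dists = [np.linalg.norm(o - goal) for o in outcomes]
    theta = policies[int(np.argmin(dists))]           # nearest known policy
    theta = theta + 0.1 * rng.normal(size=3)          # perturb and retry
    policies.append(theta)
    outcomes.append(encode(rollout(theta)))           # record achieved effect

print(len(policies), "policies in the repertoire")
```

The point of the sketch is the division of labor: the encoder is fit once on passively collected observations, and the exploration loop only ever manipulates goals and outcomes in the learned latent coordinates, never in an engineered feature space.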