What Can I Do Here? Learning New Skills by Imagining Visual Affordances

A generalist robot equipped with learned skills must be able to perform many tasks in many different environments. However, zero-shot generalization to new settings is not always possible. When the robot encounters a new environment or object, it may need to finetune some of its previously learned skills to accommodate this change. Crucially, previously learned behaviors and models should still be able to accelerate this relearning. In this paper, we study how generative models of possible outcomes allow a robot to learn visual representations of affordances, so that the robot can sample potential outcomes in new situations and then further train its policy to achieve those outcomes. In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model, attempt to reach them, and thereby update both its skills and its outcome model. We show that this approach can train goal-conditioned policies that operate on raw image inputs and can rapidly learn to manipulate new objects via our proposed affordance-directed exploration scheme.
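To make the proposed loop concrete, the sketch below shows, in PyTorch, one way an outcome model and a goal-conditioned policy could interact during affordance-directed exploration: the generative model proposes outcome images that the current scene might afford, the policy attempts to reach them, and both are updated on what actually happened. This is a minimal sketch rather than the full system: the conditional VAE over flattened images, the network sizes, the env.reset()/env.step() interface, and the hindsight behavioral-cloning update are all illustrative assumptions standing in for the complete architecture and off-policy RL algorithm.

```python
# A minimal sketch, not the full implementation. A conditional VAE stands
# in for the affordance model p(outcome image | current image), and a
# goal-conditioned policy is finetuned to reach outcomes sampled from it.
# The flattened-image size, network widths, the env interface, and the
# hindsight behavioral-cloning update are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM = 64 * 64 * 3   # flattened RGB image (assumed resolution)
LATENT_DIM = 8
ACT_DIM = 4


class AffordanceModel(nn.Module):
    """Conditional VAE over possible outcomes given the current observation."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 2 * LATENT_DIM))               # mean, log-variance
        self.decoder = nn.Sequential(
            nn.Linear(OBS_DIM + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, OBS_DIM))

    def loss(self, obs, outcome):
        mu, logvar = self.encoder(torch.cat([obs, outcome], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(torch.cat([obs, z], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return F.mse_loss(recon, outcome) + 0.1 * kl

    @torch.no_grad()
    def imagine(self, obs):
        """Sample a potential outcome (goal image) for the current scene."""
        z = torch.randn(obs.shape[0], LATENT_DIM)
        return self.decoder(torch.cat([obs, z], -1))


# Goal-conditioned policy: maps (current image, goal image) to an action.
policy = nn.Sequential(
    nn.Linear(2 * OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM), nn.Tanh())

affordances = AffordanceModel()
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
afford_opt = torch.optim.Adam(affordances.parameters(), lr=1e-3)


def finetune_in_new_scene(env, episodes=100, horizon=50):
    """Affordance-directed exploration in an unfamiliar environment.

    `env` is assumed to expose reset() and step(action), each returning a
    1-D float tensor of size OBS_DIM (a flattened image observation).
    """
    for _ in range(episodes):
        obs = env.reset().unsqueeze(0)
        goal = affordances.imagine(obs)          # propose a plausible outcome
        observations, actions = [obs], []
        for _ in range(horizon):                 # attempt to reach it
            with torch.no_grad():
                action = policy(torch.cat([obs, goal], -1))
            obs = env.step(action.squeeze(0)).unsqueeze(0)
            observations.append(obs)
            actions.append(action)

        # Hindsight relabeling: treat the image actually reached as the goal
        # and regress the policy toward the actions that led there (a simple
        # stand-in for the off-policy RL update a full system would use).
        reached = observations[-1]
        policy_loss = sum(
            F.mse_loss(policy(torch.cat([o, reached], -1)), a)
            for o, a in zip(observations[:-1], actions))
        policy_opt.zero_grad()
        policy_loss.backward()
        policy_opt.step()

        # Update the affordance model on the outcome that actually occurred,
        # so future imagined goals reflect what this scene affords.
        afford_loss = affordances.loss(observations[0], reached)
        afford_opt.zero_grad()
        afford_loss.backward()
        afford_opt.step()
```

The structural point is the alternation between imagining and attempting: prior data trains the outcome model, the model directs exploration in a new scene, and the data collected there finetunes both the policy and the model.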
