Learning Latent Plans from Play

We propose learning from teleoperated play data (LfP) as a way to scale up multi-task robotic skill learning. Learning from play offers three main advantages: 1) It is cheap: large amounts of play data can be collected quickly, since it requires no scene staging, task segmenting, or resetting to an initial state. 2) It is general: it contains both functional and non-functional behavior, relaxing the need for a predefined task distribution. 3) It is rich: play involves repeated, varied behavior and naturally leads to high coverage of the possible interaction space. These properties distinguish play from expert demonstrations, which are rich but expensive, and from scripted unattended data collection, which is cheap but insufficiently rich. The variety in play, however, presents a multimodality challenge to methods that seek to learn control on top of it. To this end, we introduce Play-LMP, a method designed to handle variability in the LfP setting by organizing it in an embedding space. Play-LMP jointly learns 1) reusable latent plan representations, unsupervised, from play data and 2) a single goal-conditioned policy capable of decoding inferred plans to achieve user-specified tasks. We show empirically that Play-LMP, despite never being trained on task-specific data, generalizes to 18 complex user-specified manipulation tasks with an average success rate of 85.5%, outperforming individual models trained on expert demonstrations (70.3% success). Furthermore, we find that play-supervised models, unlike their expert-trained counterparts, 1) are more robust to perturbations and 2) exhibit retry-till-success behavior. Finally, despite never being trained with task labels, our agent learns to organize its latent plan space around functional tasks. Videos of the performed experiments are available at learning-from-play.github.io.
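To make the two jointly learned components concrete, below is a minimal sketch of a Play-LMP-style training step: a plan-recognition network embeds a whole play window into a latent plan, a plan-proposal network predicts a distribution over plans from only the initial state and the goal, and a goal-conditioned policy decodes the sampled plan into actions. This is a CVAE-style objective consistent with the abstract's description; all architecture sizes, module names, and the KL weight `beta` are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a Play-LMP-style training step (PyTorch). Dimensions, networks,
# and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

STATE, ACT, GOAL, LATENT, HIDDEN = 64, 8, 64, 32, 256  # assumed sizes

class PlanRecognition(nn.Module):
    """Posterior q(z | tau): embeds a full play window into a latent plan."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE + ACT, HIDDEN, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * HIDDEN, 2 * LATENT)

    def forward(self, states, actions):        # (B, T, STATE), (B, T, ACT)
        h, _ = self.rnn(torch.cat([states, actions], -1))
        mu, log_sigma = self.head(h[:, -1]).chunk(2, -1)
        return Normal(mu, log_sigma.exp())

class PlanProposal(nn.Module):
    """Prior p(z | s_0, s_g): proposes plans from current state and goal only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE + GOAL, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, 2 * LATENT))

    def forward(self, s0, goal):
        mu, log_sigma = self.net(torch.cat([s0, goal], -1)).chunk(2, -1)
        return Normal(mu, log_sigma.exp())

class Policy(nn.Module):
    """Goal- and plan-conditioned policy pi(a | s, s_g, z)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE + GOAL + LATENT, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACT)

    def forward(self, states, goal, z):
        T = states.shape[1]                     # broadcast goal and plan over time
        ctx = torch.cat([goal, z], -1).unsqueeze(1).expand(-1, T, -1)
        h, _ = self.rnn(torch.cat([states, ctx], -1))
        return self.head(h)

recog, proposal, policy = PlanRecognition(), PlanProposal(), Policy()
opt = torch.optim.Adam([*recog.parameters(), *proposal.parameters(),
                        *policy.parameters()], lr=2e-4)
beta = 0.01                                     # KL weight (assumed value)

def train_step(states, actions):
    """states: (B, T, STATE); actions: (B, T, ACT) -- a random play window."""
    s0, goal = states[:, 0], states[:, -1]      # final state relabeled as the goal
    q = recog(states, actions)                  # posterior over latent plans
    p = proposal(s0, goal)                      # learned prior over latent plans
    z = q.rsample()                             # reparameterized plan sample
    pred = policy(states, goal, z)
    recon = ((pred - actions) ** 2).mean()      # action reconstruction loss
    kl = kl_divergence(q, p).mean()             # keep proposal close to recognition
    loss = recon + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At test time, the recognition network is discarded: given a current state and a user-specified goal, one would sample a plan z from the proposal network and roll out the policy conditioned on that plan, which is what lets a single policy achieve many user-specified tasks.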
