Paper information - Learning to Play by Imitating Humans

Learning to Play by Imitating Humans

Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions. Recently, it has been shown that a diverse set of skills can be acquired by self-supervising control on top of human teleoperated play data. Play is rich in state-space coverage, and a policy trained on this data can generalize to specific tasks at test time, outperforming policies trained on individual expert task demonstrations. In this work, we explore whether robots can learn to play, autonomously generating play data that can ultimately enhance performance. By training a behavioral cloning policy on a relatively small quantity of human play, we autonomously generate a large quantity of cloned play data that can be used as additional training data. We demonstrate that a general-purpose goal-conditioned policy trained on this augmented dataset substantially outperforms one trained only on the original human data on 18 difficult user-specified manipulation tasks in a simulated robotic tabletop environment. A video example of a robot imitating human play can be seen here: this https URL
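The data flow described in the abstract (clone human play, roll out the clone to generate more play, then train on the augmented dataset) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the 1-D point environment, the nearest-neighbor stand-in for the behavioral cloning policy, the hindsight goal relabeling step, and every function name are hypothetical choices made only to keep the example self-contained.

```python
import random

def rollout(act, num_episodes, horizon, rng):
    """Collect episodes of (state, action) pairs in a toy 1-D world (assumed)."""
    episodes = []
    for _ in range(num_episodes):
        x = rng.uniform(-1.0, 1.0)
        episode = []
        for _ in range(horizon):
            a = act(x, rng)
            episode.append((x, a))
            x += a  # toy dynamics: the action is a displacement
        episodes.append(episode)
    return episodes

def human_act(x, rng):
    """Stand-in for teleoperated play: small random motions."""
    return rng.uniform(-0.1, 0.1)

def fit_play_policy(episodes, k=5):
    """Nonparametric stand-in for behavioral cloning: sample an action
    from the k demonstration states closest to the current state."""
    pairs = [pair for ep in episodes for pair in ep]
    def act(x, rng):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return rng.choice(nearest)[1]
    return act

def relabel_goals(episodes, window):
    """Assumed hindsight relabeling: the state reached `window` steps
    after a start index is treated as the goal for that segment."""
    examples = []
    for ep in episodes:
        for start in range(len(ep) - window):
            goal = ep[start + window][0]
            for state, action in ep[start:start + window]:
                examples.append((state, goal, action))
    return examples

rng = random.Random(0)
human = rollout(human_act, num_episodes=5, horizon=20, rng=rng)    # small human set
play_policy = fit_play_policy(human)                               # clone the play
cloned = rollout(play_policy, num_episodes=50, horizon=20, rng=rng)  # large cloned set
augmented = human + cloned
train_examples = relabel_goals(augmented, window=5)  # (state, goal, action) tuples
print(len(human), len(cloned), len(train_examples))
```

The `train_examples` tuples are what a goal-conditioned policy would be fit on. The sketch stops at dataset construction because the abstract does not specify the policy architecture or training objective.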
