Learning Robot Skills with Temporal Variational Inference

In this paper, we address the discovery of robotic options from demonstrations in an unsupervised manner. Specifically, we present a framework to jointly learn low-level control policies and higher-level policies over how to use them from demonstrations of a robot performing various tasks. By representing options as continuous latent variables, we frame the problem of learning these options as latent variable inference. We then present a temporal formulation of variational inference, based on a temporal factorization of trajectory likelihoods, that allows us to infer options in an unsupervised manner. We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets.
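To make the latent-variable view of options concrete, below is a minimal sketch (not the authors' implementation) of variational inference over a continuous option: an encoder maps a demonstrated trajectory to a posterior over a latent option z, and a z-conditioned low-level policy reconstructs the demonstrated actions, trained by minimizing a negative evidence lower bound. For brevity the sketch infers a single option per trajectory rather than the per-timestep temporal factorization described above; the class name, network sizes, and Gaussian assumptions are illustrative only.

```python
# Minimal, hedged sketch of option inference as a trajectory VAE.
# Assumptions (not from the paper): one latent option per trajectory,
# Gaussian posterior and prior, squared-error action reconstruction.
import torch
import torch.nn as nn

class OptionVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        # Variational posterior q(z | trajectory): an LSTM over (state, action) pairs.
        self.encoder = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Low-level policy pi(a | s, z): decodes actions given state and option.
        self.policy = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        _, (h, _) = self.encoder(torch.cat([states, actions], dim=-1))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        z_tiled = z.unsqueeze(1).expand(-1, states.size(1), -1)
        pred_actions = self.policy(torch.cat([states, z_tiled], dim=-1))
        # Negative ELBO (up to constants): squared-error reconstruction of the
        # demonstrated actions plus KL from q(z | trajectory) to a unit Gaussian prior.
        recon = ((pred_actions - actions) ** 2).sum(dim=(1, 2)).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return recon + kl
```

Training would simply minimize the returned loss with a stochastic optimizer over batches of demonstration trajectories; the full temporal formulation additionally factorizes the likelihood over time so that option assignments and switches can vary within a trajectory.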
