Learning Robot Skills with Temporal Variational Inference

In this paper, we address the discovery of robotic options from demonstrations in an unsupervised manner. Specifically, we present a framework to jointly learn low-level control policies and higher-level policies over how to use them from demonstrations of a robot performing various tasks. By representing options as continuous latent variables, we frame the problem of learning these options as latent variable inference. We then present a temporal formulation of variational inference, based on a temporal factorization of trajectory likelihoods, that allows us to infer options in an unsupervised manner. We demonstrate the ability of our framework to learn such options across three robotic demonstration datasets.
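To make the latent-variable view of options concrete, below is a minimal sketch (not the authors' implementation) of variational inference over a continuous option: an encoder maps a demonstrated trajectory to a posterior over a latent option z, and a z-conditioned low-level policy reconstructs the demonstrated actions, trained by minimizing a negative evidence lower bound. For brevity the sketch infers a single option per trajectory rather than the per-timestep temporal factorization described above; the class name, network sizes, and Gaussian assumptions are illustrative only.

```python
# Minimal, hedged sketch of option inference as a trajectory VAE.
# Assumptions (not from the paper): one latent option per trajectory,
# Gaussian posterior and prior, squared-error action reconstruction.
import torch
import torch.nn as nn

class OptionVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        # Variational posterior q(z | trajectory): an LSTM over (state, action) pairs.
        self.encoder = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Low-level policy pi(a | s, z): decodes actions given state and option.
        self.policy = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        _, (h, _) = self.encoder(torch.cat([states, actions], dim=-1))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        z_tiled = z.unsqueeze(1).expand(-1, states.size(1), -1)
        pred_actions = self.policy(torch.cat([states, z_tiled], dim=-1))
        # Negative ELBO (up to constants): squared-error reconstruction of the
        # demonstrated actions plus KL from q(z | trajectory) to a unit Gaussian prior.
        recon = ((pred_actions - actions) ** 2).sum(dim=(1, 2)).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return recon + kl
```

Training would simply minimize the returned loss with a stochastic optimizer over batches of demonstration trajectories; the full temporal formulation additionally factorizes the likelihood over time so that option assignments and switches can vary within a trajectory.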
