Multi-Task Hierarchical Imitation Learning for Home Automation

Control policies for home automation robots can be learned from human demonstrations, and hierarchical control has the potential to reduce the number of demonstrations required. When learning multiple policies for related tasks, demonstrations can be reused across tasks to further reduce the number needed for each new policy. We present HIL-MT, a framework for Multi-Task Hierarchical Imitation Learning involving a human teacher, a networked Toyota HSR robot, and a cloud-based server that stores demonstrations and trains models. In our experiments, HIL-MT learns a policy for clearing a table of dishes from 11.2 demonstrations on average. Learning to set the table requires 19 new demonstrations when trained separately, but only 11.6 new demonstrations when the table-clearing demonstrations are also reused. Similarly, HIL-MT learns policies for building 3- and 4-level pyramids of glass cups from 8.2 and 5 demonstrations, respectively, whereas reusing the 3-level demonstrations brings the 4-level policy down to 2.7 new demonstrations. These results suggest that hierarchical policies for structured domestic tasks can reuse existing demonstrations of related tasks, reducing the need for new ones.
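To make the demonstration-reuse idea concrete, the sketch below shows one minimal way a shared skill library could let a new task inherit sub-task demonstrations from a related task, so the teacher only demonstrates what is genuinely new. This is an illustrative Python sketch, not the HIL-MT implementation; all class and skill names (SkillLibrary, HierarchicalPolicy, grasp_dish, and so on) are hypothetical.

```python
from collections import defaultdict

class SkillLibrary:
    """Stores demonstration segments keyed by skill name (e.g. 'grasp_dish')."""
    def __init__(self):
        self.segments = defaultdict(list)

    def add_demonstration(self, demo):
        # A demonstration is a sequence of (skill_name, trajectory) segments
        # produced by the human teacher; each segment trains that skill.
        for skill, trajectory in demo:
            self.segments[skill].append(trajectory)

    def covered(self, skill):
        # A skill is covered if at least one demonstration segment exists.
        return len(self.segments[skill]) > 0

class HierarchicalPolicy:
    """A high-level task plan over low-level skills shared across tasks."""
    def __init__(self, plan, library):
        self.plan = plan        # ordered skill names for this task
        self.library = library  # skill library shared across all tasks

    def missing_skills(self):
        # Only skills with no existing demonstrations need new ones.
        return [s for s in self.plan if not self.library.covered(s)]

library = SkillLibrary()
# Segments collected while teaching "clear the table" (trajectories elided):
library.add_demonstration([("grasp_dish", [...]), ("place_in_sink", [...])])

# "Set the table" shares grasp_dish, so only place_on_table needs new demos.
set_table = HierarchicalPolicy(["grasp_dish", "place_on_table"], library)
print(set_table.missing_skills())  # -> ['place_on_table']
```

Under this reading, the savings reported above arise because a hierarchical decomposition exposes which sub-tasks two tasks share, letting their demonstration segments be pooled rather than recollected.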
