Robot learning from demonstration for path planning: A review

Learning from demonstration (LfD) is an appealing approach for teaching robots new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complex robot tasks that require carefully regulated path planning strategies remain an open problem. Contact and non-contact constraints in specific robot tasks make path planning more difficult, because the interaction between the robot and its environment is time-varying. In this paper, we focus on path planning for complex robot tasks in the LfD setting and offer a novel perspective for classifying imitation learning and inverse reinforcement learning methods, based on the constraints they handle and their treatment of obstacle avoidance. Finally, we summarize these methods and outline promising directions for robot applications and LfD theory.
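
To make the imitation-learning branch concrete, the sketch below fits a one-dimensional dynamic movement primitive (DMP), a standard trajectory representation in LfD, to a single demonstrated path and then rolls it out. This is a minimal illustration under stated assumptions: the function names, gains, and basis-function choices are our own, not the formulation of any particular paper surveyed here.

import numpy as np

def learn_dmp(y_demo, dt, n_basis=20, alpha_z=25.0, alpha_x=1.0):
    # Fit forcing-term weights so the primitive reproduces one demonstration.
    beta_z = alpha_z / 4.0                        # critically damped spring gains
    tau = dt * (len(y_demo) - 1)                  # movement duration
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    x = np.exp(-alpha_x * np.linspace(0.0, 1.0, len(y_demo)))   # phase variable
    # Forcing term the demonstration implies for the transformation system.
    f_target = tau**2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis**1.5 / centers
    psi = np.exp(-widths * (x[:, None] - centers)**2)           # Gaussian bases
    xi = x * (g - y0)
    # Locally weighted regression: one weight per basis function.
    w = (psi * (xi * f_target)[:, None]).sum(0) / ((psi * (xi**2)[:, None]).sum(0) + 1e-10)
    return w, centers, widths, tau, y0, g

def rollout_dmp(w, centers, widths, tau, y0, g, dt, alpha_z=25.0, alpha_x=1.0):
    # Integrate the learned point-attractor dynamics toward the goal g.
    beta_z = alpha_z / 4.0
    y, z, x, path = y0, 0.0, 1.0, [y0]
    for _ in range(int(round(tau / dt))):
        psi = np.exp(-widths * (x - centers)**2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)      # learned forcing term
        z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau  # spring-damper + forcing
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau                          # canonical system decay
        path.append(y)
    return np.array(path)

demo = np.sin(np.linspace(0.0, np.pi / 2, 200))   # toy 1-D demonstration
reproduced = rollout_dmp(*learn_dmp(demo, dt=0.01), dt=0.01)

Because the goal g is an explicit parameter of the attractor dynamics, the same learned weights generalize to new start and goal positions, which is the main appeal of this representation for path planning.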
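
For the inverse-reinforcement-learning branch, a complementary tabular sketch recovers a reward function from expert state sequences via maximum-entropy IRL: soft value iteration backward to obtain a stochastic policy, visitation propagation forward, and a gradient step to match feature expectations. Shapes, hyperparameters, and names are illustrative assumptions, not a specific published implementation.

import numpy as np
from scipy.special import logsumexp

def maxent_irl(P, phi, demos, horizon, n_iters=200, lr=0.05):
    # P[a, s, s2]: transition probabilities; phi[s]: state feature vector;
    # demos: list of expert state sequences of length `horizon`.
    n_actions, n_states, _ = P.shape
    theta = np.zeros(phi.shape[1])
    # Empirical feature counts of the expert, averaged over demonstrations.
    f_expert = np.mean([phi[np.asarray(traj)].sum(axis=0) for traj in demos], axis=0)
    p0 = np.bincount([traj[0] for traj in demos], minlength=n_states) / len(demos)
    for _ in range(n_iters):
        r = phi @ theta
        v = np.zeros(n_states)
        for _ in range(horizon):                   # backward: soft value iteration
            q = r[:, None] + np.einsum('ast,t->sa', P, v)
            v = logsumexp(q, axis=1)
        pi = np.exp(q - v[:, None])                # stochastic max-ent policy
        d, visits = p0.copy(), p0.copy()
        for _ in range(horizon - 1):               # forward: expected state visitations
            d = np.einsum('s,sa,ast->t', d, pi, P)
            visits += d
        theta += lr * (f_expert - phi.T @ visits)  # match feature expectations
    return phi @ theta                             # learned per-state reward

The learned reward, rather than a single trajectory, is what gets handed to a downstream planner; this is why IRL-based methods can encode constraints and obstacle-avoidance preferences that transfer across environments, whereas imitation-learning methods like the DMP above reproduce and retarget a demonstrated path directly.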
