Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning

Dexterous manipulation is an important part of realizing robot intelligence, but manipulators can still only perform simple tasks such as sorting and packing in structured environments. To address this problem, this paper presents a state-of-the-art survey of intelligent robots capable of autonomous decision-making and learning. The paper first reviews the main achievements and research on robots, which were mainly based on breakthroughs in automatic control and mechanical hardware. With the evolution of artificial intelligence, research has made further progress in adaptive and robust control. The survey reveals that the latest research in deep learning and reinforcement learning has paved the way for robots to perform highly complex tasks. Furthermore, deep reinforcement learning, imitation learning, and transfer learning in robot control are discussed in detail. Finally, major achievements based on these methods are summarized and analyzed thoroughly, and future research challenges are proposed.
