Imitation Learning

Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently gained renewed attention due to advances in computing and sensing as well as rising demand for intelligent applications. Learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the task. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or task-specific reward functions. Modern sensors collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door to many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, learning by imitation poses its own set of challenges, and specialized algorithms are needed to learn models effectively and robustly. In this article, we survey imitation learning methods and present design options for the different steps of the learning process. We provide background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, since these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches that use different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
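To make the "mapping between observations and actions" concrete, the sketch below shows the simplest form of imitation learning, behavioral cloning, where demonstrations are treated as supervised (observation, action) pairs. This is our own minimal illustration under stated assumptions, not the method of any particular work surveyed: the linear "expert" controller, the synthetic demonstration data, and the least-squares learner are all hypothetical stand-ins, and any supervised model (e.g., a neural network or Gaussian mixture model) could take the learner's place.

```python
# Minimal behavioral-cloning sketch (illustrative assumption, not a
# prescribed method): imitation learning reduced to supervised learning
# of a mapping from observations to actions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: states the demonstrator visited and
# the actions taken in them. Here the "expert" is a noisy linear controller
# whose gain matrix is unknown to the learner.
true_gain = np.array([[0.5, -1.2, 0.3]])                 # expert's hidden gain
observations = rng.normal(size=(500, 3))                  # 500 demonstrated states
actions = observations @ true_gain.T + 0.01 * rng.normal(size=(500, 1))

# Fit the observation-to-action mapping by ordinary least squares.
policy_weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)

def policy(obs: np.ndarray) -> np.ndarray:
    """Imitated policy: maps a new observation to a predicted action."""
    return obs @ policy_weights

# The learned policy now acts without any hand-designed reward function.
test_obs = rng.normal(size=(1, 3))
print("predicted action:", policy(test_obs))
print("expert action:   ", test_obs @ true_gain.T)
```

Note that this pure supervised reduction inherits the challenges the survey discusses: errors compound when the learned policy drifts into states absent from the demonstrations, which motivates interactive and reinforcement-based extensions.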
