An Invitation to Imitation

Abstract: Imitation learning is the study of algorithms that improve performance by mimicking a teacher's decisions and behaviors. Such techniques promise to enable effective programming by demonstration, automating tasks, such as driving, that people can demonstrate but find difficult to hand-program. This work summarizes, from a very personal perspective, research on computationally effective methods for learning to imitate behavior. It is intended to serve two audiences: to engage machine learning experts in the challenges of imitation learning and its interesting theoretical and practical distinctions from more familiar frameworks like statistical supervised learning; and equally, to make the frameworks and tools of imitation learning more broadly appreciated by roboticists and experts in applied artificial intelligence.
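The core idea can be made concrete with the simplest imitation-learning approach, behavioral cloning: treat the expert's state-action pairs as a supervised dataset and fit a policy by regression. The sketch below uses an invented lane-keeping toy task (the states, expert gain, and all numbers are illustrative assumptions, not from the paper); it deliberately ignores the distribution-shift issues that distinguish imitation learning from ordinary supervised learning.

```python
import numpy as np

# Toy "lane keeping" task: the state is the lateral offset from the lane
# center, and the expert steers proportionally back toward the center.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 1))   # lateral offsets
expert_actions = -0.8 * states[:, 0]             # expert's steering commands

# Behavioral cloning: fit a supervised regressor (here, least squares)
# mapping observed states to the expert's actions.
X = np.hstack([states, np.ones((len(states), 1))])  # add a bias feature
w, *_ = np.linalg.lstsq(X, expert_actions, rcond=None)

def policy(state):
    """Learned policy: predicted steering for a given lateral offset."""
    return w[0] * state + w[1]

print(policy(0.5))  # ≈ -0.4: the clone recovers the expert's gain
```

Because the learner only ever sees states the expert visited, small prediction errors can drive the system into states absent from the training data, which is precisely the failure mode that interactive methods such as DAgger address.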
