Paper information - Learning to search: structured prediction techniques for imitation learning

Learning to search: structured prediction techniques for imitation learning

Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming, and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural-network-controlled driving platforms (Pomerleau, 1989), learning-based "programming by demonstration" has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components. We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning, which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning to a modern form of machine learning known as Maximum Margin Structured Classification (Taskar et al., 2005); these algorithms, therefore, apply both specifically to training existing state-of-the-art planners and broadly to solving a range of structured prediction problems of importance in learning and robotics. In difficult high-dimensional planning domains, such as those found in many manipulation problems, high-performance planning technology remains a topic of much research. We close with some recent work which moves toward simultaneously advancing this technology while retaining the learnability developed above. Throughout the thesis, we demonstrate our algorithms on a range of applications including overhead navigation, quadrupedal locomotion, heuristic learning, manipulation planning, grasp prediction, driver prediction, pedestrian prediction, optical character recognition, and LADAR classification.

Nathan D. Ratliff | J. Andrew Bagnell

[1] A. Wightman,et al. Mathematical Physics. , 1930, Nature.

[2] R. E. Kalman,et al. When Is a Linear Control System Optimal , 1964 .

[3] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.

[4] Naum Zuselevich Shor,et al. Minimization Methods for Non-Differentiable Functions , 1985, Springer Series in Computational Mathematics.

[5] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 1989 .

[6] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[7] B. Anderson,et al. Optimal control: linear quadratic methods , 1990 .

[8] Steven Dubowsky,et al. On computing the global time-optimal motions of robotic manipulators in the presence of obstacles , 1991, IEEE Trans. Robotics Autom..

[9] Stefan Schaal,et al. Open loop stable control strategies for robot juggling , 1993, Proceedings IEEE International Conference on Robotics and Automation.

[10] Oussama Khatib,et al. Elastic bands: connecting path planning and control , 1993, Proceedings IEEE International Conference on Robotics and Automation.

[11] Philip M. Long,et al. WORST-CASE QUADRATIC LOSS BOUNDS FOR ON-LINE PREDICTION OF LINEAR FUNCTIONS BY GRADIENT DESCENT , 1993 .

[12] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[13] Claude Sammut,et al. A Framework for Behavioural Cloning , 1995, Machine Intelligence 15.

[14] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[15] B. Faverjon,et al. Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces , 1996 .

[16] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, J. Comput. Syst. Sci..

[17] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..

[18] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[19] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[20] E. Yaz. Linear Matrix Inequalities In System And Control Theory , 1998, Proceedings of the IEEE.

[21] Lydia E. Kavraki,et al. Probabilistic Roadmaps for Robot Path Planning , 1998 .

[22] Yong K. Hwang,et al. SANDROS: a dynamic graph search algorithm for motion planning , 1998, IEEE Trans. Robotics Autom..

[23] Y. Freund,et al. Adaptive game playing using multiplicative weights , 1999 .

[24] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .

[25] Andrew McCallum,et al. Using Maximum Entropy for Text Classification , 1999 .

[26] Vladimir J. Lumelsky,et al. Biped robot locomotion in scenes with unknown obstacles , 1999, Proceedings 1999 IEEE International Conference on Robotics and Automation.

[27] Steven M. LaValle,et al. RRT-connect: An efficient approach to single-query path planning , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings.

[28] Shun-ichi Amari,et al. Methods of information geometry , 2000 .

[29] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[30] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.

[31] Peter L. Bartlett,et al. Functional Gradient Techniques for Combining Hypotheses , 2000 .

[32] J. Friedman. Greedy function approximation: A gradient boosting machine. , 2001 .

[33] Bernhard Schölkopf,et al. A Generalized Representer Theorem , 2001, COLT/EuroCOLT.

[34] D. Bertsekas,et al. Convergence Rate of Incremental Subgradient Algorithms , 2000 .

[35] Mark Herbster,et al. Tracking the Best Linear Predictor , 2001, J. Mach. Learn. Res..

[36] Yoram Baram,et al. Manifold Stochastic Dynamics for Bayesian Learning , 1999, Neural Computation.

[37] Alexander J. Smola,et al. Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[38] B. Moor,et al. Mixed integer programming for multi-vehicle path planning , 2001, 2001 European Control Conference (ECC).

[39] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.

[40] Tong Zhang,et al. Covering Number Bounds of Certain Regularized Linear Function Classes , 2002, J. Mach. Learn. Res..

[41] Martin J. Wainwright,et al. Stochastic processes on graphs with cycles: geometric and variational approaches , 2002 .

[42] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.

[43] Henrik I. Christensen,et al. Automatic grasp planning using shape primitives , 2003, 2003 IEEE International Conference on Robotics and Automation.

[44] J. Chestnutt,et al. Planning Biped Navigation Strategies in Complex Environments , 2003 .

[45] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.

[46] O. Brock,et al. Elastic Strips: A Framework for Motion Generation in Human Environments , 2002, Int. J. Robotics Res..

[47] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[48] Bernhard Schölkopf,et al. A tutorial on support vector regression , 2004, Stat. Comput..

[49] Alexander J. Smola,et al. Online learning with kernels , 2001, IEEE Transactions on Signal Processing.

[50] Manfred K. Warmuth,et al. Relative Loss Bounds for Multidimensional Regression Problems , 1997, Machine Learning.

[51] Brian Roark,et al. Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[52] Joel A. Tropp,et al. Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[53] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.

[54] A. Moore,et al. Learning decisions: robustness, uncertainty, and approximation , 2004 .

[55] Ben Taskar,et al. Exponentiated Gradient Algorithms for Large-margin Structured Classification , 2004, NIPS.

[56] Gunnar Rätsch,et al. Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection , 2004, J. Mach. Learn. Res..

[57] Raffaello D'Andrea,et al. Iterative MILP methods for vehicle-control problems , 2005, IEEE Transactions on Robotics.

[58] Ji Zhu,et al. Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..

[59] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[60] Takeo Kanade,et al. Footstep Planning for the Honda ASIMO Humanoid , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[61] Ben Taskar,et al. Discriminative learning of Markov random fields for segmentation of 3D scan data , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[62] Ben Taskar,et al. Learning structured prediction models: a large margin approach , 2005, ICML.

[63] Yann LeCun,et al. Off-Road Obstacle Avoidance through End-to-End Learning , 2005, NIPS.

[64] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[65] Ben Taskar,et al. Structured Prediction via the Extragradient Method , 2005, NIPS.

[66] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.

[67] J. Andrew Bagnell,et al. Maximum margin planning , 2006, ICML.

[68] Brett Browning,et al. Learning to Predict Driver Route and Destination Intent , 2006, 2006 IEEE Intelligent Transportation Systems Conference.

[69] Thorsten Joachims,et al. Training linear SVMs in linear time , 2006, KDD '06.

[70] Chaitanya Swamy,et al. An approximation scheme for stochastic linear programming and its application to stochastic integer programs , 2006, JACM.

[71] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[72] Fu Jie Huang,et al. A Tutorial on Energy-Based Learning , 2006 .

[73] C.S. Ma,et al. MILP optimal path planning for real-time applications , 2006, 2006 American Control Conference.

[74] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[75] Mark H. Overmars,et al. Creating High-quality Roadmaps for Motion Planning in Virtual Environments , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[76] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .

[77] David M. Bradley,et al. Boosting Structured Prediction for Imitation Learning , 2006, NIPS.

[78] Richard Szeliski,et al. A Comparative Study of Energy Minimization Methods for Markov Random Fields , 2006, ECCV.

[79] Nathan Ratliff,et al. (Online) Subgradient Methods for Structured Prediction , 2007 .

[80] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.

[81] T. Poggio,et al. Regularized Least-Squares Classification , 2007 .

[82] Pieter Abbeel,et al. Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion , 2007, NIPS.

[83] Alexander J. Smola,et al. Bundle Methods for Machine Learning , 2007, NIPS.

[84] Siddhartha S. Srinivasa,et al. Imitation learning for locomotion and manipulation , 2007, 2007 7th IEEE-RAS International Conference on Humanoid Robots.

[85] Csaba Szepesvári,et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.

[86] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.

[87] J. Andrew Bagnell,et al. Kernel Conjugate Gradient for Fast Kernel Machines , 2007, IJCAI.

[88] Eyal Amir,et al. Bayesian Inverse Reinforcement Learning , 2007, IJCAI.

[89] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[90] Anind K. Dey,et al. Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior , 2008, UbiComp.

[91] David Silver,et al. High Performance Outdoor Navigation from Overhead Data using Imitation Learning , 2008, Robotics: Science and Systems.

[92] Steven L. Waslander,et al. Tunnel-MILP: Path Planning with Sequential Convex Polytopes , 2008, AIAA Guidance, Navigation and Control Conference and Exhibit.

[93] Nathan Srebro,et al. SVM optimization: inverse dependence on training set size , 2008, ICML '08.

[94] Martial Hebert,et al. Directional Associative Markov Network for 3-D Point Cloud Classification , 2008 .

[95] Elisa Ricci,et al. Large Margin Methods for Structured Output Prediction , 2008, Computational Intelligence Paradigms.

[96] John Krumm. A Markov Model for Driver Turn Prediction (Number 2008-01-0195) , 2008 .

[97] Pieter Abbeel,et al. Learning for control from multiple demonstrations , 2008, ICML '08.

[98] Siddhartha S. Srinivasa,et al. CHOMP: Gradient optimization techniques for efficient motion planning , 2009, 2009 IEEE International Conference on Robotics and Automation.

[99] Siddhartha S. Srinivasa,et al. Inverse Optimal Heuristic Control for Imitation Learning , 2009, AISTATS.

[100] David Silver,et al. Learning to search: Functional gradient techniques for imitation learning , 2009, Auton. Robots.

[101] Martial Hebert,et al. Contextual classification with functional Max-Margin Markov Networks , 2009, CVPR.

[102] T. Banchoff,et al. Differential Geometry of Curves and Surfaces , 2010 .

[103] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[104] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..