A Tutorial on Newton Methods for Constrained Trajectory Optimization and Relations to SLAM, Gaussian Process Smoothing, Optimal Control, and Probabilistic Inference

Many state-of-the-art approaches to trajectory optimization and optimal control are intimately related to standard Newton methods. For researchers working at the intersection of machine learning, robotics, control, and optimization, such relations are highly relevant but sometimes hard to see across disciplines, not least because of the differing notations and conventions of each field. The aim of this tutorial is to introduce constrained trajectory optimization in a manner that allows us to establish these relations. We consider a basic but general formalization of the problem and discuss the structure of Newton steps in this setting. The computation of Newton steps can then be related to dynamic programming, establishing relations to differential dynamic programming (DDP), iterative LQG (iLQG), and approximate inference control (AICO). We also clarify how inverting a banded symmetric matrix is related to dynamic programming as well as to message passing in Markov chains and factor graphs. Further, to a machine learner, path optimization and Gaussian processes seem intuitively related problems; we make this relation explicit and show how to solve a Gaussian-process-regularized path optimization problem efficiently. Further topics include how to derive an optimal controller around the path, model predictive control in constrained k-order control processes, and the pullback metric interpretation of the Gauss–Newton approximation.
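
As a concrete illustration of the banded structure underlying these relations, the following Python sketch (a minimal toy example, not code from the tutorial) computes Gauss-Newton steps for a k-order Markov least-squares trajectory cost of the form sum_t ||phi_t(x_{t-k:t})||^2. The specific residuals (a finite-difference smoothness term plus a tanh task term), the targets y, and the helper residuals_and_jacobian are illustrative assumptions; the point is that the Gauss-Newton Hessian J^T J is symmetric and banded with k off-diagonals, so each Newton step is an O(T) banded solve.

    # Minimal sketch: Gauss-Newton steps for a toy k-order Markov trajectory cost,
    # exploiting the banded structure of the Gauss-Newton Hessian.
    import numpy as np
    from scipy.linalg import solveh_banded

    T, k = 50, 2                           # horizon and Markov order (1-D configuration)
    y = np.sin(np.linspace(0.0, 3.0, T))   # hypothetical task targets

    def residuals_and_jacobian(x):
        """Stack per-time residuals phi_t(x_{t-k:t}): finite-difference
        (acceleration) smoothness plus a nonlinear task term."""
        rows, cols, vals, res = [], [], [], []
        r = 0
        for t in range(k, T):                          # smoothness residuals
            res.append(x[t] - 2.0 * x[t-1] + x[t-2])
            for c, v in ((t, 1.0), (t-1, -2.0), (t-2, 1.0)):
                rows.append(r); cols.append(c); vals.append(v)
            r += 1
        for t in range(T):                             # nonlinear task residuals
            res.append(np.tanh(x[t]) - y[t])
            rows.append(r); cols.append(t); vals.append(1.0 / np.cosh(x[t])**2)
            r += 1
        J = np.zeros((r, T))
        J[rows, cols] = vals                           # each row touches at most k+1 columns
        return np.array(res), J

    x = np.zeros(T)
    for _ in range(20):
        phi, J = residuals_and_jacobian(x)
        H = J.T @ J + 1e-8 * np.eye(T)     # Gauss-Newton Hessian: banded, k off-diagonals
        g = J.T @ phi
        ab = np.zeros((k + 1, T))          # pack the k+1 upper diagonals for solveh_banded
        for d in range(k + 1):
            ab[k - d, d:] = np.diagonal(H, d)
        x = x - solveh_banded(ab, g)       # O(T) banded Cholesky solve of the Newton step

The banded Cholesky factorization inside solveh_banded sweeps once backward and once forward over the time steps, which mirrors the dynamic-programming and message-passing view of the Newton step described above; in practice one would add a line search or damping, and handle the constraints, which this toy example omits.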
