Information-theoretic stochastic optimal control via incremental sampling-based algorithms

This paper considers the optimal control of dynamical systems described by nonlinear stochastic differential equations. It is well known that the optimal control policy for this problem can be expressed in terms of a value function that satisfies a nonlinear partial differential equation, namely, the Hamilton-Jacobi-Bellman (HJB) equation. This nonlinear PDE must be solved backwards in time, a computation that is intractable for large-scale systems. Under certain assumptions, and after applying a logarithmic transformation, the optimal policy admits an alternative characterization in terms of a path integral. Path Integral (PI) based control methods have recently been shown to provide elegant solutions to a broad class of stochastic optimal control problems. One of the implementation challenges with this formalism is the computation of the expectation of a cost functional over trajectories of the unforced dynamics. Computing this expectation from uniformly sampled trajectories may induce numerical instabilities, because the cost appears inside an exponential. Sampling low-cost trajectories is therefore essential for the practical implementation of PI-based methods. In this paper, we use incremental sampling-based algorithms to sample useful trajectories from the unforced system dynamics, and make a novel connection between Rapidly-exploring Random Trees (RRTs) and information-theoretic stochastic optimal control. We demonstrate the proposed approach numerically on several examples.
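To make the logarithmic transformation concrete, here is a sketch in the standard path-integral control notation (the symbols and assumptions below follow the general PI-control literature, e.g. Kappen and Theodorou, and are not necessarily those of this paper). Consider dynamics and cost

    dX_t = f(X_t)\,dt + G(X_t)\,(u_t\,dt + dW_t), \qquad
    J = \mathbb{E}\Big[\phi(X_T) + \int_t^T \big(q(X_s) + \tfrac{1}{2}\,u_s^\top R\,u_s\big)\,ds\Big].

Writing the value function as V(x,t) = -\lambda \log \Psi(x,t), and assuming the control cost and noise covariance are related by \lambda R^{-1} = \Sigma (so that noise acts only through the controlled channels), the quadratic term in the HJB equation cancels and the equation becomes linear in \Psi. The Feynman-Kac formula then yields the path-integral characterization

    \Psi(x,t) = \mathbb{E}_{u \equiv 0}\Big[\exp\Big(-\tfrac{1}{\lambda}\Big(\phi(X_T) + \int_t^T q(X_s)\,ds\Big)\Big)\,\Big|\, X_t = x\Big],

an expectation over trajectories of the unforced dynamics, from which the optimal control can be recovered as u^*(x,t) = \lambda R^{-1} G(x)^\top \nabla_x \Psi(x,t) / \Psi(x,t).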

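The numerical-instability point can be seen in a minimal Monte Carlo sketch (Python; the one-dimensional dynamics, costs, and parameters below are illustrative assumptions, not the paper's examples). Uniformly sampled unforced trajectories mostly incur high cost, so the exponentiated weights exp(-S/\lambda) are dominated by a handful of paths and the effective sample size collapses:

import numpy as np

# Monte Carlo estimate of the path-integral "desirability"
#   Psi(x0) = E[ exp(-S(tau)/lam) ],  S(tau) = phi(X_T) + int q(X_t) dt,
# over trajectories of the UNFORCED dynamics  dX = f(X) dt + sigma dW.
# All model choices here (1-D dynamics, quadratic costs, parameters)
# are illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)

lam, sigma = 1.0, 0.5          # temperature and noise level (assumed)
dt, T, N = 0.01, 1.0, 5000     # step size, horizon, number of sample paths
steps = int(T / dt)

f = lambda x: -x               # drift of the unforced dynamics (assumed)
q = lambda x: 5.0 * x**2       # running state cost (assumed)
phi = lambda x: 10.0 * x**2    # terminal cost (assumed)

x = np.full(N, 2.0)            # all rollouts start at x0 = 2
S = np.zeros(N)                # accumulated path costs
for _ in range(steps):         # Euler-Maruyama rollout of the unforced SDE
    S += q(x) * dt
    x += f(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
S += phi(x)

# Naive average of exp(-S/lam) can underflow for large costs;
# log-sum-exp gives a numerically stable estimate of log Psi.
psi_naive = np.mean(np.exp(-S / lam))
m = -S / lam
log_psi = np.max(m) + np.log(np.mean(np.exp(m - np.max(m))))

# Effective sample size of the exponential weights: close to N only if
# many LOW-COST paths were sampled.
w = np.exp(m - np.max(m)); w /= w.sum()
ess = 1.0 / np.sum(w**2)
print(f"naive Psi = {psi_naive:.3e}, log-sum-exp log Psi = {log_psi:.3f}")
print(f"effective sample size: {ess:.1f} of {N}")

When most sampled paths have high cost, the printed effective sample size is a small fraction of N; this degeneracy is precisely what motivates biasing the sampling toward low-cost trajectories, e.g., with an RRT, as proposed here.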