Feynman-Kac Neural Network Architectures for Stochastic Control Using Second-Order FBSDE Theory

We present a deep recurrent neural network architecture for solving a class of stochastic optimal control problems described by fully nonlinear Hamilton-Jacobi-Bellman (HJB) partial differential equations (PDEs). Such PDEs arise when the stochastic dynamics include uncertainties that are additive, state-dependent, and control-multiplicative. Stochastic models with these characteristics are important in computational neuroscience, biology, finance, and aerospace systems, and they represent actuation more accurately than models with only additive uncertainty. Previous literature has established that linear HJB theory is inadequate for such problems; instead, methods based on a generalized version of the Feynman-Kac lemma have been proposed, which lead to a system of second-order forward-backward stochastic differential equations (FBSDEs). So far, however, these methods have suffered from compounding errors that limit their scalability. In this paper, we propose a deep-learning-based algorithm that combines the second-order FBSDE representation with LSTM-based recurrent neural networks to solve such stochastic optimal control problems while overcoming the limitations of traditional approaches, including poor scalability. The resulting control algorithm is tested in simulation on a high-dimensional linear system and three nonlinear systems from robotics and biomechanics, demonstrating feasibility and improved performance over previous methods.
