Approximating optimal feedback controllers of finite horizon control problems using hierarchical tensor formats

arXiv:2104.06108v1 [math.OC] 13 Apr 2021

Controlling systems of ordinary differential equations (ODEs) is ubiquitous in science and engineering. For finding an optimal feedback controller, the value function and associated fundamental equations such as the Bellman equation and the Hamilton-Jacobi-Bellman (HJB) equation are essential. The numerical treatment of these equations poses formidable challenges due to their non-linearity and their (possibly) high dimensionality. In this paper we consider a finite horizon control system and its associated Bellman equation. After a time discretization, we obtain a sequence of short time horizon problems, which we call local optimal control problems. For solving the local optimal control problems we apply two different methods. The first is the well-known policy iteration, where a fixed-point iteration is required for every time step. The second borrows ideas from Model Predictive Control (MPC): it solves the local optimal control problem via open-loop control methods on a short time horizon, which allows us to replace the fixed-point iteration by an adjoint method. For high-dimensional systems we apply low-rank hierarchical tensor product approximations (tree-based tensor formats), in particular tensor trains (TT tensors) and multi-polynomials, together with high-dimensional quadrature, e.g. Monte Carlo. We prove a linear error propagation with respect to the time discretization and give numerical evidence by controlling a diffusion equation with an unstable reaction term and an Allen-Cahn equation.
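The backward sweep over local optimal control problems can be illustrated on a toy case. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical scalar linear-quadratic problem (the constants `a`, `b`, `q`, `r`, the explicit Euler step, and the function names are all illustrative choices), replaces the tensor train ansatz by a plain quadratic polynomial, and solves each local problem in closed form instead of by policy iteration or an adjoint method. At every time step the value function is refitted by least squares on Monte Carlo samples of the state, and the result is compared with the discrete Riccati recursion, which is exact for this problem.

```python
import numpy as np

# Hypothetical scalar test problem (an assumption, not taken from the paper):
#   dynamics x' = a*x + b*u, running cost q*x^2 + r*u^2, terminal cost x^2.
a, b, q, r = 1.0, 1.0, 1.0, 0.1
T, n_steps = 1.0, 50
dt = T / n_steps
A, B = 1.0 + dt * a, dt * b  # explicit Euler step: x_next = A*x + B*u


def features(x):
    """Monomial basis 1, x, x^2 evaluated at the sample points."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)


def local_minimiser(x, c):
    """Minimise dt*(q*x^2 + r*u^2) + v(A*x + B*u) over u.

    v(y) = c0 + c1*y + c2*y^2, so the objective is quadratic in u and the
    stationarity condition can be solved in closed form.
    """
    _, c1, c2 = c
    return -(c1 * B + 2.0 * c2 * A * B * x) / (2.0 * dt * r + 2.0 * c2 * B**2)


def bellman_sweep(n_samples=200, seed=0):
    """Backward-in-time Bellman recursion with Monte Carlo regression."""
    rng = np.random.default_rng(seed)
    c = np.array([0.0, 0.0, 1.0])  # terminal value v_N(x) = x^2
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0, n_samples)  # sampled states
        u = local_minimiser(x, c)              # solve the local problem
        x_next = A * x + B * u
        target = dt * (q * x**2 + r * u**2) + features(x_next) @ c
        c, *_ = np.linalg.lstsq(features(x), target, rcond=None)
    return c


def riccati_sweep():
    """Reference: discrete Riccati recursion for the same discretised problem."""
    p = 1.0
    Q, R = dt * q, dt * r
    for _ in range(n_steps):
        p = Q + A**2 * p - (A * B * p) ** 2 / (R + B**2 * p)
    return p


if __name__ == "__main__":
    coeffs = bellman_sweep()
    print("fitted x^2 coefficient at t=0:", coeffs[2])
    print("Riccati reference:           ", riccati_sweep())
```

Because the true value function of this toy problem is itself quadratic, the regression reproduces the Riccati solution up to rounding error. In a genuinely nonlinear, high-dimensional setting the ansatz space would no longer contain the value function, and the approximation error made at each step would propagate through the backward sweep.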
