A Neural Network Approach Applied to Multi-Agent Optimal Control

We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance, which quickly become high-dimensional even when the phase-space dimension of each agent is moderate. Our approach fuses the Pontryagin Maximum Principle (PMP) and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. It yields controls in feedback form, which allows for fast computation and provides robustness to moderate disturbances of the system. We train our model using the objective function and the optimality conditions of the control problem; therefore, our training algorithm involves neither a data generation phase nor solutions from another algorithm. Our model uses HJB penalizers that we find empirically effective for efficient training. By training on a distribution of initial states, we ensure that the controls are close to optimal on a large portion of the state space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach’s effectiveness on a 150-dimensional multi-agent problem with obstacles.
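As a schematic illustration of the approach (a sketch written for exposition rather than taken from the paper, with the symbols Φ, H, L, f, G, ρ₀, and β₁, β₂ introduced here by assumption), the value function Φ of the deterministic control problem is parameterized by a neural network Φ_θ, the feedback control is recovered from its gradient, and the training objective augments the control cost with penalties on the HJB optimality conditions:

\[
\Phi(x,t) \;=\; \inf_{u}\Big\{ \int_t^T L\big(z(s),u(s)\big)\,\mathrm{d}s \;+\; G\big(z(T)\big) \;:\; \dot z(s) = f\big(z(s),u(s)\big),\; z(t)=x \Big\},
\]
\[
-\partial_t \Phi(x,t) + H\big(x,\nabla_x \Phi(x,t)\big) = 0, \qquad \Phi(x,T) = G(x), \qquad H(x,p) = \sup_{u}\big\{ -p^\top f(x,u) - L(x,u) \big\},
\]
\[
u_{\theta}(x,t) \;\in\; \operatorname*{arg\,min}_{u}\big\{ L(x,u) + \nabla_x \Phi_{\theta}(x,t)^\top f(x,u) \big\},
\]
\[
\min_{\theta}\;\; \mathbb{E}_{x_0 \sim \rho_0}\Big[\, J\big[u_{\theta}\big](x_0) \;+\; \beta_1 \,\big|\!-\!\partial_t \Phi_{\theta} + H(z,\nabla_x \Phi_{\theta})\big| \;+\; \beta_2 \,\big|\Phi_{\theta}\big(z(T),T\big) - G\big(z(T)\big)\big| \,\Big].
\]

Here ρ₀ denotes the distribution of initial states used during training, J[u_θ](x₀) is the control objective accumulated along the trajectory z(·) generated from x₀ by the feedback control u_θ, and the penalty terms enforce the HJB residual and terminal condition along those trajectories; the exact penalizers and weights used in the paper may differ from this sketch.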
