On the Stability Analysis of Deep Neural Network Representations of an Optimal State Feedback

Recent work has shown how the optimal state-feedback, obtained as the solution to the Hamilton-Jacobi-Bellman equations, can be approximated by deep neural networks for several nonlinear, deterministic systems. When imitation (supervised) learning is used to train the neural network on optimal state-action pairs, for instance as derived by applying Pontryagin's theory of optimal processes, the resulting model is referred to here as the guidance and control network. In this work, we analyze the stability of nonlinear and deterministic systems controlled by such networks. We then propose a method utilising differential algebraic techniques and high-order Taylor maps to gain information on the stability of the neurocontrolled state trajectories. We exemplify the proposed methods in the case of the two-dimensional dynamics of a quadcopter controlled to reach the origin, and we study how different architectures of the guidance and control network affect the stability of the target equilibrium point and the stability margins to time delay. Moreover, we show how to study the robustness to initial conditions of a nominal trajectory, using a Taylor representation of the neurocontrolled neighbouring trajectories.
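
As a rough illustration of what analyzing the stability of a target equilibrium under a network controller can involve, the sketch below linearizes a closed-loop system around its equilibrium and inspects the eigenvalues of the resulting Jacobian. Everything in it is an assumption made for illustration: the pendulum-like toy dynamics, the tiny bias-free tanh network and its weights, and the names `policy`, `closed_loop` and `jacobian` are hypothetical and not taken from the paper. A central finite difference stands in for the differential algebraic techniques and high-order Taylor maps mentioned above, so this is only a first-order check, not the proposed method.

```python
import numpy as np

# Hypothetical weights of a tiny bias-free tanh network (not from the paper).
W1 = np.array([[1.0, 0.5], [0.3, 1.0]])
W2 = np.array([[-3.0, -2.0]])

def policy(x):
    """Toy state feedback u = W2 tanh(W1 x); u(0) = 0 by construction."""
    return (W2 @ np.tanh(W1 @ x))[0]

def closed_loop(x):
    """Toy pendulum-like dynamics (an assumption) with the network in the loop."""
    u = policy(x)
    return np.array([x[1], np.sin(x[0]) + u])

def jacobian(f, x0, eps=1e-6):
    """Central finite-difference Jacobian of f evaluated at x0."""
    n = x0.size
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x0 + dx) - f(x0 - dx)) / (2 * eps)
    return J

# The origin is an equilibrium because sin(0) = 0 and policy(0) = 0.
x_eq = np.zeros(2)
A = jacobian(closed_loop, x_eq)
eigvals = np.linalg.eigvals(A)

# Local asymptotic stability holds if all eigenvalues have negative real part.
is_stable = bool(np.all(eigvals.real < 0))
print("closed-loop Jacobian:\n", A)
print("eigenvalues:", eigvals, "stable:", is_stable)
```

With a trained network, the same check would apply to the learned policy in place of `policy`, and an automatic differentiation tool could replace the finite-difference Jacobian.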
