Stability-Certified Reinforcement Learning via Spectral Normalization

This article describes two methods based on spectral normalization, developed from different perspectives, for ensuring the stability of a system controlled by a neural network. The first method bounds the L2 gain of the feedback system below 1, which satisfies the stability condition derived from the small-gain theorem. Although this method includes the stability condition explicitly, the condition is strict, so the resulting neural network controller may perform poorly. The second method is proposed to overcome this difficulty: it improves performance while ensuring local stability with a larger region of attraction. In the second method, stability is ensured by solving linear matrix inequalities after the neural network controller has been trained. The spectral normalization proposed in this article improves the feasibility of this a posteriori stability test by constructing tighter local sectors. Numerical experiments show that the second method attains sufficient performance compared with the first one while ensuring sufficient stability compared with existing reinforcement learning algorithms.
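
To make the first method's idea concrete, the following is a minimal sketch, not the article's implementation. It assumes a bias-free feedforward controller with tanh activations (which are 1-Lipschitz), so the product of the layers' spectral norms upper-bounds the controller's L2 gain. The names `normalize_layers`, `policy`, and the target bound `gamma` are hypothetical. The sketch computes the spectral norm exactly with NumPy; spectral normalization during training is usually implemented with a power-iteration estimate instead.

```python
import numpy as np

def normalize_layers(weights, gain_bound):
    """Rescale layers so the product of their spectral norms is <= gain_bound.

    Assumption for illustration: the gain budget is split evenly across layers.
    """
    per_layer = gain_bound ** (1.0 / len(weights))
    normalized = []
    for W in weights:
        sigma = np.linalg.norm(W, 2)  # largest singular value of W
        # Only shrink a layer; never scale it up.
        normalized.append(W * min(1.0, per_layer / sigma))
    return normalized

def policy(weights, x):
    """Bias-free tanh network; tanh is 1-Lipschitz and maps 0 to 0."""
    h = x
    for W in weights[:-1]:
        h = np.tanh(W @ h)
    return weights[-1] @ h

rng = np.random.default_rng(0)
raw = [
    rng.standard_normal((16, 4)),
    rng.standard_normal((16, 16)),
    rng.standard_normal((2, 16)),
]

gamma = 0.9  # hypothetical target bound on the controller gain
weights = normalize_layers(raw, gamma)

# Upper bound on the controller's Lipschitz constant, hence on its L2 gain.
lipschitz_bound = float(np.prod([np.linalg.norm(W, 2) for W in weights]))
print(f"gain bound after normalization: {lipschitz_bound:.3f}")
```

Under the small-gain theorem, what must stay below 1 is the gain around the whole loop, so in practice `gamma` would be chosen relative to the L2 gain of the plant. Because the bound holds for every input, it is conservative, which is consistent with the performance limitation the abstract attributes to the first method.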
