Complete stability analysis of a heuristic ADP control design

This paper provides new stability results for Action-Dependent Heuristic Dynamic Programming (ADHDP), a control algorithm that iteratively improves an internal model of the external world in an autonomous system based on the system's continuous interaction with its environment. We extend previous results on ADHDP control to general multi-layer neural networks with deep learning across all layers. In particular, we show that the introduced control approach is uniformly ultimately bounded (UUB) under specific conditions on the learning rates, without explicit constraints on the temporal discount factor. We demonstrate the benefit of our results for the control of linear and nonlinear systems, including the cart-pole balancing problem. Our results show significantly improved learning and control performance compared with the state of the art.
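The sketch below illustrates one common form of an ADHDP training loop, assuming the direct HDP formulation of Si and Wang rather than the exact algorithm analyzed here. A critic network J(x, u) is trained on the temporal-difference error e_c(t) = αJ(t) − [J(t−1) − r(t)], and an action network u(x) is trained to drive J(t) toward an ultimate objective U_c = 0 by backpropagating through the critic. The plant, the reward r = −x², the network sizes, the discount factor `ALPHA`, and the learning rates `LC` and `LA` are hypothetical choices for illustration; the code does not encode the learning-rate conditions under which UUB is established.

```python
import math
import random

# Hypothetical toy problem (not from the paper): stable scalar plant
# x(t+1) = A*x(t) + B*u(t) and reward r = -x^2 (zero is best).
A, B = 0.9, 0.1
ALPHA = 0.95         # discount factor (assumed value)
LC, LA = 0.01, 0.01  # critic / action learning rates (assumed values)
H = 6                # hidden units per network (assumed)
U_C = 0.0            # ultimate objective the action net drives J toward


class MLP:
    """One-hidden-layer tanh network with a scalar output; all layers trainable."""

    def __init__(self, n_in, n_hidden, out_tanh, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.out_tanh = out_tanh

    def forward(self, inp):
        # Cache activations for the following backward() call.
        self.inp = list(inp)
        self.h = [math.tanh(sum(w * v for w, v in zip(row, self.inp)))
                  for row in self.w1]
        z = sum(w * h for w, h in zip(self.w2, self.h))
        self.y = math.tanh(z) if self.out_tanh else z
        return self.y

    def backward(self, dy, lr):
        """Given dE/dy, take one gradient step of size lr on both layers
        (lr=0 leaves weights unchanged) and return dE/d(input)."""
        dz = dy * (1.0 - self.y ** 2) if self.out_tanh else dy
        d_in = [0.0] * len(self.inp)
        for j, hj in enumerate(self.h):
            dpre = dz * self.w2[j] * (1.0 - hj ** 2)
            for i, v in enumerate(self.inp):
                d_in[i] += dpre * self.w1[j][i]
                self.w1[j][i] -= lr * dpre * v
            self.w2[j] -= lr * dz * hj
        return d_in


def run_adhdp(steps=300, x0=0.8, seed=0):
    rng = random.Random(seed)
    critic = MLP(2, H, out_tanh=False, rng=rng)  # J(x, u)
    actor = MLP(1, H, out_tanh=True, rng=rng)    # u(x), bounded in (-1, 1)
    x, r, j_prev = x0, 0.0, None
    xs, critic_errors = [x], []
    for _ in range(steps):
        # Critic step: e_c(t) = ALPHA*J(t) - [J(t-1) - r(t)], J(t-1) held fixed.
        if j_prev is not None:
            j = critic.forward([x, actor.forward([x])])
            e_c = ALPHA * j - (j_prev - r)
            critic.backward(ALPHA * e_c, LC)
            critic_errors.append(e_c)
        # Action step: e_a(t) = J(t) - U_C, backpropagated through the critic.
        u = actor.forward([x])
        e_a = critic.forward([x, u]) - U_C
        dj_du = critic.backward(1.0, 0.0)[1]  # gradient only, no critic update
        actor.backward(e_a * dj_du, LA)
        # Apply the updated control, remember J(t), and advance the plant.
        u = actor.forward([x])
        j_prev = critic.forward([x, u])
        x = A * x + B * u
        r = -x * x
        xs.append(x)
    return xs, critic_errors
```

Both layers of each network are updated by backpropagation, loosely mirroring the setting of learning across all layers. `run_adhdp()` returns the state trajectory and the sequence of critic errors. Because |A| + |B| ≤ 1 and the action is tanh-bounded, the toy state stays in [−1, 1] by construction, so this boundedness says nothing about the UUB result itself.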
