Revisiting Approximate Dynamic Programming and its Convergence

Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically, at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to the solution are obtained. Afterwards, the results are formed as a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.

[1]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[2]  C. A. Desoer,et al.  Nonlinear Systems Analysis , 1978 .

[3]  Warren B. Powell,et al.  Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics) , 2007 .

[4]  Ali Heydari,et al.  Optimal switching between autonomous subsystems , 2014, J. Frankl. Inst..

[5]  Radhakant Padhi,et al.  A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems , 2006, Neural Networks.

[6]  S. Sastry Nonlinear Systems: Analysis, Stability, and Control , 1999 .

[7]  Derong Liu,et al.  Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems , 2013, IEEE Transactions on Cybernetics.

[8]  Warren B. Powell,et al.  Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .

[9]  Haibo He,et al.  Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[10]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[11]  Frank L. Lewis,et al.  Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[12]  S. N. Balakrishnan,et al.  Adaptive-critic based neural networks for aircraft optimal control , 1996 .

[13]  Richard S. Sutton,et al.  Dimensions of Reinforcement Learning , 1998 .

[14]  Qinglai Wei,et al.  Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming , 2012, Autom..

[15]  P. Olver Nonlinear Systems , 2013 .

[16]  Derong Liu,et al.  Adaptive Dynamic Programming for Control: Algorithms and Stability , 2012 .

[17]  D. Liu,et al.  Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$-Error Bound , 2011, IEEE Transactions on Neural Networks.

[18]  Frank L. Lewis,et al.  Reinforcement Learning and Approximate Dynamic Programming (RLADP)Â -Â Foundations, Common Misconceptions, and the Challenges Ahead , 2013 .

[19]  Sarangapani Jagannathan,et al.  Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence , 2009, Neural Networks.

[20]  W. Rudin Principles of mathematical analysis , 1964 .

[21]  Frank L. Lewis,et al.  Adaptive optimal control for continuous-time linear systems based on policy iteration , 2009, Autom..

[22]  Paul J. Werbos,et al.  Approximate dynamic programming for real-time control and neural modeling , 1992 .

[23]  Robert F. Stengel,et al.  Online Adaptive Critic Flight Control , 2004 .

[24]  Ali Heydari,et al.  Fixed-final-time optimal control of nonlinear systems with terminal constraints , 2013, Neural Networks.

[25]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[26]  Hao Xu,et al.  Optimal control of uncertain quantized linear discrete‐time systems , 2015 .

[27]  Paul M. Goldbart,et al.  Mathematics for Physics: A Guided Tour for Graduate Students , 2009 .

[28]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[29]  Huaguang Zhang,et al.  Finite horizon optimal control of non-linear discrete-time switched systems using adaptive dynamic programming with ε-error bound , 2014, Int. J. Syst. Sci..

[30]  Huaguang Zhang,et al.  Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints , 2009, IEEE Transactions on Neural Networks.

[31]  Warren B. Powell,et al.  “Approximate dynamic programming: Solving the curses of dimensionality” by Warren B. Powell , 2007, Wiley Series in Probability and Statistics.

[32]  Huaguang Zhang,et al.  A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[33]  Ali Heydari,et al.  Finite-Horizon Control-Constrained Nonlinear Optimal Control Using Single Network Adaptive Critics , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[34]  Srinivas R. Vadali,et al.  Optimal finite-time feedback controllers for nonlinear systems with terminal constraints , 2006 .

[35]  Derong Liu,et al.  Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[36]  Frank L. Lewis,et al.  Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.

[37]  Jennie Si,et al.  Helicopter trimming and tracking control using direct neural dynamic programming , 2003, IEEE Trans. Neural Networks.

[38]  Donald E. Kirk,et al.  Optimal control theory : an introduction , 1970 .