General value iteration based single network approach for constrained optimal controller design of partially-unknown continuous-time nonlinear systems

Abstract In this paper, a novel iterative approximate dynamic programming scheme is proposed by introducing the learning mechanism of value iteration (VI) to solve the constrained optimal control problem for CT affine nonlinear systems with utilizing only one neural network. The idea is to show the feasibility of introducing the VI learning mechanism to solve for the constrained optimal control problem from a theoretical point of view, and thus the initial admissible control can be avoided compared with most existing works based on policy iteration (PI). Meanwhile, the initial condition of the proposed VI based method can be more general than the traditional VI method which requires the initial value function to be a zero function. A general analytical method is proposed to demonstrate the convergence property. To simplify the architecture, only one critic neural network is adopted to approximate the iterative value function while implementing the proposed method. At last, two simulation examples are proposed to validate the theoretical results.

[1]  Huaguang Zhang,et al.  Fault-Tolerant Controller Design for a Class of Nonlinear MIMO Discrete-Time Systems via Online Reinforcement Learning Algorithm , 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[2]  Derong Liu,et al.  Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints , 2015, IEEE Transactions on Cybernetics.

[3]  Derong Liu,et al.  An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs , 2013, Inf. Sci..

[4]  Frank L. Lewis,et al.  Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach , 2005, Autom..

[5]  Frank L. Lewis,et al.  Optimized Assistive Human–Robot Interaction Using Reinforcement Learning , 2016, IEEE Transactions on Cybernetics.

[6]  Haibo He,et al.  GrDHP: A General Utility Function Representation for Dual Heuristic Dynamic Programming , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Derong Liu,et al.  Adaptive Dynamic Programming for Control , 2012 .

[8]  Derong Liu,et al.  Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique , 2013, Neurocomputing.

[9]  Radhakant Padhi,et al.  A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems , 2006, Neural Networks.

[10]  Huaguang Zhang,et al.  Optimal control laws for time-delay systems with saturating actuators based on heuristic dynamic programming , 2010, Neurocomputing.

[11]  Yanhong Luo,et al.  Data-driven optimal tracking control for a class of affine non-linear continuous-time systems with completely unknown dynamics , 2016 .

[12]  Frank L. Lewis,et al.  Optimal Control , 1986 .

[13]  Warren E. Dixon,et al.  Model-based reinforcement learning for infinite-horizon approximate optimal tracking , 2014, 53rd IEEE Conference on Decision and Control.

[14]  Derong Liu,et al.  An iterative ϵ-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state , 2012, Neural Networks.

[15]  Derong Liu,et al.  Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints , 2013 .

[16]  Derong Liu,et al.  Optimal control for discrete-time affine non-linear systems using general value iteration , 2012 .

[17]  Huaguang Zhang,et al.  Near-Optimal Control for Nonzero-Sum Differential Games of Continuous-Time Nonlinear Systems Using Single-Network ADP , 2013, IEEE Transactions on Cybernetics.

[18]  Huaguang Zhang,et al.  Optimal Tracking Control for a Class of Nonlinear Discrete-Time Systems With Time Delays Based on Heuristic Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[19]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[20]  Derong Liu,et al.  Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning , 2016, Inf. Sci..

[21]  Derong Liu,et al.  Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems , 2016, IEEE Transactions on Cybernetics.

[22]  Jun Morimoto,et al.  Model-based reinforcement learning with dimension reduction , 2016, Neural Networks.

[23]  Jacek M. Zurada,et al.  Self-Organizing Neural Networks Integrating Domain Knowledge and Reinforcement Learning , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[24]  Frank L. Lewis,et al.  Policy Iterations on the Hamilton–Jacobi–Isaacs Equation for $H_{\infty}$ State Feedback Control With Input Saturation , 2006, IEEE Transactions on Automatic Control.

[25]  Kenji Doya,et al.  From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning , 2016, Neural Networks.

[26]  Sarangapani Jagannathan,et al.  Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence , 2009, Neural Networks.

[27]  Huaguang Zhang,et al.  Neural-Network-Based Constrained Optimal Control Scheme for Discrete-Time Switched Nonlinear System Using Dual Heuristic Programming , 2014, IEEE Transactions on Automation Science and Engineering.

[28]  Zhong-Ping Jiang,et al.  Approximate Dynamic Programming for Optimal Stationary Control With Control-Dependent Noise , 2011, IEEE Transactions on Neural Networks.

[29]  Huaguang Zhang,et al.  Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints , 2009, IEEE Transactions on Neural Networks.

[30]  Frank L. Lewis,et al.  Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning , 2014, Autom..

[31]  Richard S. Sutton,et al.  A Menu of Designs for Reinforcement Learning Over Time , 1995 .

[32]  Warren B. Powell,et al.  “Approximate dynamic programming: Solving the curses of dimensionality” by Warren B. Powell , 2007, Wiley Series in Probability and Statistics.

[33]  F. Lewis,et al.  A policy iteration approach to online optimal control of continuous-time constrained-input systems. , 2013, ISA transactions.

[34]  Tingwen Huang,et al.  Reinforcement learning solution for HJB equation arising in constrained optimal control problem , 2015, Neural Networks.

[35]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[36]  Haibo He,et al.  A Theoretical Foundation of Goal Representation Heuristic Dynamic Programming , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[37]  Derong Liu,et al.  Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics , 2014, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[38]  Frank L. Lewis,et al.  Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems , 2014, Autom..

[39]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[40]  Xiong Yang,et al.  Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems , 2016, Inf. Sci..

[41]  Frank L. Lewis,et al.  Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[42]  Frank L. Lewis,et al.  Fixed-Final-Time-Constrained Optimal Control of Nonlinear Systems Using Neural Network HJB Approach , 2007, IEEE Transactions on Neural Networks.

[43]  Frank L. Lewis,et al.  Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning , 2016, Autom..

[44]  Huaguang Zhang,et al.  Online optimal control of unknown discrete-time nonlinear systems by using time-based adaptive dynamic programming , 2015, Neurocomputing.

[45]  Warren E. Dixon,et al.  Model-based reinforcement learning for approximate optimal regulation , 2016, Autom..

[46]  Huaguang Zhang,et al.  A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[47]  Ali Heydari,et al.  Finite-Horizon Control-Constrained Nonlinear Optimal Control Using Single Network Adaptive Critics , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[48]  Frank L. Lewis,et al.  Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis , 2017, IEEE Transactions on Cybernetics.

[49]  Xin Zhang,et al.  Data-Driven Robust Approximate Optimal Tracking Control for Unknown General Nonlinear Systems Using Adaptive Dynamic Programming Method , 2011, IEEE Transactions on Neural Networks.

[50]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.