Error Bound Analysis of $Q$-Function for Discounted Optimal Control Problems With Policy Iteration

In this paper, we present an error bound analysis of the $Q$-function for action-dependent adaptive dynamic programming, applied to discounted optimal control problems of unknown discrete-time nonlinear systems. The convergence of the $Q$-functions derived by a policy iteration algorithm under ideal conditions is given. Considering the approximation errors of the $Q$-function and the control policy in the policy evaluation and policy improvement steps, we establish error bounds for the approximate $Q$-functions in each iteration. Under the given boundedness conditions, the approximate $Q$-function converges to a finite neighborhood of the optimal $Q$-function. To implement the presented algorithm, two three-layer neural networks are employed to approximate the $Q$-function and the control policy, respectively. Finally, a simulation example is used to verify the validity of the presented algorithm.
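
The policy evaluation and policy improvement steps mentioned above can be illustrated with a minimal tabular sketch of $Q$-function policy iteration under ideal conditions (no approximation error). Everything concrete here is an assumption made for illustration and is not taken from the paper: the five-state integer system, the dynamics `clip(x + u)`, the utility `x**2 + u**2`, the discount factor 0.9, and the function name `policy_iteration_q`. The paper itself uses neural networks rather than tables.

```python
import numpy as np

def policy_iteration_q(n_states=5, actions=(-1, 0, 1), gamma=0.9, max_iters=50):
    """Exact Q-function policy iteration on a toy deterministic system.

    Hypothetical setup (not from the paper):
      x_{k+1} = clip(x_k + u_k, 0, n_states - 1),  U(x, u) = x^2 + u^2.
    """
    states = np.arange(n_states)
    acts = np.array(actions)
    n_a = len(acts)

    # Next-state table and utility table, both of shape (n_states, n_a).
    nxt = np.clip(states[:, None] + acts[None, :], 0, n_states - 1)
    cost = (states[:, None] ** 2 + acts[None, :] ** 2).astype(float)

    # Initial policy: always apply u = 0 (requires 0 to be in `actions`).
    policy = np.full(n_states, list(actions).index(0))

    n = n_states * n_a
    q = np.zeros((n_states, n_a))
    for _ in range(max_iters):
        # Policy evaluation: solve Q(x,u) = U(x,u) + gamma * Q(x', v(x'))
        # as a linear system over all (state, action) pairs.
        P = np.zeros((n, n))
        for s in range(n_states):
            for a in range(n_a):
                s2 = nxt[s, a]
                P[s * n_a + a, s2 * n_a + policy[s2]] = 1.0
        q = np.linalg.solve(np.eye(n) - gamma * P, cost.reshape(-1))
        q = q.reshape(n_states, n_a)

        # Policy improvement: v(x) = argmin_u Q(x, u).
        new_policy = q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so Q satisfies the optimality equation
        policy = new_policy
    return q, policy
```

Calling `q, policy = policy_iteration_q()` returns a table that satisfies the discounted Bellman optimality equation for this toy problem. In the approximate setting analyzed in the paper, both steps would carry errors, and the iterates would only be expected to reach a neighborhood of this fixed point.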
