Error Bound Analysis of $Q$-Function for Discounted Optimal Control Problems With Policy Iteration

In this paper, we present an error bound analysis of the $Q$-function for action-dependent adaptive dynamic programming (ADP) applied to discounted optimal control problems of unknown discrete-time nonlinear systems. We first establish the convergence of the $Q$-functions generated by a policy iteration algorithm under ideal conditions. Then, taking into account the approximation errors of the $Q$-function and the control policy in the policy evaluation and policy improvement steps, we establish error bounds for the approximate $Q$-functions at each iteration. Under the given boundedness conditions, the approximate $Q$-functions converge to a finite neighborhood of the optimal $Q$-function. To implement the presented algorithm, two three-layer neural networks are employed to approximate the $Q$-function and the control policy, respectively. Finally, a simulation example is used to verify the validity of the presented algorithm.
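The abstract does not spell out the algorithm or its bounds, so the following is only a hedged, self-contained illustration of the two ideas it describes: ideal policy iteration on a $Q$-function, and the same iteration with bounded errors injected into the policy evaluation step. To keep it small, the paper's two three-layer neural networks are replaced by lookup tables on a hypothetical discretized scalar system; the dynamics $x_{k+1} = 0.8\sin(x_k) + u_k$, the utility $U(x,u) = x^2 + u^2$, the grids, and $\gamma = 0.9$ are all assumptions, not taken from the paper. `policy_iteration` returns the optimal $Q$-function $Q^*$; `approximate_pi` perturbs every evaluated $Q$-function by at most $\varepsilon$ and records $\zeta_k = \max|Q^{\pi_k} - Q^*|$ for the true $Q$-function of each policy. The check `error_bound` uses $\zeta_k \le \gamma^k\zeta_0 + 2\gamma\varepsilon(1-\gamma^k)/(1-\gamma)^2$, which follows from the one-step recursion $\zeta_{k+1} \le \gamma\zeta_k + 2\gamma\varepsilon/(1-\gamma)$ when the improvement step is exact. This is the classical approximate policy iteration form, not the bound derived in the paper (which also covers policy improvement errors).

```python
import numpy as np

# Hypothetical test problem (not from the paper): the scalar nonlinear system
# x_{k+1} = 0.8*sin(x_k) + u_k, snapped onto a finite grid so that Q-functions
# become tables instead of the paper's neural networks.
GAMMA = 0.9                                   # assumed discount factor
XS = np.linspace(-1.0, 1.0, 21)               # state grid
US = np.linspace(-0.5, 0.5, 11)               # control grid
NX, NU = len(XS), len(US)


def next_index(i: int, j: int) -> int:
    """Grid index of the successor state, clipped to the grid range."""
    x_next = np.clip(0.8 * np.sin(XS[i]) + US[j], XS[0], XS[-1])
    return int(np.argmin(np.abs(XS - x_next)))


NEXT = np.array([[next_index(i, j) for j in range(NU)] for i in range(NX)])
UTIL = XS[:, None] ** 2 + US[None, :] ** 2    # assumed utility U(x, u) = x^2 + u^2


def evaluate(policy: np.ndarray) -> np.ndarray:
    """Exact policy evaluation: solve Q(x,u) = U(x,u) + gamma*Q(x', pi(x'))."""
    n = NX * NU
    P = np.zeros((n, n))
    for i in range(NX):
        for j in range(NU):
            k = NEXT[i, j]
            P[i * NU + j, k * NU + policy[k]] = 1.0
    q = np.linalg.solve(np.eye(n) - GAMMA * P, UTIL.ravel())
    return q.reshape(NX, NU)


def improve(q: np.ndarray) -> np.ndarray:
    """Policy improvement: pick the cost-minimizing control in every state."""
    return np.argmin(q, axis=1)


def policy_iteration(max_iter: int = 100):
    """Ideal (error-free) policy iteration on the tabular Q-function."""
    policy = np.zeros(NX, dtype=int)
    for _ in range(max_iter):
        new_policy = improve(evaluate(policy))
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return evaluate(policy), policy


def approximate_pi(q_star: np.ndarray, eps: float, iters: int, seed: int = 0):
    """Policy iteration whose evaluation step is corrupted by errors in [-eps, eps].

    Returns zeta_k = max |Q^{pi_k} - Q*| for the true Q-function of each policy pi_k.
    """
    rng = np.random.default_rng(seed)
    policy = np.zeros(NX, dtype=int)
    gaps = []
    for _ in range(iters):
        q_true = evaluate(policy)
        gaps.append(float(np.max(np.abs(q_true - q_star))))
        q_hat = q_true + rng.uniform(-eps, eps, size=q_true.shape)  # bounded evaluation error
        policy = improve(q_hat)               # improvement itself is exact here
    gaps.append(float(np.max(np.abs(evaluate(policy) - q_star))))
    return gaps


def error_bound(zeta0: float, eps: float, k: int) -> float:
    """gamma^k * zeta0 + 2*gamma*eps*(1 - gamma^k) / (1 - gamma)^2."""
    return GAMMA ** k * zeta0 + 2 * GAMMA * eps * (1 - GAMMA ** k) / (1 - GAMMA) ** 2


if __name__ == "__main__":
    q_star, pi_star = policy_iteration()
    for eps in (0.0, 0.05, 0.2):
        gaps = approximate_pi(q_star, eps, iters=15)
        worst = max(g - error_bound(gaps[0], eps, k) for k, g in enumerate(gaps))
        print(f"eps={eps}: final gap {gaps[-1]:.4f}, max(gap - bound) {worst:.2e}")
```

With `eps=0.0` the sketch reduces to the ideal case, where the gap shrinks by at least a factor of $\gamma$ per iteration. With $\varepsilon > 0$ the gap need not reach zero, but it is kept within a neighborhood of radius $2\gamma\varepsilon/(1-\gamma)^2$ of $Q^*$ in the limit, which mirrors, in this simplified setting, the qualitative "finite neighborhood" statement in the abstract.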