Discrete-Time Deterministic $Q$-Learning: A Novel Convergence Analysis

In this paper, a novel discrete-time deterministic $Q$-learning algorithm is developed. In each iteration, the iterative $Q$ function is updated over the entire state and control spaces, rather than for a single state and a single control as in traditional $Q$-learning algorithms. A new convergence criterion is established that guarantees the iterative $Q$ function converges to the optimum, and it simplifies the learning-rate convergence criterion required by traditional $Q$-learning algorithms. In the convergence analysis, upper and lower bounds of the iterative $Q$ function are analyzed to obtain the convergence criterion, instead of analyzing the iterative $Q$ function itself. For convenience of analysis, the convergence properties of the deterministic $Q$-learning algorithm are first developed for the undiscounted case. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. To facilitate implementation, neural networks are used to approximate the iterative $Q$ function and to compute the iterative control law, respectively. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
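To make the full-space update concrete, the sketch below shows one plausible form of such an iteration on a discretized problem. It is a minimal sketch, not the paper's exact algorithm: the discretization, the dynamics `F`, the utility `U`, and the update rule $Q_{i+1}(x,u) = U(x,u) + \gamma \min_{u'} Q_i(F(x,u), u')$ are all assumptions introduced here for illustration.

```python
import numpy as np

def deterministic_q_iteration(states, controls, F, U, gamma=0.95,
                              num_iterations=100, tol=1e-6):
    """Hypothetical full-space deterministic Q-learning sketch.

    Iterates Q_{i+1}(x,u) = U(x,u) + gamma * min_{u'} Q_i(F(x,u), u')
    simultaneously for ALL discretized states x and controls u, in
    contrast to traditional Q-learning, which updates one visited
    (state, control) pair per step.
    """
    nx, nu = len(states), len(controls)
    Q = np.zeros((nx, nu))  # Q_0 = 0 initialization

    # Map each successor state F(x, u) to its nearest grid point once,
    # so every sweep is a pure array update.
    def nearest_state_index(x):
        return int(np.argmin([np.linalg.norm(x - s) for s in states]))

    next_idx = np.array([[nearest_state_index(F(x, u)) for u in controls]
                         for x in states])
    cost = np.array([[U(x, u) for u in controls] for x in states])

    for _ in range(num_iterations):
        V = Q.min(axis=1)                    # V_i(x) = min_u Q_i(x, u)
        Q_next = cost + gamma * V[next_idx]  # update every (x, u) at once
        if np.max(np.abs(Q_next - Q)) < tol:
            Q = Q_next
            break
        Q = Q_next

    policy = Q.argmin(axis=1)  # greedy iterative control law from final Q
    return Q, policy
```

Because every (state, control) pair is updated in each sweep, no per-pair learning rate (and hence no learning-rate convergence condition) appears in this sketch; in the paper, the tabular $Q$ array and the argmin-based control law are replaced by a critic network and an action network, respectively.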
