Off-Policy Interleaved $Q$ -Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems

In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving optimal control problem of affine nonlinear discrete-time (DT) systems, using only the measured data along the system trajectories. Affine nonlinear feature of systems, unknown dynamics, and off-policy learning approach pose tremendous challenges on approximating optimal controllers. To this end, on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias of solution to Q-function-based Bellman equation caused by adding probing noises to systems for satisfying persistent excitation is also analyzed when using on-policy Q-learning approach. Then, a behavior control policy is introduced followed by proposing an off-policy Q-learning algorithm. Meanwhile, the convergence of algorithm and no bias of solution to optimal control problem when adding probing noise to systems are investigated. Third, three neural networks run by the interleaved Q-learning approach in the actor-critic framework. Thus, a novel off-policy interleaved Q-learning algorithm is derived, and its convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.

[1]  Frank L. Lewis,et al.  Off-Policy Q-Learning: Set-Point Design for Optimizing Dual-Rate Rougher Flotation Operational Processes , 2018, IEEE Transactions on Industrial Electronics.

[2]  Kyriakos G. Vamvoudakis,et al.  Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach , 2017, Syst. Control. Lett..

[3]  Frank L. Lewis,et al.  Tracking Control for Linear Discrete-Time Networked Control Systems With Unknown Dynamics and Dropout , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[4]  Haibo He,et al.  Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[5]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[6]  Derong Liu,et al.  Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Frank L. Lewis,et al.  Data-Driven Flotation Industrial Process Operational Optimal Control Based on Reinforcement Learning , 2018, IEEE Transactions on Industrial Informatics.

[8]  Haibo He,et al.  Novel iterative neural dynamic programming for data-based approximate optimal control design , 2017, Autom..

[9]  Zhong-Ping Jiang,et al.  Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics , 2012, Autom..

[10]  Frank L. Lewis,et al.  Model-free H∞ control design for unknown linear discrete-time systems via Q-learning with LMI , 2010, Autom..

[11]  Pedro Ferreira,et al.  An MDP Model-Based Reinforcement Learning Approach for Production Station Ramp-Up Optimization: Q-Learning Analysis , 2014, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[12]  Roberto A. Santiago,et al.  Adaptive critic designs: A case study for neurocontrol , 1995, Neural Networks.

[13]  Frank L. Lewis,et al.  Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control , 2007, Autom..

[14]  Frank L. Lewis,et al.  Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , 2012 .

[15]  Marc G. Bellemare,et al.  Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.

[16]  Frank L. Lewis,et al.  Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances , 2016, IEEE Transactions on Cybernetics.

[17]  Frank L. Lewis,et al.  Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes , 2017, IEEE Transactions on Cybernetics.

[18]  Qing-Long Han,et al.  Network-Based T–S Fuzzy Dynamic Positioning Controller Design for Unmanned Marine Vehicles , 2018, IEEE Transactions on Cybernetics.

[19]  Yun Zhang,et al.  Neural Network Learning and Robust Stabilization of Nonlinear Systems With Dynamic Uncertainties , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[20]  Tianyou Chai,et al.  Hybrid intelligent control for optimal operation of shaft furnace roasting process , 2011 .

[21]  Paul J. Werbos,et al.  Approximate dynamic programming for real-time control and neural modeling , 1992 .

[22]  Haibo He,et al.  Adaptive Critic Nonlinear Robust Control: A Survey , 2017, IEEE Transactions on Cybernetics.

[23]  F. Lewis,et al.  Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers , 2012, IEEE Control Systems.

[24]  K. Vamvoudakis Q‐learning for continuous‐time graphical games on large networks with completely unknown linear system dynamics , 2017 .

[25]  John N. Tsitsiklis,et al.  Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..

[26]  Haibo He,et al.  Event-Driven Nonlinear Discounted Optimal Regulation Involving a Power System Application , 2017, IEEE Transactions on Industrial Electronics.

[27]  Frank L. Lewis,et al.  H∞ control of linear discrete-time systems: Off-policy reinforcement learning , 2017, Autom..

[28]  Tianyou Chai,et al.  Optimal operational control for complex industrial processes , 2014, Annu. Rev. Control..

[29]  Frank L. Lewis,et al.  Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[30]  Jae Young Lee,et al.  Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems , 2012, Autom..

[31]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[32]  Derong Liu,et al.  A Novel Dual Iterative $Q$-Learning Method for Optimal Battery Management in Smart Residential Environments , 2015, IEEE Transactions on Industrial Electronics.

[33]  Frank L. Lewis,et al.  Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics , 2014, Autom..

[34]  Qinmin Yang,et al.  Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems using Online Approximators , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[35]  Tingwen Huang,et al.  Model-Free Optimal Tracking Control via Critic-Only Q-Learning , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[36]  Tingwen Huang,et al.  Off-Policy Reinforcement Learning for $ H_\infty $ Control Design , 2013, IEEE Transactions on Cybernetics.