Data-driven model reference control of MIMO vertical tank systems with model-free VRFT and Q-Learning.

This paper proposes a combined Virtual Reference Feedback Tuning-Q-learning model-free control approach that tunes nonlinear static state-feedback controllers to achieve output model reference tracking in an optimal control framework. The novel iterative Batch Fitted Q-learning strategy uses two neural networks to represent the value function (critic) and the controller (actor); the resulting scheme is referred to as a mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach. Learning convergence of Q-learning schemes generally depends, among other factors, on efficient exploration of the state-action space. Handcrafting test signals for efficient exploration is difficult, even for input-output stable unknown processes. Virtual Reference Feedback Tuning ensures that an initial stabilizing controller can be learned from few input-output data; this controller is then used to collect substantially more input-state data in a controlled mode, within a constrained environment, by compensating the process dynamics. These data are used to learn significantly superior nonlinear state-feedback neural network controllers for model reference tracking with the proposed iterative Batch Fitted Q-learning tuning strategy, which motivates the original combination of the two techniques. The mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach is experimentally validated for water level control of a multi-input multi-output nonlinear constrained coupled two-tank system. A discussion of the observed control behavior is also offered.

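The sketch below illustrates, in broad strokes, the kind of actor-critic Batch Fitted Q-learning iteration the abstract describes: a batch of transitions gathered under an initial stabilizing controller is used to repeatedly refit a critic network to Bellman targets built from a tracking cost, and an actor network is fit to the resulting greedy actions. Everything specific here is an assumption for illustration only, not taken from the paper: the toy placeholder data and dynamics, the scikit-learn MLPs for critic and actor, the stage cost, and the coarse action grid used for the greedy policy search.

```python
# Minimal sketch of a batch fitted Q-learning step with actor and critic
# neural networks. All data, dynamics, cost weights, and network sizes are
# hypothetical placeholders, not the paper's actual setup.
import numpy as np
from itertools import product
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical batch of transitions (x, u, x_next, y_ref), assumed to have
# been collected in closed loop with an initial VRFT controller.
N, nx, nu = 2000, 2, 2
X      = rng.uniform(-1, 1, (N, nx))                          # states
U      = rng.uniform(-1, 1, (N, nu))                          # applied inputs
X_next = X + 0.1 * U + 0.01 * rng.standard_normal((N, nx))    # toy dynamics
Y_ref  = rng.uniform(-1, 1, (N, 1))                           # reference-model outputs

gamma = 0.95
# Assumed stage cost: squared tracking error of the first state plus control effort.
cost = (X_next[:, [0]] - Y_ref) ** 2 + 0.01 * np.sum(U ** 2, axis=1, keepdims=True)

# Coarse grid of candidate actions for the greedy minimization of Q.
u_grid = np.array(list(product(np.linspace(-1, 1, 5), repeat=nu)))

critic = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
actor  = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)

Q_target = cost.ravel()                    # first iteration: Q equals the stage cost
for it in range(5):                        # a few fitted Q-iterations
    critic.fit(np.hstack([X, U]), Q_target)

    # Greedy actions: for each next state, pick the grid action minimizing Q.
    best_u = np.empty((N, nu))
    best_q = np.full(N, np.inf)
    for u in u_grid:
        q = critic.predict(np.hstack([X_next, np.tile(u, (N, 1))]))
        better = q < best_q
        best_q[better] = q[better]
        best_u[better] = u

    # Bellman targets for the next critic fit; actor is fit to the greedy actions.
    Q_target = cost.ravel() + gamma * best_q
    actor.fit(X_next, best_u)

print("Learned state-feedback action for x = [0.2, -0.1]:",
      actor.predict(np.array([[0.2, -0.1]])))
```

In the paper's setting the actor would replace the initial VRFT controller as the nonlinear static state-feedback law; here the grid-based greedy step merely stands in for whatever actor-update mechanism the authors actually use.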