Data-driven model reference control of MIMO vertical tank systems with model-free VRFT and Q-Learning.

This paper proposes a combined Virtual Reference Feedback Tuning-Q-learning model-free control approach that tunes nonlinear static state-feedback controllers to achieve output model reference tracking in an optimal control framework. The novel iterative Batch Fitted Q-learning strategy uses two neural networks to represent the value function (critic) and the controller (actor); the resulting scheme is referred to as a mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach. Learning convergence of Q-learning schemes generally depends, among other factors, on efficient exploration of the state-action space. Handcrafting test signals for efficient exploration is difficult, even for input-output stable unknown processes. Virtual Reference Feedback Tuning ensures that an initial stabilizing controller can be learned from few input-output data; this controller is then used to collect substantially more input-state data in a controlled mode, within a constrained environment, by compensating the process dynamics. These data are used to learn significantly superior nonlinear state-feedback neural network controllers for model reference tracking with the proposed iterative Batch Fitted Q-learning tuning strategy, which motivates the original combination of the two techniques. The mixed Virtual Reference Feedback Tuning-Batch Fitted Q-learning approach is experimentally validated for water level control of a multi-input multi-output nonlinear constrained coupled two-tank system. A discussion of the observed control behavior is also offered.

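The sketch below illustrates, in broad strokes, the kind of actor-critic Batch Fitted Q-learning iteration the abstract describes: a batch of transitions gathered under an initial stabilizing controller is used to repeatedly refit a critic network to Bellman targets built from a tracking cost, and an actor network is fit to the resulting greedy actions. Everything specific here is an assumption for illustration only, not taken from the paper: the toy placeholder data and dynamics, the scikit-learn MLPs for critic and actor, the stage cost, and the coarse action grid used for the greedy policy search.

```python
# Minimal sketch of a batch fitted Q-learning step with actor and critic
# neural networks. All data, dynamics, cost weights, and network sizes are
# hypothetical placeholders, not the paper's actual setup.
import numpy as np
from itertools import product
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical batch of transitions (x, u, x_next, y_ref), assumed to have
# been collected in closed loop with an initial VRFT controller.
N, nx, nu = 2000, 2, 2
X      = rng.uniform(-1, 1, (N, nx))                          # states
U      = rng.uniform(-1, 1, (N, nu))                          # applied inputs
X_next = X + 0.1 * U + 0.01 * rng.standard_normal((N, nx))    # toy dynamics
Y_ref  = rng.uniform(-1, 1, (N, 1))                           # reference-model outputs

gamma = 0.95
# Assumed stage cost: squared tracking error of the first state plus control effort.
cost = (X_next[:, [0]] - Y_ref) ** 2 + 0.01 * np.sum(U ** 2, axis=1, keepdims=True)

# Coarse grid of candidate actions for the greedy minimization of Q.
u_grid = np.array(list(product(np.linspace(-1, 1, 5), repeat=nu)))

critic = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
actor  = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)

Q_target = cost.ravel()                    # first iteration: Q equals the stage cost
for it in range(5):                        # a few fitted Q-iterations
    critic.fit(np.hstack([X, U]), Q_target)

    # Greedy actions: for each next state, pick the grid action minimizing Q.
    best_u = np.empty((N, nu))
    best_q = np.full(N, np.inf)
    for u in u_grid:
        q = critic.predict(np.hstack([X_next, np.tile(u, (N, 1))]))
        better = q < best_q
        best_q[better] = q[better]
        best_u[better] = u

    # Bellman targets for the next critic fit; actor is fit to the greedy actions.
    Q_target = cost.ravel() + gamma * best_q
    actor.fit(X_next, best_u)

print("Learned state-feedback action for x = [0.2, -0.1]:",
      actor.predict(np.array([[0.2, -0.1]])))
```

In the paper's setting the actor would replace the initial VRFT controller as the nonlinear static state-feedback law; here the grid-based greedy step merely stands in for whatever actor-update mechanism the authors actually use.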