A Novel Exploration-Exploitation-Based Adaptive Law for Intelligent Model-Free Control Approaches

Model-free control approaches require advanced exploration-exploitation policies to achieve practical tasks such as learning bipedal robot walking in unstructured environments. In this article, we first construct a comprehensive exploration-exploitation policy that carries rich knowledge about the long-term predictor, the control policy, and the control signal of the model-free algorithm. The developed model-free algorithm therefore continues exploring, adjusting its unknown parameters until the desired learning and control performance is achieved. Second, we provide a completely model-free adaptive law enriched with the exploration-exploitation policy and derived step by step in exact analogy with the model-based solution. The resulting adaptive control law accounts for control signal saturation and control signal (input) delay. A Lyapunov stability analysis guarantees the convergence of the adaptive law, which can also be used in intelligent control approaches. Third, we implement the adaptive algorithm in real time on a challenging benchmark system: a fourth-order, coupled, input-saturated, and time-delayed underactuated manipulator. The results show that the proposed adaptive algorithm explores larger state-action spaces and mitigates the vanishing gradient problem in both learning and control. The results also show that the learning and control properties of the adaptive algorithm are optimized as required.
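To make the exploration-exploitation idea concrete, the minimal Python sketch below shows a toy adaptive loop in which the exploration noise injected into the control signal decays with the tracking error while a single unknown gain is adapted online, with the input saturated and delayed by one step. The plant model, the adaptation law, all function names (plant_step, run_demo), and every numerical constant are illustrative assumptions; this does not reproduce the article's Lyapunov-derived adaptive law or the fourth-order underactuated manipulator benchmark.

```python
import numpy as np

# Illustrative sketch only: a toy first-order plant with actuator saturation
# and a one-step input delay, controlled by a single adaptive gain whose
# exploration noise shrinks as the tracking error decays.

def plant_step(x, u_applied, dt=0.01):
    """Advance the toy plant one step with a saturated, delayed input."""
    u_sat = np.clip(u_applied, -2.0, 2.0)   # control signal saturation
    return x + dt * (-0.5 * x + u_sat)      # simple stable dynamics (assumed)

def run_demo(steps=3000, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x, u_prev = 0.0, 0.0    # plant state and delayed control input
    theta = 0.0             # unknown controller gain to be adapted online
    eta = 0.5               # adaptation rate (assumed)
    x_ref = 1.0             # constant reference to track
    for _ in range(steps):
        e = x_ref - x
        # Exploration-exploitation schedule: the perturbation magnitude
        # decays with the tracking error, so the controller exploits once
        # learning has progressed.
        sigma = 0.2 * abs(e)
        u = theta * e + sigma * rng.standard_normal()
        # Gradient-like adaptive update driven by the squared tracking error.
        theta += eta * e * e * dt
        x = plant_step(x, u_prev, dt)       # plant receives last step's input
        u_prev = u                          # one-step input delay
    return x, theta

if __name__ == "__main__":
    x_final, theta_final = run_demo()
    print(f"final state = {x_final:.3f}, adapted gain = {theta_final:.3f}")
```

Under these assumptions the gain grows only while a tracking error persists, so the exploration noise and the adaptation both vanish together as the state approaches the reference, which is the qualitative behavior the abstract attributes to the proposed policy.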
