Optimal Tracking Control of Uncertain Systems

Abstract This chapter presents adaptive solutions to the optimal tracking problem for nonlinear discrete-time and continuous-time systems. These methods find the optimal feedback and feedforward parts of the control input simultaneously, without requiring complete knowledge of the system dynamics. First, an augmented system composed of the tracking error dynamics and the reference trajectory dynamics is formed, and a new nonquadratic discounted performance function is introduced for the optimal tracking control problem. This performance function encodes the input constraints caused by actuator saturation into the optimization problem. Then, the tracking Bellman equation and the tracking Hamilton-Jacobi-Bellman equation are derived for both discrete-time and continuous-time systems. Finally, to obviate the requirement of complete knowledge of the system dynamics in finding the Hamilton-Jacobi-Bellman solution, integral reinforcement learning and off-policy reinforcement learning algorithms are developed for continuous-time systems, and a reinforcement learning algorithm based on an actor-critic structure is developed for discrete-time systems.
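The augmented-system construction above can be illustrated in its linear-quadratic special case. The sketch below is not the chapter's nonlinear RL algorithm; it is a minimal model-based policy iteration for discounted linear-quadratic tracking, with illustrative matrices, showing how stacking the plant state and the reference state lets a single gain supply both the feedback and feedforward parts of the control.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Plant x_{k+1} = A x_k + B u_k and reference generator r_{k+1} = F r_k
# (all matrices here are illustrative, not taken from the chapter).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
F = np.array([[1.0]])             # constant reference command

n, m, p = A.shape[0], B.shape[1], F.shape[0]

# Augmented system X = [x; r]: one gain K gives u = -K X,
# i.e. the feedback (on x) and feedforward (on r) parts simultaneously.
T = np.block([[A, np.zeros((n, p))],
              [np.zeros((p, n)), F]])
B1 = np.vstack([B, np.zeros((p, m))])

# Discounted quadratic cost on the tracking error e = x_1 - r.
C = np.array([[1.0, 0.0, -1.0]])  # e = C X
Q1 = 10.0 * C.T @ C
R = np.array([[1.0]])
gamma = 0.9                       # discount factor

# Model-based policy iteration (the fixed point the model-free RL
# algorithms approximate without using T and B1 explicitly).
K = np.zeros((m, n + p))          # admissible here: the plant is open-loop stable
for _ in range(50):
    Acl = T - B1 @ K
    # Policy evaluation: P = gamma * Acl' P Acl + Q1 + K' R K,
    # posed as a standard discrete Lyapunov equation via sqrt(gamma) scaling.
    P = solve_discrete_lyapunov(np.sqrt(gamma) * Acl.T, Q1 + K.T @ R @ K)
    # Policy improvement from the tracking Bellman equation.
    K = gamma * np.linalg.solve(R + gamma * B1.T @ P @ B1, B1.T @ P @ T)

# Cross-check: the discounted tracking ARE equals a standard DARE
# for the scaled pair (sqrt(gamma) T, sqrt(gamma) B1).
P_dare = solve_discrete_are(np.sqrt(gamma) * T, np.sqrt(gamma) * B1, Q1, R)
print(np.allclose(P, P_dare, atol=1e-6))
```

Note the role of the discount factor: because the reference generator is not required to be stable (here F has an eigenvalue at 1), gamma < 1 keeps the performance function finite, at the cost of scaling the closed-loop stability condition to sqrt(gamma)(T - B1 K).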
