Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics

This paper presents a novel policy iteration approach for finding online adaptive optimal controllers for continuous-time linear systems with completely unknown system dynamics. The proposed approach employs the approximate/adaptive dynamic programming technique to iteratively solve the algebraic Riccati equation using online measurements of state and input, without requiring a priori knowledge of the system matrices. In addition, all iterations can be conducted by repeatedly using the same state and input information collected on some fixed time intervals. A practical online algorithm is developed and applied to the controller design for a turbocharged diesel engine with exhaust gas recirculation. Finally, several directions for future work are discussed.
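The iterative solution of the algebraic Riccati equation described above builds on Kleinman's classical policy iteration, in which each step solves a Lyapunov equation for the current stabilizing gain and then improves the gain. The sketch below illustrates that backbone in the simplest (scalar) case, with hypothetical plant values a = b = q = r = 1; note that it assumes the model (a, b) is known, which is precisely the requirement the paper's data-driven method removes by evaluating each policy from measured state and input trajectories instead.

```python
# Kleinman-style policy iteration for the scalar LQR problem
#   x' = a*x + b*u,   cost = integral of (q*x^2 + r*u^2) dt
# The ARE  2*a*p - (b^2/r)*p^2 + q = 0  is solved by iterating:
#   1) policy evaluation: solve the closed-loop Lyapunov equation
#      for the current gain k_j:  2*(a - b*k_j)*p_j + q + r*k_j**2 = 0
#   2) policy improvement:       k_{j+1} = (b/r) * p_j
# Starting from any stabilizing gain (a - b*k0 < 0), p_j converges
# to the stabilizing solution of the ARE.

def kleinman_scalar(a, b, q, r, k0, iters=20):
    k = k0
    assert a - b * k < 0, "initial gain must be stabilizing"
    for _ in range(iters):
        # Policy evaluation: scalar Lyapunov equation, solved in closed form
        p = -(q + r * k * k) / (2.0 * (a - b * k))
        # Policy improvement
        k = b * p / r
    return p, k

# Hypothetical plant a = b = q = r = 1; the exact ARE root is p = 1 + sqrt(2)
p, k = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
print(p)  # converges to about 2.41421356
```

In the paper's model-free setting, the policy-evaluation step is instead posed as a least-squares problem over data collected along the system trajectories, so the same recursion runs without ever forming a or b explicitly.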
