Policy iteration-based indirect adaptive optimal control for completely unknown continuous-time LTI systems

This paper proposes a novel indirect adaptive optimal controller (AOC) for completely unknown continuous-time (CT) linear time-invariant (LTI) systems using the policy iteration (PI) technique. The algorithm builds on Kleinman's method of iteratively solving the algebraic Riccati equation (ARE). However, the system and control matrices required by Kleinman's algorithm are replaced by their online CT estimates, sampled uniformly. A gradient-based online system identifier is developed using a low-pass filter, which eliminates the need for state-derivative information. Under the assumption of persistence of excitation (PE), the identifier converges exponentially to the actual plant-parameter vector. The proposed identifier-based Kleinman's algorithm is shown to converge to the optimal control policy while keeping the intermediate policies stabilizing for the unknown CT LTI system, and this is validated through simulation studies on multi-input multi-output (MIMO) LTI systems. The designed indirect AOC is argued to be computationally simpler than direct AOC schemes in the past literature.
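To make the Kleinman step concrete, the following is a minimal sketch of the iteration for a known pair (A, B): each pass solves a Lyapunov equation for the current gain and then updates the gain. It is not the paper's implementation. The example matrices, the function names `lyap` and `kleinman`, the stopping tolerance, and the zero initial gain are all illustrative assumptions. In the indirect scheme described above, `A` and `B` would be replaced by sampled online estimates from the identifier, which is not modelled here.

```python
import numpy as np


def lyap(Ac, W):
    """Solve Ac.T @ P + P @ Ac + W = 0 for P using Kronecker products."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Row-major vectorisation: vec(M @ P @ N) = kron(M, N.T) @ vec(P)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrise against round-off


def kleinman(A, B, Q, R, K0, iters=20, tol=1e-10):
    """Kleinman's policy iteration; K0 must make A - B @ K0 Hurwitz."""
    K = K0
    P_prev = None
    for _ in range(iters):
        Ac = A - B @ K                      # closed loop under current policy
        P = lyap(Ac, Q + K.T @ R @ K)       # policy evaluation
        K = np.linalg.solve(R, B.T @ P)     # policy improvement
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K


# Hypothetical second-order example (assumed values, not from the paper).
# A is already Hurwitz, so the zero gain is a valid stabilizing start.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.zeros((1, 2))

P, K = kleinman(A, B, Q, R, K0)

# Residual of the ARE: A'P + PA - P B R^{-1} B' P + Q
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print("ARE residual norm:", np.linalg.norm(residual))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

The Kronecker-product Lyapunov solve keeps the sketch dependent on NumPy alone. For larger state dimensions a dedicated Lyapunov solver would be the more usual choice.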
