On Model-Free Reinforcement Learning of Reduced-Order Optimal Control for Singularly Perturbed Systems

We propose a model-free reduced-order optimal control design for linear time-invariant singularly perturbed (SP) systems using reinforcement learning (RL). Both the state and input matrices of the plant model are assumed to be completely unknown. The only assumption imposed is that the model admits a similarity transformation that results in an SP representation. We develop a variant of Adaptive Dynamic Programming (ADP) that employs only the slow states of this SP model to learn a reduced-order adaptive optimal controller. By exploiting this model reduction, the method significantly reduces both the learning time and the complexity of the feedback control. We use approximation theorems from singular perturbation theory to establish the sub-optimality of the learned controller and to guarantee closed-loop stability. We validate our results using two representative examples: one with standard singularly perturbed dynamics and the other with clustered multi-agent consensus dynamics. Both examples highlight implementation details and the effectiveness of the proposed approach.
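To make the idea concrete, below is a minimal sketch of what a data-driven (off-policy) policy iteration over the slow states could look like. It is not the paper's algorithm: the slow-subsystem matrices A_s and B_s, the weights Q and R, the exploration signal, the horizons, and the initial gain are all hypothetical placeholders, and the reduced model is simulated directly rather than extracted from a full singularly perturbed plant. The learner itself only uses recorded slow-state and input data.

```python
# Sketch of a data-driven policy-iteration (ADP) loop on slow states only.
# All numerical choices below are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical slow (reduced-order) dynamics: unknown to the learner,
# used here only to generate measurement data.
A_s = np.array([[-1.0, 2.0],
                [0.0, -3.0]])
B_s = np.array([[0.0],
                [1.0]])
n, m = A_s.shape[0], B_s.shape[1]

Q = np.eye(n)          # state weight
R = np.eye(m)          # input weight

# ---- 1. Collect slow-state/input data under an exploratory input ----
dt, steps_per_interval, n_intervals = 1e-3, 50, 300
x = np.array([1.0, -1.0])
X0, X1, IXX, IXU = [], [], [], []                     # per-interval data
freqs = rng.uniform(0.5, 10.0, size=20)
for i in range(n_intervals):
    Ixx = np.zeros((n, n))                            # integral of x x^T
    Ixu = np.zeros((n, m))                            # integral of x u^T
    x0 = x.copy()
    for k in range(steps_per_interval):
        t = (i * steps_per_interval + k) * dt
        u = np.array([np.sum(np.sin(freqs * t))])     # exploration input
        Ixx += np.outer(x, x) * dt
        Ixu += np.outer(x, u) * dt
        x = x + dt * (A_s @ x + B_s @ u)              # Euler step
    X0.append(x0); X1.append(x.copy()); IXX.append(Ixx); IXU.append(Ixu)

# ---- 2. Off-policy policy iteration using the recorded data only ----
# Each interval yields one scalar equation, linear in vec(P) and vec(R K_next):
#   trace(P D_i) - 2 trace(R K_next M_i) = -trace((Q + K^T R K) Ixx_i)
K = np.zeros((m, n))                                  # stabilizing initial gain (A_s stable)
for it in range(10):
    rows, rhs = [], []
    for x0, x1, Ixx, Ixu in zip(X0, X1, IXX, IXU):
        D = np.outer(x1, x1) - np.outer(x0, x0)       # coefficients of vec(P)
        M = Ixu + Ixx @ K.T                           # coefficients of vec(R K_next)
        rows.append(np.concatenate([D.flatten(), -2.0 * M.T.flatten()]))
        rhs.append(-np.trace((Q + K.T @ R @ K) @ Ixx))
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    P = sol[:n * n].reshape(n, n)
    P = 0.5 * (P + P.T)                               # enforce symmetry
    K = np.linalg.solve(R, sol[n * n:].reshape(m, n)) # policy improvement
print("learned reduced-order gain K =", K)
print("learned cost matrix P =", P)
```

As a sanity check under these assumed matrices, the learned gain can be compared with the model-based LQR solution obtained from scipy.linalg.solve_continuous_are(A_s, B_s, Q, R); when the data are sufficiently exciting, the iteration converges to a close match. In the setting of the paper, the slow-state data would instead come from the full SP plant, and the resulting controller is only near-optimal, with the gap characterized by singular perturbation approximation theorems.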
