Reinforcement learning based computational adaptive optimal control and system identification for linear systems

Abstract The duality of estimation and control is well known in the control theory literature. Estimating parameters while simultaneously maintaining closed-loop stability is difficult even for linear systems, and more so when the system is unstable. For this reason, system identification is typically performed only in offline experiments. This motivates a higher-level abstraction for a combined control and identification scheme that acts in stages and prioritizes a different aspect of the problem at each stage. The staged abstraction proposed in this paper is inspired by human intuition for handling control and identification simultaneously, and is therefore named the “Intuitive Control Framework”. The first phase prioritizes stabilization of the unknown system alone. Once the system is stabilized, the controller moves to the next phase, whose stages optimize different performance metrics through adaptive learning. After enough information for identification has been acquired, the control schemes developed for the various optimality metrics are used to estimate the unknown parameters in the final phase. This selective prioritization of objectives, and the higher-level abstraction of control schemes, is illustrated for a continuous-time linear time-invariant state-space realization with state feedback. Numerous real-world applications can benefit from this online system identification routine inspired by the human cognitive process, which integrates control and identification under a higher-level ordering of priorities. The framework is presented with explicit formulations for certain classes of dynamic systems and is evaluated with computer simulations as well as experimental results. An unstable multi-input multi-output linear system is used as an example to illustrate the approach.
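As a rough illustration of the "stabilize, then optimize" ordering described in the abstract, the sketch below runs the classical model-based Kleinman policy iteration on a small unstable two-input system. It is not the paper's algorithm: the paper targets unknown dynamics, whereas this sketch assumes the matrices are known, and the matrices `A`, `B`, the weights `Q`, `R`, the initial gain `K0`, and the helper names are all hypothetical values chosen for the example. The point it shows is that the optimization phase needs a stabilizing gain to start from, which is what the first phase is meant to supply.

```python
import numpy as np

def lyapunov_solve(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 for P using a Kronecker-product linear system."""
    n = Ac.shape[0]
    I = np.eye(n)
    # With row-major vectorization: vec(Ac.T P) = kron(Ac.T, I) vec(P),
    # and vec(P Ac) = kron(I, Ac.T) vec(P).
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def kleinman_iteration(A, B, Q, R, K0, iters=20):
    """Policy iteration for the LQR problem, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                           # closed loop under the current policy
        P = lyapunov_solve(Ac, Q + K.T @ R @ K)  # policy evaluation
        K = np.linalg.solve(R, B.T @ P)          # policy improvement
    return P, K

# Hypothetical unstable two-input example (open-loop eigenvalues 1 and -2).
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)

# Phase 1 stand-in: any stabilizing gain; here A - B @ K0 has trace -4 and determinant 1.
K0 = np.array([[3.0, 0.0], [0.0, 0.0]])

# Phase 2 stand-in: iterate toward the LQR-optimal gain.
P, K = kleinman_iteration(A, B, Q, R, K0)

# Residual of the algebraic Riccati equation at the returned P.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print("Riccati residual norm:", np.linalg.norm(residual))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

If `K0` is replaced by a gain that does not stabilize the system, the Lyapunov step no longer yields a meaningful cost matrix, which is one way to see why stabilization is prioritized before optimization.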
