Structured exploration in the finite horizon linear quadratic dual control problem

This paper presents a novel approach to synthesizing dual controllers for unknown linear time-invariant systems, with the twin tasks of optimizing a quadratic cost and reducing model uncertainty. The work builds on ideas from experiment design and robust control, and defines a synthesis problem in which the feedback law must simultaneously gain knowledge of the system and robustly optimize the given quadratic cost. In contrast to recent results, the problem is framed in a finite-horizon setting, which offers more insight into the trade-offs that arise when the tasks include both identification and control of an unknown plant. One of the main findings is that efficient exploration strategies are obtained when the structure of the problem (e.g., properties of the system) is exploited.
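For concreteness, a minimal finite-horizon linear quadratic formulation of the dual control problem can be sketched as follows; the notation (weights Q, R, Q_T and horizon T) is illustrative and not taken from the paper itself:

    J(\pi) = \mathbb{E}\left[ \sum_{t=0}^{T-1} \left( x_t^\top Q x_t + u_t^\top R u_t \right) + x_T^\top Q_T x_T \right],
    \qquad x_{t+1} = A x_t + B u_t + w_t,

where the pair (A, B) is unknown and w_t is process noise. A dual controller chooses the policy u_t = \pi_t(x_0, \dots, x_t) so that it both keeps J small and excites the system enough that the collected data reduce the uncertainty about (A, B); the finite-horizon framing makes this exploration-exploitation trade-off explicit, since any input energy spent on identification must pay off within the same horizon T.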
