Input perturbations for adaptive control and learning

This paper studies adaptive algorithms for simultaneous regulation (i.e., control) and estimation (i.e., learning) of multiple-input multiple-output (MIMO) linear dynamical systems. It proposes practical, easy-to-implement control policies based on perturbations of input signals. Such policies are shown to achieve a worst-case regret that scales as the square root of the time horizon, a bound that holds uniformly over time. Further, the paper discusses specific settings in which such greedy policies attain the information-theoretic lower bound of logarithmic regret. The results are established by leveraging recent advances in self-normalized martingales together with a novel method of policy decomposition.
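To make the idea of an input-perturbed greedy policy concrete, the following is a minimal sketch for a scalar system x_{t+1} = a x_t + b u_t + w_t with quadratic cost q x^2 + r u^2. It is not the paper's algorithm: the scalar setting, the ridge-regularized least-squares estimator, the perturbation scale t^(-1/4), the gain clip k_max, and all function and parameter names are assumptions chosen for illustration only.

```python
import random


def riccati(a, b, q=1.0, r=1.0, iters=500):
    """Iterate the scalar discrete-time Riccati equation.

    Returns (p, k): the cost-to-go coefficient and the gain in u = -k * x.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k


def fit_ab(sxx, sxu, suu, sxy, suy, ridge=1e-6):
    """Solve the 2x2 ridge-regularized normal equations for (a, b)."""
    m11, m12, m22 = sxx + ridge, sxu, suu + ridge
    det = m11 * m22 - m12 * m12
    a_hat = (m22 * sxy - m12 * suy) / det
    b_hat = (m11 * suy - m12 * sxy) / det
    return a_hat, b_hat


def run(a_true=0.9, b_true=0.5, horizon=5000, k_max=5.0, seed=0):
    """Greedy (certainty-equivalent) control with a decaying input perturbation.

    Returns (a_hat, b_hat, regret), where regret is the cumulative cost minus
    horizon times the optimal average cost (unit-variance process noise).
    """
    rng = random.Random(seed)
    p_star, _ = riccati(a_true, b_true)  # optimal average cost for unit noise
    x = 0.0
    sxx = sxu = suu = sxy = suy = 0.0    # running sums for least squares
    a_hat, b_hat = 0.0, 1.0              # arbitrary initial guess (assumption)
    total_cost = 0.0
    for t in range(1, horizon + 1):
        _, k = riccati(a_hat, b_hat)     # greedy gain from current estimates
        k = max(-k_max, min(k_max, k))   # illustrative safeguard, an assumption
        sigma = t ** -0.25               # perturbation std (assumed schedule)
        u = -k * x + sigma * rng.gauss(0.0, 1.0)
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        total_cost += x * x + u * u
        # Accumulate statistics for regressing x_next on (x, u).
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        a_hat, b_hat = fit_ab(sxx, sxu, suu, sxy, suy)
        x = x_next
    return a_hat, b_hat, total_cost - horizon * p_star
```

Calling `run()` returns the final parameter estimates and the cumulative regret of this toy policy. The perturbation term is what keeps the regressors (x, u) from becoming collinear under feedback; no particular numerical outcome or regret rate is claimed for this sketch.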
