On Applications of Bootstrap in Continuous Space Reinforcement Learning

In decision-making problems with continuous state and action spaces, linear dynamical models are widely employed. In particular, policies for stochastic linear systems with quadratic cost functions capture a large number of applications in reinforcement learning. Several randomized policies that address the trade-off between identification and control have recently been studied, but little is known about policies based on bootstrapping the observed states and control inputs. In this work, we show that bootstrap-based policies achieve regret that scales as the square root of time. We also establish results on the accuracy with which the model's dynamics are learned. Numerical experiments illustrating the technical results are also provided.
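The abstract does not spell out the algorithm, but the following minimal Python sketch illustrates the kind of bootstrap-based policy it refers to: resample the observed (state, input, next-state) transitions with replacement, refit the system matrices by least squares, and apply the certainty-equivalent LQR gain for the resampled model. The function name `bootstrap_lqr_gain`, the periodic refit schedule, and the toy system are our own illustrative assumptions, not the paper's specification.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def bootstrap_lqr_gain(X, U, Xn, Q, R, rng):
    """Resample the observed transitions with replacement, refit the
    dynamics [A B] by least squares, and return the certainty-equivalent
    LQR feedback gain for the resampled estimate."""
    n, dx = X.shape
    idx = rng.integers(0, n, size=n)          # bootstrap resample
    Z = np.hstack([X[idx], U[idx]])           # regressors [x_t, u_t]
    theta, *_ = np.linalg.lstsq(Z, Xn[idx], rcond=None)
    A_hat, B_hat = theta[:dx].T, theta[dx:].T
    # Certainty equivalence: solve the Riccati equation for the estimate.
    # (This can fail if the bootstrapped estimate is not stabilizable;
    # a full implementation would guard against that.)
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

# Toy closed-loop simulation on a hypothetical 2-state, 1-input system.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])        # true dynamics, unknown to the learner
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
x, K = np.zeros(2), np.zeros((1, 2))
X, U, Xn = [], [], []
for t in range(500):
    u = K @ x + 0.1 * rng.standard_normal(1)  # dither keeps the data informative
    x_next = A @ x + B @ u + 0.1 * rng.standard_normal(2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next
    if t >= 10 and t % 50 == 0:               # periodic refit (a simplification)
        K = bootstrap_lqr_gain(np.array(X), np.array(U), np.array(Xn), Q, R, rng)
```

The bootstrap resample, rather than noise injected into the parameter estimate or the control input, is what drives exploration here; how often to refit and how to safeguard against unstabilizable estimates are exactly the design questions a full treatment must address.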
