On the Linear Convergence of Random Search for Discrete-Time LQR

Model-free reinforcement learning techniques directly search over the parameter space of controllers. Although this often amounts to solving a nonconvex optimization problem, simple local search methods exhibit competitive performance on benchmark control problems. To understand this phenomenon, we study the discrete-time Linear Quadratic Regulator (LQR) problem with unknown state-space parameters. In spite of the lack of convexity, we establish that the random search method with two-point gradient estimates and a fixed number of roll-outs achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations. This significantly improves upon existing results on the model-free LQR problem, which require $O(1/\epsilon)$ total roll-outs.
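To make the method concrete, the following is a minimal sketch of random search with two-point gradient estimates on a discrete-time LQR instance. It is an illustration under stated assumptions, not the paper's implementation: the roll-out horizon, step size, smoothing radius, and number of probing directions (`horizon`, `step`, `radius`, `num_dirs`) are illustrative choices rather than the constants from the analysis, and all function names are hypothetical.

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon roll-out estimate of the LQR cost
    sum_t (x_t' Q x_t + u_t' R u_t) under the feedback law u_t = -K x_t."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def random_search_two_point(A, B, Q, R, K0, step=1e-4, radius=1e-2,
                            num_dirs=10, num_iters=200, seed=0):
    """Random search with two-point gradient estimates: each iteration
    probes the cost at K + r*U_i and K - r*U_i along random directions
    U_i on the unit (Frobenius) sphere and averages the differences into
    a gradient estimate. The number of roll-outs per iteration is fixed
    at 2 * num_dirs, independent of the target accuracy."""
    rng = np.random.default_rng(seed)
    m, n = K0.shape
    K = K0.copy()
    for _ in range(num_iters):
        x0 = rng.standard_normal(n)          # random initial condition
        grad = np.zeros_like(K)
        for _ in range(num_dirs):
            U = rng.standard_normal((m, n))
            U /= np.linalg.norm(U)           # Frobenius-normalized direction
            j_plus = rollout_cost(K + radius * U, A, B, Q, R, x0)
            j_minus = rollout_cost(K - radius * U, A, B, Q, R, x0)
            grad += (j_plus - j_minus) / (2 * radius) * U
        K -= step * (m * n / num_dirs) * grad  # scaled gradient step
    return K

# Example on an arbitrary 2-state, 1-input system (values are illustrative).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)
K_hat = random_search_two_point(A, B, Q, R, K0=np.zeros((1, 2)))
```

The `m * n / num_dirs` scaling mirrors the standard two-point zeroth-order gradient estimator; the key point of the abstract is that, in this setting, a fixed number of such roll-outs per iteration is enough for linear (geometric) convergence.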
