On the Linear Convergence of Random Search for Discrete-Time LQR

Model-free reinforcement learning techniques directly search over the parameter space of controllers. Although this often amounts to solving a nonconvex optimization problem, simple local search methods exhibit competitive performance on benchmark control problems. To understand this phenomenon, we study the discrete-time Linear Quadratic Regulator (LQR) problem with unknown state-space parameters. In spite of the lack of convexity, we establish that the random search method with two-point gradient estimates and a fixed number of roll-outs achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations. This significantly improves upon existing results on the model-free LQR problem, which require $O(1/\epsilon)$ total roll-outs.
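To make the method concrete, the following is a minimal sketch of random search with two-point gradient estimates on a discrete-time LQR instance. It is an illustration under stated assumptions, not the paper's implementation: the roll-out horizon, step size, smoothing radius, and number of probing directions (`horizon`, `step`, `radius`, `num_dirs`) are illustrative choices rather than the constants from the analysis, and all function names are hypothetical.

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon roll-out estimate of the LQR cost
    sum_t (x_t' Q x_t + u_t' R u_t) under the feedback law u_t = -K x_t."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def random_search_two_point(A, B, Q, R, K0, step=1e-4, radius=1e-2,
                            num_dirs=10, num_iters=200, seed=0):
    """Random search with two-point gradient estimates: each iteration
    probes the cost at K + r*U_i and K - r*U_i along random directions
    U_i on the unit (Frobenius) sphere and averages the differences into
    a gradient estimate. The number of roll-outs per iteration is fixed
    at 2 * num_dirs, independent of the target accuracy."""
    rng = np.random.default_rng(seed)
    m, n = K0.shape
    K = K0.copy()
    for _ in range(num_iters):
        x0 = rng.standard_normal(n)          # random initial condition
        grad = np.zeros_like(K)
        for _ in range(num_dirs):
            U = rng.standard_normal((m, n))
            U /= np.linalg.norm(U)           # Frobenius-normalized direction
            j_plus = rollout_cost(K + radius * U, A, B, Q, R, x0)
            j_minus = rollout_cost(K - radius * U, A, B, Q, R, x0)
            grad += (j_plus - j_minus) / (2 * radius) * U
        K -= step * (m * n / num_dirs) * grad  # scaled gradient step
    return K

# Example on an arbitrary 2-state, 1-input system (values are illustrative).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)
K_hat = random_search_two_point(A, B, Q, R, K0=np.zeros((1, 2)))
```

The `m * n / num_dirs` scaling mirrors the standard two-point zeroth-order gradient estimator; the key point of the abstract is that, in this setting, a fixed number of such roll-outs per iteration is enough for linear (geometric) convergence.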
