Learning the model-free linear quadratic regulator via random search

Model-free reinforcement learning techniques attempt to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems and the lack of exact gradient computation. In this paper, we examine the standard infinite-horizon linear quadratic regulator (LQR) problem for continuous-time systems with unknown state-space parameters. We provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that, in this model-free setup, both the required simulation time for achieving ε-accuracy and the total number of function evaluations scale as O(log(1/ε)).
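The random search scheme analyzed in the abstract relies only on cost values obtained from finite-time simulations, not on gradients of the model. Below is a minimal sketch of such a two-point (zeroth-order) update on a toy continuous-time LQR instance; the double-integrator system, cost weights, initial gain, smoothing radius, step size, and horizon are all illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy continuous-time system: a double integrator (assumed for illustration).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

dt, horizon = 0.01, 10.0          # simulation step and finite horizon (assumed)
steps = int(horizon / dt)

def cost(K, n_rollouts=10):
    """Estimate the quadratic cost of the feedback law u = -K x by
    forward-Euler simulation from random initial states; only function
    values are used, never model gradients."""
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.standard_normal(2)
        J = 0.0
        for _ in range(steps):
            u = -K @ x
            J += (x @ Q @ x + u @ R @ u) * dt
            x = x + dt * (A @ x + B @ u)    # Euler step of dx/dt = Ax + Bu
        total += J
    return total / n_rollouts

# Random search: two-point gradient estimate followed by a plain gradient step.
K = np.array([[1.0, 1.0]])        # an initial stabilizing gain (assumed)
radius, stepsize = 0.05, 0.02     # smoothing radius and step size (assumed)

for it in range(200):
    U = rng.standard_normal(K.shape)                                  # random direction
    grad_est = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    K = K - stepsize * grad_est                                       # update the gain
    if it % 50 == 0:
        print(f"iter {it:3d}  estimated cost {cost(K):.3f}")
```

In this sketch, each iteration costs two simulated rollout batches, which is the function-evaluation count that the paper's sample-complexity bound tracks; the finite horizon plays the role of the required simulation time.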
