Sham M. Kakade | Mehran Mesbahi | Maryam Fazel | Rong Ge
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] D. Kleinman. On an iterative technique for Riccati equation computations, 1968.
[3] G. Hewer. An iterative technique for the computation of the steady state gains for the discrete optimal regulator, 1971.
[4] E. Polak. An historical survey of computational methods in optimal control, 1973.
[5] B. Anderson, et al. Optimal control: linear quadratic methods, 1990.
[6] L. Liao, et al. Convergence in unconstrained discrete-time differential dynamic programming, 1991.
[7] Andrew G. Barto, et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of the 1994 American Control Conference (ACC '94).
[8] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[9] Leiba Rodman, et al. Algebraic Riccati equations, 1995.
[10] Claude-Nicolas Fiechter, et al. PAC adaptive control of linear systems, 1997, COLT '97.
[11] E. Tyrtyshnikov. A brief introduction to numerical analysis, 1997.
[12] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[13] Lennart Ljung, et al. System identification (2nd ed.): theory for the user, 1999.
[14] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[15] Duan Li, et al. A Globally Convergent and Efficient Method for Unconstrained Discrete-Time Optimal Control, 2002, J. Glob. Optim.
[16] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[17] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[18] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[19] Venkataramanan Balakrishnan, et al. Semidefinite programming duality and linear time-invariant systems, 2003, IEEE Trans. Autom. Control.
[20] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[21] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[22] E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.
[23] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[24] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[25] Anders Rantzer, et al. Gradient methods for iterative distributed control synthesis, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC), held jointly with the 2009 28th Chinese Control Conference.
[26] John L. Nazareth, et al. Introduction to derivative-free optimization, 2010, Math. Comput.
[27] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[28] Csaba Szepesvári, et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.
[29] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[30] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[31] Karl Mårtensson, et al. Gradient Methods for Large-Scale and Distributed Linear Quadratic Control, 2012.
[32] Aaron Hertzmann, et al. Trajectory Optimization for Full-Body Movements with Complex Contacts, 2013, IEEE Transactions on Visualization and Computer Graphics.
[33] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[34] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[35] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[36] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[38] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[39] Sergey Levine, et al. Optimal control with learned local models: Application to dexterous manipulation, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[40] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[41] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[42] Yurii Nesterov, et al. Random Gradient-Free Minimization of Convex Functions, 2015, Foundations of Computational Mathematics.
[43] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[44] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[45] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[46] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[47] Sanjeev Arora, et al. Towards Provable Control for Unknown Linear Dynamical Systems, 2018, International Conference on Learning Representations.
[48] Benjamin Recht, et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator, 2017, ICML.
[49] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[50] Yi Zhang, et al. Spectral Filtering for General Linear Dynamical Systems, 2018, NeurIPS.
[51] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.