[1] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.
[2] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[3] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[4] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, arXiv.
[5] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.
[6] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[7] Benjamin Recht, et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator, 2017, ICML.
[9] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[10] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[11] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[12] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, ICRA.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Michael I. Jordan, et al. Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.
[15] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[16] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[17] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[18] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IROS.
[19] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[20] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[22] Vianney Perchet, et al. Highly-Smooth Zero-th Order Online Optimization, 2016, COLT.
[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[24] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[25] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[26] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[27] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[28] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[29] Peter Henderson, et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, 2017, arXiv.
[30] Yurii Nesterov, et al. Random Gradient-Free Minimization of Convex Functions, 2015, Foundations of Computational Mathematics.
[31] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[32] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[33] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[34] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[35] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[36] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.