Abbas Abdolmaleki | Jost Tobias Springenberg | Yuval Tassa | Rémi Munos | Nicolas Heess | Martin A. Riedmiller
[1] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[4] Doina Precup, et al. A Convergent Form of Approximate Policy Iteration, 2002, NIPS.
[5] H. Kappen. Path integrals and symmetry breaking for optimal control theory, 2005, physics/0505066.
[6] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[7] Emanuel Todorov, et al. General duality between optimal control and estimation, 2008, 47th IEEE Conference on Decision and Control.
[8] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[9] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML.
[10] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[11] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res.
[12] Gerhard Neumann, et al. Variational Inference for Policy Search in changing situations, 2011, ICML.
[13] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[14] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[15] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[16] N. Roy, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2013.
[17] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[18] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[19] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[20] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[21] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[22] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[23] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[24] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[25] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[26] Nicolas Le Roux. Efficient iterative policy optimization, 2016, ArXiv.
[27] Sergey Levine, et al. Deep Reinforcement Learning for Robotic Manipulation, 2016, ArXiv.
[28] Glen Berseth, et al. Terrain-adaptive locomotion skills using deep reinforcement learning, 2016, ACM Trans. Graph.
[29] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[30] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[31] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[32] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[33] Johannes Fürnkranz, et al. Model-Free Preference-Based Reinforcement Learning, 2016, AAAI.
[34] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[35] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[36] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[37] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[38] Luís Paulo Reis, et al. Deriving and improving CMA-ES with information geometric trust regions, 2017, GECCO.
[39] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[40] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[41] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[42] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[43] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[44] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[45] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[46] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[47] Pascal Vincent, et al. Convergent Tree-Backup and Retrace with Function Approximation, 2017, ICML.
[48] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[49] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.