Yuval Tassa | Steven Bohez | Martin A. Riedmiller | Abbas Abdolmaleki | Jost Tobias Springenberg | Nicolas Heess | Jonas Degrave | Dan Belov
[1] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[2] Sergey Levine,et al. Path integral guided policy search , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[3] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[4] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[5] Masashi Sugiyama,et al. Guide Actor-Critic for Continuous Control , 2017, ICLR.
[6] Dirk P. Kroese,et al. The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-Carlo Simulation (Information Science and Statistics) , 2004 .
[7] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[8] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[9] Bruno Scherrer,et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris , 2013, NIPS.
[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[11] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[12] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[13] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[14] Luís Paulo Reis,et al. Model-Based Relative Entropy Stochastic Search , 2016, NIPS.
[15] Martin A. Riedmiller,et al. Learning to Drive a Real Car in 20 Minutes , 2007, 2007 Frontiers in the Convergence of Bioscience and Information Technologies.
[16] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[17] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[18] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[19] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[20] Jakub W. Pachocki,et al. Emergent Complexity via Multi-Agent Competition , 2017, ICLR.
[21] Gerhard Neumann,et al. Variational Inference for Policy Search in changing situations , 2011, ICML.
[22] Tom Schaul,et al. Fitness Expectation Maximization , 2008, PPSN.
[23] Yuval Tassa,et al. Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.
[24] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[25] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[26] Shimon Whiteson,et al. Expected Policy Gradients , 2017, AAAI.
[27] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[28] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[29] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 2015 .
[30] Peter Stone,et al. Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..
[31] Sergey Levine,et al. Guided Policy Search via Approximate Mirror Descent , 2016, NIPS.
[32] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[33] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[34] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[35] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[36] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[37] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[38] Luís Paulo Reis,et al. Deriving and improving CMA-ES with information geometric trust regions , 2017, GECCO.
[39] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[40] Yuval Tassa,et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation , 2017, ArXiv.
[41] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[42] N. Hansen,et al. Convergence Properties of Evolution Strategies with the Derandomized Covariance Matrix Adaptation: T , 1997 .
[43] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[44] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[45] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[46] Luís Paulo Reis,et al. Contextual Covariance Matrix Adaptation Evolutionary Strategies , 2017, IJCAI.