Anirudh Goyal | Philemon Brakel | William Fedus | Soumye Singhal | Timothy P. Lillicrap | Sergey Levine | Hugo Larochelle | Yoshua Bengio
[1] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 1988, NIPS.
[2] Shumeet Baluja,et al. A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning , 1994 .
[3] Robert F. Stengel,et al. Optimal Control and Estimation , 1994 .
[4] Geoffrey E. Hinton,et al. The "wake-sleep" algorithm for unsupervised neural networks , 1995, Science.
[5] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[6] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[7] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[8] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[9] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[10] Richard S. Sutton,et al. Sample-based learning and search with permanent and transient memories , 2008, ICML.
[11] Marc Toussaint,et al. Robot trajectory optimization using approximate inference , 2009, ICML.
[12] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[13] Gerhard Neumann,et al. Variational Inference for Policy Search in changing situations , 2011, ICML.
[14] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[15] J. Andrew Bagnell,et al. Agnostic System Identification for Model-Based Reinforcement Learning , 2012, ICML.
[16] Marc Toussaint,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.
[17] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.
[18] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[19] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[20] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[21] Sergey Levine,et al. Variational Policy Search via Trajectory Optimization , 2013, NIPS.
[22] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[23] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[24] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[25] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[26] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[27] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[28] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[29] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.
[30] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[31] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[32] Nikolaus Hansen,et al. The CMA Evolution Strategy: A Tutorial , 2016, ArXiv.
[33] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[34] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[35] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[36] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[37] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[38] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[39] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[40] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[41] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[42] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[43] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[44] Surya Ganguli,et al. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net , 2017, NIPS.
[45] Daan Wierstra,et al. Recurrent Environment Simulators , 2017, ICLR.
[46] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[47] Pieter Abbeel,et al. Automatic Goal Generation for Reinforcement Learning Agents , 2017, ICML.
[48] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[49] Ilya Kostrikov,et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play , 2017, ICLR.
[50] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[51] Ashley D. Edwards,et al. Forward-Backward Reinforcement Learning , 2018, ArXiv.