Yang Gao | Sergey Levine | Trevor Darrell | Fisher Yu | Huazhe Xu | Ji Lin
[1] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.
[2] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[3] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[4] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[5] Trevor Darrell,et al. Loss is its own Reward: Self-Supervision for Reinforcement Learning , 2016, ICLR.
[6] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res.
[7] Tom Schaul,et al. Learning from Demonstrations for Real World Reinforcement Learning , 2017, ArXiv.
[8] Yang Gao,et al. End-to-End Learning of Driving Models from Large-Scale Video Datasets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[10] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[11] Andrea Lockerd Thomaz,et al. Exploration from Demonstration for Interactive Reinforcement Learning , 2016, AAMAS.
[12] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[13] Paul Thie. Markov Decision Processes , 1983.
[14] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[15] Sonia Chernova,et al. Integrating reinforcement learning with human demonstrations of varying ability , 2011, AAMAS.
[16] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[17] Nando de Freitas,et al. Robust Imitation of Diverse Behaviors , 2017, NIPS.
[18] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[19] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[20] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[21] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[22] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[23] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[24] Joelle Pineau,et al. Learning from Limited Demonstrations , 2013, NIPS.
[25] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[26] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[27] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[28] Trevor Darrell,et al. Gradient-free Policy Architecture Search and Adaptation , 2017, CoRL.
[29] Tom Schaul,et al. StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.
[30] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010.
[31] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[32] Emanuel Todorov,et al. General duality between optimal control and estimation , 2008, 2008 47th IEEE Conference on Decision and Control.
[33] Marc Toussaint,et al. Robot trajectory optimization using approximate inference , 2009, ICML '09.
[34] Matthieu Geist,et al. Boosted Bellman Residual Minimization Handling Expert Demonstrations , 2014, ECML/PKDD.
[35] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[36] Alessandro Lazaric,et al. Direct Policy Iteration with Demonstrations , 2015, IJCAI.
[37] Wojciech Jaskowski,et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).
[38] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[39] U. Rieder,et al. Markov Decision Processes , 2010.
[40] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.