[1] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[2] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[3] Marcin Andrychowicz,et al. Overcoming Exploration in Reinforcement Learning with Demonstrations , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[4] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[5] OpenAI. Learning Dexterous In-Hand Manipulation , 2018.
[6] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011.
[7] Manuela M. Veloso,et al. Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.
[8] Nando de Freitas,et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills , 2018, Robotics: Science and Systems.
[9] Sergey Levine,et al. Residual Reinforcement Learning for Robot Control , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[10] Sen Wang,et al. Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[11] Martin A. Riedmiller,et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards , 2017, ArXiv.
[12] Yusen Zhan,et al. Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer , 2016, IJCAI.
[13] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[14] Matthew E. Taylor,et al. Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human/Agent's Demonstration , 2018, ArXiv.
[15] Matthew E. Taylor,et al. Improving Reinforcement Learning with Confidence-Based Demonstrations , 2017, IJCAI.
[16] Thomas Brox,et al. CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity , 2019, ArXiv:1902.05605.
[17] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[18] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[19] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[20] Leslie Pack Kaelbling,et al. Residual Policy Learning , 2018, ArXiv.
[21] Yang Gao,et al. Reinforcement Learning from Imperfect Demonstrations , 2018, ICLR.
[22] J. Andrew Bagnell,et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning , 2014, ArXiv.
[23] Kyunghyun Cho,et al. Query-Efficient Imitation Learning for End-to-End Autonomous Driving , 2016, ArXiv.
[24] M.T. Rosenstein,et al. Reinforcement learning with supervision by a stable controller , 2004, Proceedings of the 2004 American Control Conference.
[25] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[26] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[27] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[28] Marcin Andrychowicz,et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research , 2018, ArXiv.
[29] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[30] Alessandro Lazaric,et al. Regret Bounds for Reinforcement Learning with Policy Advice , 2013, ECML/PKDD.
[31] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998.
[32] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[33] Ken Goldberg,et al. Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation , 2017, ICRA.
[34] Siyuan Li,et al. Context-Aware Policy Reuse , 2018, AAMAS.
[35] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[36] Lydia Tapia,et al. PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[37] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[38] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[39] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[40] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[41] Peter Henderson,et al. Bayesian Policy Gradients via Alpha Divergence Dropout Inference , 2017, ArXiv.
[42] Siyuan Li,et al. An Optimal Online Method of Selecting Source Policies for Reinforcement Learning , 2017, AAAI.
[43] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.