Off-Policy Deep Reinforcement Learning without Exploration
[1] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[2] Karl Tuyls,et al. The importance of experience replay database composition in deep reinforcement learning , 2015 .
[3] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[4] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[5] Sergey Levine,et al. Residual Reinforcement Learning for Robot Control , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[6] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[7] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[8] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[9] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[10] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[11] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[12] C. Rasmussen,et al. Improving PILCO with Bayesian Neural Network Dynamics Models , 2016 .
[13] Martin A. Riedmiller,et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards , 2017, ArXiv.
[14] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[15] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[16] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[17] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[18] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[19] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[20] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[21] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[22] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[23] Joelle Pineau,et al. Randomized Value Functions via Multiplicative Normalizing Flows , 2018, UAI.
[24] Stefano Ermon,et al. Model-Free Imitation Learning with Policy Optimization , 2016, ICML.
[25] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[26] Leslie Pack Kaelbling,et al. Residual Policy Learning , 2018, ArXiv.
[27] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[28] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[29] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.
[30] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[31] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[32] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[33] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[34] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[35] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[36] Noah D. Goodman,et al. Learning the Preferences of Ignorant, Inconsistent Agents , 2015, AAAI.
[37] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[38] Gregory Dudek,et al. Synthesizing Neural Network Controllers with Probabilistic Model-Based Reinforcement Learning , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[39] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[40] Nolan Wagener,et al. Fast Policy Learning through Imitation and Reinforcement , 2018, UAI.
[41] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[42] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[43] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[44] Pierre-Yves Oudeyer,et al. GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms , 2017, ICML.
[45] Katja Hofmann,et al. Depth and nonlinearity induce implicit exploration for RL , 2018, ArXiv.
[46] David Isele,et al. Selective Experience Replay for Lifelong Learning , 2018, AAAI.
[47] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[48] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[49] Matthieu Geist,et al. Boosted Bellman Residual Minimization Handling Expert Demonstrations , 2014, ECML/PKDD.
[50] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[51] Honglak Lee,et al. Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.
[52] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[53] Marcin Andrychowicz,et al. Overcoming Exploration in Reinforcement Learning with Demonstrations , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[54] Alessandro Lazaric,et al. Direct Policy Iteration with Demonstrations , 2015, IJCAI.
[55] Sebastian Thrun,et al. Issues in Using Function Approximation for Reinforcement Learning , 1999 .
[56] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[57] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[58] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[59] Joelle Pineau,et al. Learning from Limited Demonstrations , 2013, NIPS.
[60] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.
[61] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[62] Craig Boutilier,et al. Non-delusional Q-learning and value-iteration , 2018, NeurIPS.
[63] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[64] Jan Peters,et al. Non-parametric Policy Search with Limited Information Loss , 2017, J. Mach. Learn. Res..
[65] Byron Boots,et al. Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning , 2018, ICLR.
[66] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[67] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[68] T. L. Lai,et al. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.
[69] Don Joven Agravante,et al. Constrained Exploration and Recovery from Experience Shaping , 2018, ArXiv.
[70] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[71] Yang Gao,et al. Reinforcement Learning from Imperfect Demonstrations , 2018, ICLR.
[72] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[73] Yoshua Bengio,et al. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.
[74] Richard S. Sutton,et al. A Deeper Look at Experience Replay , 2017, ArXiv.
[75] Robert Babuska,et al. Improved deep reinforcement learning for robotics through distribution-based experience retention , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[76] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[77] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[78] Yao Liu,et al. Representation Balancing MDPs for Off-Policy Policy Evaluation , 2018, NeurIPS.
[79] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[80] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[81] Mohamed Medhat Gaber,et al. Imitation Learning , 2017, ACM Comput. Surv..