William Fedus | Prajit Ramachandran | Rishabh Agarwal | Yoshua Bengio | Hugo Larochelle | Mark Rowland | Will Dabney
[1] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[2] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] Martha White, et al. Importance Resampling for Off-policy Prediction, 2019, NeurIPS.
[4] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[5] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[6] Sergey Levine, et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms, 2019, ICML.
[7] Richard S. Sutton, et al. A Deeper Look at Experience Replay, 2017, ArXiv.
[8] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[9] Christopher Amato, et al. Reconciling λ-Returns with Experience Replay, 2019, NeurIPS.
[10] Matteo Hessel, et al. When to use parametric models in reinforcement learning?, 2019, NeurIPS.
[11] Jiayu Zhou, et al. Ranking Policy Gradient, 2019, ICLR.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[14] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[15] Petros Koumoutsakos, et al. Remember and Forget for Experience Replay, 2018, ICML.
[16] Long Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[17] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[18] Martha White, et al. Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains, 2018, IJCAI.
[19] R. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014.
[20] Peiquan Sun, et al. Attentive Experience Replay, 2020, AAAI.
[21] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[22] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[23] Marc G. Bellemare, et al. Q(λ) with Off-Policy Corrections, 2016, ALT.
[24] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[25] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[26] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[27] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[28] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[29] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[31] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[32] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[33] Daochen Zha, et al. Experience Replay Optimization, 2019, IJCAI.
[34] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[35] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.
[36] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[37] Csaba Szepesvári. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[38] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[39] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[40] James Zou, et al. The Effects of Memory Replay in Reinforcement Learning, 2017, 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[41] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[42] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[43] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[44] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[45] Sae-Young Chung, et al. Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update, 2018, NeurIPS.
[46] Peter Henderson, et al. An Introduction to Deep Reinforcement Learning, 2018, Found. Trends Mach. Learn.
[47] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[48] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[49] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[50] Jessica B. Hamrick, et al. Combining Q-Learning and Search with Amortized Value Estimates, 2020, ICLR.