暂无分享,去创建一个
[1] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[2] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[3] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[4] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[5] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[6] Piotr Stanczyk,et al. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference , 2020, ICLR.
[7] Bruno Scherrer,et al. Momentum in Reinforcement Learning , 2020, AISTATS.
[8] Harm van Seijen,et al. Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning , 2019, NeurIPS.
[9] L. Baird. Reinforcement Learning Through Gradient Descent , 1999 .
[10] Marc G. Bellemare,et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning , 2019, AAAI.
[11] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[12] Tie-Yan Liu,et al. Fully Parameterized Quantile Function for Distributional Reinforcement Learning , 2019, NeurIPS.
[13] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[14] Munchausen. English,et al. Baron Munchausen's narrative of his marvelous travels and campaigns in Russia , 1928 .
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Marc G. Bellemare,et al. Increasing the Action Gap: New Operators for Reinforcement Learning , 2015, AAAI.
[17] Lawrence Carin,et al. Revisiting the Softmax Bellman Operator: New Benefits and New Perspective , 2018, ICML.
[18] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[19] Rémi Munos,et al. Recurrent Experience Replay in Distributed Reinforcement Learning , 2018, ICLR.
[20] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[21] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[22] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[23] Takamitsu Matsubara,et al. Deep dynamic policy programming for robot control with raw images , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[24] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[25] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[26] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[27] Bruno Scherrer,et al. Leverage the Average: an Analysis of Regularization in RL , 2020, ArXiv.
[28] Kenji Doya,et al. Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning , 2019, AISTATS.
[29] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[30] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[31] Daniel Guo,et al. Agent57: Outperforming the Atari Human Benchmark , 2020, ICML.
[32] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[33] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[34] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[35] Amir Massoud Farahmand,et al. Action-Gap Phenomenon in Reinforcement Learning , 2011, NIPS.