Pieter Abbeel | Roy Fox | Stephen McAleer | Alexander Ihler | Dailin Hu | Litian Liang | Yaosheng Xu
[1] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[2] Pieter Abbeel, et al. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning, 2021, ICML.
[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[5] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[6] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[7] Sergey Levine, et al. DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction, 2020, NeurIPS.
[8] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[9] Naftali Tishby, et al. Trading Value and Information in MDPs, 2012.
[10] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[11] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[12] Martha White, et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[13] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[14] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[15] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[16] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[17] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[18] Qiang Liu, et al. Bounding the Partition Function using Holder's Inequality, 2011, ICML.
[19] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[20] Pieter Abbeel, et al. Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning, 2021.
[21] Pieter Abbeel, et al. Target Entropy Annealing for Discrete Soft Actor-Critic, 2021, ArXiv.
[22] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[23] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[24] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[25] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[26] Sergey Levine, et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[27] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[28] Roy Fox. Toward Provably Unbiased Temporal-Difference Value Estimation, 2019.
[29] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.