[1] Yufeng Zhang, et al. Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory, 2020, NeurIPS.
[2] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[3] Ruosong Wang, et al. Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity, 2020, ArXiv.
[4] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[5] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[6] Pieter Abbeel, et al. Towards Characterizing Divergence in Deep Q-Learning, 2019, ArXiv.
[7] Rudolf Taschner, et al. The Dirichlet Approximation Theorem, 1986.
[8] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[9] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[10] Sergey Levine, et al. DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction, 2020, NeurIPS.
[11] Joelle Pineau, et al. Interference and Generalization in Temporal Difference Learning, 2020, ICML.
[12] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[13] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Deep Reinforcement Learning, 2020, ICML.
[14] Xin Xu, et al. Kernel Least-Squares Temporal Difference Learning, 2006.
[15] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[16] Quanquan Gu, et al. A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation, 2020, ICML.
[17] Dina Katabi, et al. Harnessing Structures for Value-Based Planning and Reinforcement Learning, 2020, ICLR.
[18] Yoshua Bengio, et al. Revisiting Fundamentals of Experience Replay, 2020, ICML.
[19] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[20] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[21] Benjamin Van Roy, et al. The linear programming approach to approximate dynamic programming: theory and application, 2002.
[22] Wei Chen, et al. I4R: Promoting Deep Reinforcement Learning by the Indicator for Expressive Representations, 2020, IJCAI.
[23] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[24] Marc G. Bellemare, et al. Representations for Stable Off-Policy Reinforcement Learning, 2020, ICML.
[25] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[26] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[27] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[28] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[29] Martha White, et al. The Utility of Sparse Representations for Control in Reinforcement Learning, 2018, AAAI.
[30] Yoshua Bengio, et al. On Catastrophic Interference in Atari 2600 Games, 2020, ArXiv.
[31] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[32] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[33] Matteo Hessel, et al. When to use parametric models in reinforcement learning?, 2019, NeurIPS.
[34] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[35] Ruosong Wang, et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle, 2019, NeurIPS.
[36] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[37] Hossein Mobahi, et al. Self-Distillation Amplifies Regularization in Hilbert Space, 2020, NeurIPS.
[38] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[39] Sergey Levine, et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms, 2019, ICML.
[40] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[41] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[42] George Tucker, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[43] Nan Jiang, et al. Batch Value-function Approximation with Only Realizability, 2020, ICML.
[44] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[45] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[46] Jason D. Lee, et al. Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima, 2019, ArXiv:1905.10027.
[47] H. Eom. Green’s Functions: Applications, 2004.
[48] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[49] Shimon Whiteson, et al. The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning, 2020, ArXiv.
[50] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.
[51] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[52] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[53] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[54] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[55] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[56] Martin A. Riedmiller. Batch Reinforcement Learning, 2012, Reinforcement Learning.