[1] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML.
[2] Philip Amortila, et al. A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting, 2020, ArXiv.
[3] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999, Cambridge University Press.
[4] Sham M. Kakade, et al. A tail inequality for quadratic forms of subgaussian random vectors, 2011, ArXiv.
[5] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[6] Yu Bai, et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning, 2021, AISTATS.
[7] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[8] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[9] Joel A. Tropp. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[10] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations, 2020, NeurIPS.
[11] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[12] Ruosong Wang, et al. What are the Statistical Limits of Offline RL with Linear Function Approximation?, 2020, ICLR.
[13] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[14] Quanquan Gu, et al. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation, 2020, ICML.
[15] Masatoshi Uehara, et al. Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency, 2021, ArXiv.
[16] Andrea Zanette. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL, 2020, ICML.
[17] Ruosong Wang, et al. Provably Efficient Reinforcement Learning with General Value Function Approximation, 2020, ArXiv.
[18] Francisco S. Melo, et al. Q-Learning with Linear Function Approximation, 2007, COLT.
[19] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[20] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[23] Bo Dai, et al. Off-Policy Evaluation via the Regularized Lagrangian, 2020, NeurIPS.
[24] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[25] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[26] Qiang Liu, et al. Accountable Off-Policy Evaluation With Kernel Bellman Statistics, 2020, ICML.