Fast Rates for the Regret of Offline Reinforcement Learning
暂无分享,去创建一个
[1] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints , 2021, NeurIPS.
[2] Nathan Kallus,et al. Fast Rates for Contextual Linear Optimization , 2020, Manag. Sci..
[3] Ruosong Wang,et al. What are the Statistical Limits of Offline RL with Linear Function Approximation? , 2020, ICLR.
[4] Csaba Szepesv'ari,et al. Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions , 2020, ALT.
[5] Emma Brunskill,et al. Provably Good Batch Reinforcement Learning Without Great Exploration , 2020, ArXiv.
[6] Yu Bai,et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning , 2020, ArXiv.
[7] Nan Jiang,et al. $Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison , 2020, 2003.03924.
[8] Mengdi Wang,et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation , 2020, ICML.
[9] Masatoshi Uehara,et al. Statistically Efficient Off-Policy Policy Gradients , 2020, ICML.
[10] Ilya Kostrikov,et al. AlgaeDICE: Policy Gradient from Arbitrary Experience , 2019, ArXiv.
[11] Masatoshi Uehara,et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes , 2019, ArXiv.
[12] Nathan Kallus,et al. Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes , 2019, COLT.
[13] Masatoshi Uehara,et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes , 2019, J. Mach. Learn. Res..
[14] Lin F. Yang,et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal , 2019, COLT.
[15] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[16] Martin J. Wainwright,et al. High-Dimensional Statistics , 2019 .
[17] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[18] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[19] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[20] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[21] Joel A. Tropp,et al. An Introduction to Matrix Concentration Inequalities , 2015, Found. Trends Mach. Learn..
[22] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[23] A. Zeevi,et al. A Linear Response Bandit Problem , 2013 .
[24] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[25] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[26] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[27] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[28] A. Tsybakov,et al. Fast learning rates for plug-in classifiers , 2007, 0708.2321.
[29] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[30] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[31] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[32] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[33] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .
[34] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[35] E. Mammen,et al. Smooth Discrimination Analysis , 1999 .
[36] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[37] S. Kakade,et al. Reinforcement Learning: Theory and Algorithms , 2019 .