Batch Value-function Approximation with Only Realizability