Yu Bai | Yu-Xiang Wang | Ming Yin
[1] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[2] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[3] Nan Jiang, et al. Contextual Decision Processes with Low Bellman Rank are PAC-Learnable, 2016, ICML.
[4] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, J. Mach. Learn. Res.
[5] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[6] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[7] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[8] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[9] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[10] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[11] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[12] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[13] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[14] M. J. D. Powell, et al. Weighted Uniform Sampling — a Monte Carlo Technique for Reducing Variance, 1966.
[15] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.
[16] Sergey Levine, et al. RoboNet: Large-Scale Multi-Robot Learning, 2019, CoRL.
[17] Peter Szolovits, et al. Continuous State-Space Models for Optimal Sepsis Treatment: a Deep Reinforcement Learning Approach, 2017, MLHC.
[18] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[19] Liang Tang, et al. Automatic ad format selection via contextual bandits, 2013, CIKM.
[20] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes, 2019, ArXiv.
[21] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[22] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.
[23] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[24] J. Tropp. Freedman's Inequality for Matrix Martingales, 2011, ArXiv:1101.3039.
[25] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[26] Ohad Shamir, et al. Learnability, Stability and Uniform Convergence, 2010, J. Mach. Learn. Res.
[27] Yinyu Ye, et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, 2011, Math. Oper. Res.
[28] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[29] Emma Brunskill, et al. Off-Policy Policy Gradient with State Distribution Correction, 2019, UAI.
[30] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[31] Marco Corazza, et al. Testing different Reinforcement Learning configurations for financial trading: Introduction and applications, 2018.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[33] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[34] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[35] Sergey Levine, et al. Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[36] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[37] Masatoshi Uehara, et al. Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, 2019, NeurIPS.
[38] Fan Chung Graham, et al. Concentration Inequalities and Martingale Inequalities: A Survey, 2006, Internet Math.
[39] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[40] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[41] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[42] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[43] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[44] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[45] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[46] Nan Jiang, et al. Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison, 2020, ArXiv:2003.03924.
[47] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[48] Nan Jiang, et al. Batch Value-function Approximation with Only Realizability, 2020, ICML.
[49] Louis Wehenkel, et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[50] Xian Wu, et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[51] Yao Liu, et al. Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters, 2018, ArXiv.
[52] Lin F. Yang, et al. On the Optimality of Sparse Model-Based Planning for Markov Decision Processes, 2019, ArXiv.
[53] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2019, COLT.
[54] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[55] Emma Brunskill, et al. Provably Good Batch Reinforcement Learning Without Great Exploration, 2020, ArXiv.
[56] Ambuj Tewari, et al. Reinforcement learning in large or unknown MDPs, 2007.
[57] Nan Jiang, et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon, 2018, COLT.
[58] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[59] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.