Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
[1] B. Efron. Bootstrap Methods: Another Look at the Jackknife, 1979.
[2] B. Efron. Better Bootstrap Confidence Intervals, 1987.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[5] J. Carpenter, et al. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians, 2000, Statistics in Medicine.
[6] A. Folsom, et al. Coronary heart disease risk prediction in the Atherosclerosis Risk in Communities (ARIC) study, 2003, Journal of Clinical Epidemiology.
[7] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] David Hinkley, et al. Bootstrap Methods: Another Look at the Jackknife, 2008.
[10] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[11] Martha White, et al. Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains, 2010, NIPS.
[12] Peter Stone, et al. Real time targeted exploration in large domains, 2010, 2010 IEEE 9th International Conference on Development and Learning.
[13] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[14] J. Andrew Bagnell, et al. Agnostic System Identification for Model-Based Reinforcement Learning, 2012, ICML.
[15] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[16] Ufuk Topcu, et al. Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints, 2014, Robotics: Science and Systems.
[17] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[18] M. Ghavamzadeh, et al. Robust Policy Optimization with Baseline Guarantees, 2015, arXiv:1506.04514.
[19] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[20] Lihong Li, et al. Doubly Robust Off-policy Evaluation for Reinforcement Learning, 2015, ArXiv.
[21] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.
[22] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[23] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.