[1] M. J. D. Powell, et al. Weighted Uniform Sampling — a Monte Carlo Technique for Reducing Variance, 1966.
[2] Pranab Kumar Sen, et al. Large Sample Methods in Statistics: An Introduction with Applications, 1993.
[3] J. Robins, et al. Semiparametric regression estimation in the presence of dependent censoring, 1995.
[4] R. Bartle. The elements of integration and Lebesgue measure, 1995.
[5] R. Mittelhammer. Mathematical Statistics for Economics and Business, 1996.
[6] Andrew G. Barto, et al. Reinforcement learning, 1998.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Debashis Kushary, et al. Bootstrap Methods and Their Application, 2000, Technometrics.
[9] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[10] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2004, Machine Learning.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[13] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[14] Devinder Thapa, et al. Agent Based Decision Support System Using Reinforcement Learning Under Emergency Circumstances, 2005, ICNC.
[15] Michael H. Bowling, et al. Optimal Unbiased Estimators for Evaluating Agent Performance, 2006, AAAI.
[16] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Management Science.
[17] M. Kenward, et al. An Introduction to the Bootstrap, 2007.
[18] Martha White, et al. Learning a Value Analysis Tool for Agent Evaluation, 2009, IJCAI.
[19] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Automatica.
[20] Scott Sanner, et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, 2010, ICML.
[21] Martha White, et al. A general framework for reducing variance in agent evaluation, 2010.
[22] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[23] P. Thomas, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011.
[24] Joel Veness, et al. Variance Reduction in Monte-Carlo Tree Search, 2011, NIPS.
[25] Scott Niekum, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011, NIPS.
[26] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[27] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[28] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[29] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[30] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[31] Scott Niekum, et al. Policy Evaluation Using the Ω-Return, 2015, NIPS.
[32] Philip S. Thomas, et al. A Notation for Markov Decision Processes, 2015, arXiv.
[33] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[34] Lihong Li, et al. Doubly Robust Off-policy Evaluation for Reinforcement Learning, 2015, arXiv.
[35] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.
[36] Martha White, et al. Emphatic Temporal-Difference Learning, 2015, arXiv.
[37] Zoran Popovic, et al. Offline Evaluation of Online Reinforcement Learning Algorithms, 2016, AAAI.
[38] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[39] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.