Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling