Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation