Marginalized Off-Policy Evaluation for Reinforcement Learning
暂无分享,去创建一个
[1] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[2] Sergey Levine,et al. Offline policy evaluation across representations with applications to educational games , 2014, AAMAS.
[3] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[4] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[5] J M Robins,et al. Marginal Mean Models for Dynamic Regimes , 2001, Journal of the American Statistical Association.
[6] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[7] Liang Tang,et al. Automatic ad format selection via contextual bandits , 2013, CIKM.
[8] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[9] G. Imbens,et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2000 .
[10] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[11] Eugen Slutsuky,et al. Über stochastische Asymptoten und Grenzwerte (Abdruck) , 2022 .
[12] Joaquin Quiñonero Candela,et al. Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..
[13] Philip S. Thomas,et al. Safe Reinforcement Learning , 2015 .
[14] Rómer Rosales,et al. Simple and Scalable Response Prediction for Display Advertising , 2014, ACM Trans. Intell. Syst. Technol..
[15] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[16] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[17] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[18] Philip S. Thomas,et al. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation , 2017, NIPS.
[19] Philip S. Thomas,et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing , 2017, AAAI.
[20] Philip S. Thomas,et al. High-Confidence Off-Policy Evaluation , 2015, AAAI.
[21] Peter Szolovits,et al. Continuous State-Space Models for Optimal Sepsis Treatment: a Deep Reinforcement Learning Approach , 2017, MLHC.
[22] Philip S. Thomas,et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees , 2015, IJCAI.