More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar | Yinlam Chow | Mohammad Ghavamzadeh
[1] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[2] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[3] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[4] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[5] Marc G. Bellemare, et al. The Reactor: A Sample-Efficient Actor-Critic Architecture, 2017, ArXiv.
[6] Zoran Popovic, et al. Offline Evaluation of Online Reinforcement Learning Algorithms, 2016, AAAI.
[7] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[8] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[9] C. Cassel, et al. Some results on generalized difference estimation and generalized regression estimation for finite populations, 1976.
[10] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[13] Uri Shalit, et al. Estimating individual treatment effect: generalization bounds and algorithms, 2016, ICML.
[14] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[15] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[16] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[17] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[18] J. Robins, et al. Estimation of Regression Coefficients When Some Regressors are not Always Observed, 1994.
[19] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[20] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.
[21] John Langford, et al. Off-policy evaluation for slate recommendation, 2016, NIPS.
[22] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[23] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[24] Max Welling, et al. Causal Effect Inference with Deep Latent-Variable Models, 2017, NIPS.
[25] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[26] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[27] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[28] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[29] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[30] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2002.
[31] M. Davidian, et al. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data, 2009, Biometrika.
[32] Peter Stone, et al. High Confidence Off-Policy Evaluation with Models, 2016, ArXiv.