Yuta Saito | Yusuke Narita | Megumi Matsutani | Shunsuke Aihara
[1] Masatoshi Uehara, et al. Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, 2019, NeurIPS.
[2] Thomas Nedelec, et al. Offline A/B Testing for Recommender Systems, 2018, WSDM.
[3] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[4] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[5] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, Statistical Science.
[6] Wei Chu, et al. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, 2011.
[7] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] John Langford, et al. The offset tree for learning with partial labels, 2008, KDD.
[9] Rajeev Rastogi, et al. LogUCB: an explore-exploit algorithm for comments recommendation, 2012, CIKM '12.
[10] M. de Rijke, et al. Large-scale Validation of Counterfactual Learning Methods: A Test-Bed, 2016, arXiv.
[11] Sergey Levine, et al. Off-Policy Evaluation via Off-Policy Classification, 2019, NeurIPS.
[12] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014, CVPR.
[13] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, arXiv.
[14] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[15] Yisong Yue, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, arXiv.
[16] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[17] Shota Yasui, et al. Safe Counterfactual Reinforcement Learning, 2020, arXiv.
[18] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[19] John Langford, et al. Off-policy evaluation for slate recommendation, 2016, NIPS.
[20] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[21] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[22] Ben Carterette, et al. Offline Evaluation to Make Decisions About Playlist Recommendation Algorithms, 2019, WSDM.
[23] Jure Leskovec, et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs, 2020, NeurIPS.
[24] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[25] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[26] Tie-Yan Liu, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.
[27] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[28] Masahiro Kato, et al. Off-Policy Evaluation and Learning for External Validity under a Covariate Shift, 2020, NeurIPS.
[29] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[30] Shota Yasui, et al. Off-policy Bandit and Reinforcement Learning, 2020.
[31] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[32] Shota Yasui, et al. Efficient Counterfactual Learning from Bandit Feedback, 2018, AAAI.
[33] Nikos Vlassis, et al. On the Design of Estimators for Bandit Off-Policy Evaluation, 2019, ICML.
[34] Yoshua Bengio, et al. Benchmarking Graph Neural Networks, 2023, J. Mach. Learn. Res.
[35] Thorsten Joachims, et al. The Self-Normalized Estimator for Counterfactual Learning, 2015, NIPS.
[36] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[37] Yao Liu, et al. Representation Balancing MDPs for Off-Policy Policy Evaluation, 2018, NeurIPS.
[38] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[39] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[40] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[41] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.