Deep Learning with Logged Bandit Feedback
Thorsten Joachims | Adith Swaminathan | Maarten de Rijke