CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Yi Su | Thorsten Joachims | Lequn Wang | Michele Santacatterina