Off-policy evaluation for slate recommendation
John Langford | Akshay Krishnamurthy | Miroslav Dudík | Imed Zitouni | Adith Swaminathan | Alekh Agarwal | Damien Jose