Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval
Katja Hofmann | Shimon Whiteson | Maarten de Rijke