Masrour Zoghi | Shimon Whiteson | Rémi Munos | Maarten de Rijke
[1] Christos Faloutsos, et al. Tailoring click models to user goals, 2009, WSCD '09.
[2] Johannes Fürnkranz, et al. Towards Preference-Based Reinforcement Learning, 2012.
[3] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[4] Tao Qin, et al. LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval, 2007.
[5] Csaba Szepesvári, et al. An adaptive algorithm for finite stochastic partial monitoring, 2012, ICML.
[6] M. de Rijke, et al. Relative confidence sampling for efficient on-line ranker evaluation, 2014, WSDM.
[7] Rémi Munos, et al. Optimistic Optimization of Deterministic Functions, 2011, NIPS.
[8] William Feller, et al. An Introduction to Probability Theory and Its Applications, 1951.
[9] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[10] Nick Craswell, et al. An experimental comparison of click position-bias models, 2008, WSDM '08.
[11] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[12] Thorsten Joachims, et al. Interactively optimizing information retrieval systems as a dueling bandits problem, 2009, ICML '09.
[13] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[14] Eyke Hüllermeier, et al. Preference Learning, 2005, Künstliche Intell.
[15] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[16] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[17] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[18] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[19] Chao Liu, et al. Efficient multiple-click models in web search, 2009, WSDM '09.
[20] Thorsten Joachims, et al. The K-armed Dueling Bandits Problem, 2012, COLT.
[21] Alexander J. Smola, et al. Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations, 2012, ICML.
[22] Fabrice Clérot, et al. Generic Exploration and K-armed Voting Bandits (extended version), 2013.
[23] Katja Hofmann, et al. Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval, 2013, Information Retrieval.
[24] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.
[25] Thorsten Joachims, et al. Beat the Mean Bandit, 2011, ICML.
[26] Christopher D. Manning, et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.
[27] Katja Hofmann, et al. Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods, 2013, TOIS.
[28] Raphaël Féraud, et al. Generic Exploration and K-armed Voting Bandits, 2013, ICML.
[29] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[30] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[31] Csaba Szepesvári, et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.
[32] Filip Radlinski, et al. How does clickthrough data reflect retrieval quality?, 2008, CIKM '08.
[33] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[34] Katja Hofmann, et al. A probabilistic method for inferring preferences from clicks, 2011, CIKM '11.
[35] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.
[36] Rémi Munos, et al. Stochastic Simultaneous Optimistic Optimization, 2013, ICML.
[37] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.