Online Learning to Rank with Features

We introduce a new model for online learning to rank in which the click probability factors into an examination function and an attractiveness function, with the attractiveness of an item being a linear function of its feature vector and an unknown parameter. Only relatively mild assumptions are made on the examination function. We analyse a novel algorithm for this setting and show that the regret's dependence on the number of items is replaced by a dependence on the feature dimension, which allows the algorithm to handle a large number of items. When specialised to the orthogonal case, the regret of the algorithm improves on the state of the art.
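To make the factored click model concrete, the sketch below simulates one round of click feedback under the assumptions stated in the abstract: the probability of a click in position k is the product of an examination probability χ(k) and a linear attractiveness ⟨θ, x_a⟩. All specifics here (the decaying examination function, the parameter θ, the feature vectors) are illustrative choices, not quantities specified by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of the factored click model:
#   P(click in position k on item a) = chi(k) * <theta, x_a>,
# where chi is the (unknown) examination function and attractiveness
# is linear in the item's feature vector.
d, L, K = 3, 10, 5                          # feature dim, items, list length
theta = np.array([0.5, 0.3, 0.2])           # unknown parameter (illustrative)
X = rng.uniform(0.0, 1.0, size=(L, d))      # item feature vectors
X /= X.sum(axis=1, keepdims=True)           # keeps attractiveness in [0, 1]
chi = 1.0 / np.arange(1, K + 1)             # examination decays with position (assumption)

def click_probs(ranking):
    """Click probability for each of the K displayed positions."""
    attract = X[ranking] @ theta            # linear attractiveness of shown items
    return chi * attract

ranking = np.argsort(-(X @ theta))[:K]      # greedy ranking by attractiveness
p = click_probs(ranking)
clicks = rng.random(K) < p                  # one round of Bernoulli click feedback
```

An online algorithm in this setting would observe only `clicks`, not `theta` or `chi`, and the point of the linear structure is that estimating the d-dimensional `theta` suffices even when the item set `L` is very large.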
