Online Learning to Rank with Features

We introduce a new model for online learning to rank in which the click probability factors into an examination function and an attractiveness function, with the attractiveness of an item being a linear function of its feature vector and an unknown parameter. Only relatively mild assumptions are made on the examination function. We analyse a novel algorithm for this setting and show that the regret's dependence on the number of items is replaced by a dependence on the feature dimension, which allows the algorithm to handle a large number of items. When specialised to the orthogonal case, the regret of the algorithm improves on the state of the art.
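To make the factored click model concrete, the sketch below simulates one round of click feedback under the assumptions stated in the abstract: the probability of a click in position k is the product of an examination probability χ(k) and a linear attractiveness ⟨θ, x_a⟩. All specifics here (the decaying examination function, the parameter θ, the feature vectors) are illustrative choices, not quantities specified by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of the factored click model:
#   P(click in position k on item a) = chi(k) * <theta, x_a>,
# where chi is the (unknown) examination function and attractiveness
# is linear in the item's feature vector.
d, L, K = 3, 10, 5                          # feature dim, items, list length
theta = np.array([0.5, 0.3, 0.2])           # unknown parameter (illustrative)
X = rng.uniform(0.0, 1.0, size=(L, d))      # item feature vectors
X /= X.sum(axis=1, keepdims=True)           # keeps attractiveness in [0, 1]
chi = 1.0 / np.arange(1, K + 1)             # examination decays with position (assumption)

def click_probs(ranking):
    """Click probability for each of the K displayed positions."""
    attract = X[ranking] @ theta            # linear attractiveness of shown items
    return chi * attract

ranking = np.argsort(-(X @ theta))[:K]      # greedy ranking by attractiveness
p = click_probs(ranking)
clicks = rng.random(K) < p                  # one round of Bernoulli click feedback
```

An online algorithm in this setting would observe only `clicks`, not `theta` or `chi`, and the point of the linear structure is that estimating the d-dimensional `theta` suffices even when the item set `L` is very large.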
