Explore-exploit in top-N recommender systems via Gaussian processes

We address the challenge of ranking recommendation lists based on click feedback by efficiently encoding similarities among users and among items. The key challenges are threefold: (1) the combinatorial number of possible lists; (2) sparse click feedback; and (3) context-dependent recommendations. We propose the CGPRank algorithm, which exploits prior information specified in terms of a Gaussian process kernel function, allowing feedback to be shared in three ways: between positions in a list, between items, and between contexts. Under our model, we provide strong performance guarantees and empirically evaluate our algorithm on data from two large-scale recommendation tasks: Yahoo! news article recommendation and Google Books. In our experiments, CGPRank significantly outperforms state-of-the-art multi-armed bandit and learning-to-rank methods, achieving an 18% increase in clicks.
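To make the mechanics concrete, below is a minimal, hypothetical sketch of the GP-UCB-style selection loop that underlies this approach: a Gaussian process posterior over item payoffs is maintained from click feedback, and the list is filled with the items whose optimistic (mean plus scaled standard deviation) scores are highest. The RBF item kernel, the `beta` exploration weight, and the greedy top-N rule are illustrative assumptions on our part; the full CGPRank model additionally couples positions and contexts through a composite kernel, which this sketch omits.

```python
# A minimal sketch of GP-UCB-style list selection in the spirit of CGPRank.
# The kernel choice, beta schedule, and greedy top-N rule are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)


class GPUCBRanker:
    """Rank items by GP-UCB scores; feedback is shared through the kernel."""

    def __init__(self, item_features, noise=0.1, beta=2.0):
        self.X = item_features              # (n_items, d) feature matrix
        self.noise = noise                  # observation noise std
        self.beta = beta                    # exploration weight
        self.obs_idx, self.obs_y = [], []   # observed (item index, click) pairs

    def posterior(self):
        """GP posterior mean/std over all items given click feedback so far."""
        if not self.obs_idx:
            n = len(self.X)
            return np.zeros(n), np.ones(n)  # prior: zero mean, unit std
        Xo = self.X[self.obs_idx]
        K = rbf_kernel(Xo, Xo) + self.noise ** 2 * np.eye(len(Xo))
        Ks = rbf_kernel(Xo, self.X)         # (n_obs, n_items) cross-covariance
        alpha = np.linalg.solve(K, np.array(self.obs_y, dtype=float))
        mu = Ks.T @ alpha
        v = np.linalg.solve(K, Ks)
        # k(x, x) = 1 for the RBF kernel, so the prior variance is 1.
        var = np.clip(1.0 - np.sum(Ks * v, axis=0), 1e-12, None)
        return mu, np.sqrt(var)

    def recommend(self, n_slots):
        """Return the top-N items by optimistic (UCB) score."""
        mu, sd = self.posterior()
        ucb = mu + np.sqrt(self.beta) * sd
        return np.argsort(-ucb)[:n_slots]

    def update(self, shown, clicks):
        """Record per-position click feedback for the shown list."""
        self.obs_idx.extend(shown)
        self.obs_y.extend(clicks)


# Usage: recommend a 5-item list, then feed back which slots were clicked.
rng = np.random.default_rng(0)
ranker = GPUCBRanker(item_features=rng.normal(size=(50, 5)))
slate = ranker.recommend(n_slots=5)
ranker.update(shown=list(slate), clicks=[1, 0, 0, 1, 0])
```

Because the posterior covariance ties similar items together, a click on one item raises the estimated payoff of its kernel neighbors, which is the feedback-sharing idea the abstract describes (here shown only along the item dimension).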
