Learning to Rank in the Position Based Model with Bandit Feedback

Personalization is a crucial aspect of many online experiences, and content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, but these suffer from biases introduced during data collection by the production system that produces the rankings. To compensate for this problem, we leverage contextual multi-armed bandits. We propose novel extensions of two well-known algorithms, LinUCB and Linear Thompson Sampling, to the ranking use case. To account for the biases present in a production environment, we employ the position-based click model. Finally, we demonstrate the validity of the proposed algorithms through extensive offline experiments on synthetic datasets as well as customer-facing online A/B experiments.
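
For concreteness, the position-based click model (PBM) assumes the probability of a click on item i displayed at rank k factorizes as P(click) = kappa_k * theta_i, where kappa_k is the examination probability of position k and theta_i is the attraction of the item. Under a linear attraction model theta_i = x_i^T w, the observed click has expectation kappa_k * x_i^T w, so regressing clicks on the kappa-scaled features yields an unbiased estimate of w. The sketch below illustrates how LinUCB could be adapted along these lines; the class name, the assumption that the kappa_k are known, and this particular bias-correction scheme are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    # A minimal sketch: LinUCB adapted to the position-based click model (PBM).
    # Assumptions (not taken from the paper): the examination probabilities
    # kappa are known and decreasing in position, attraction is linear in the
    # item features, and clicks are Bernoulli.

    class PBMLinUCB:
        def __init__(self, dim, kappa, alpha=1.0, reg=1.0):
            self.kappa = np.asarray(kappa)   # examination prob. per position
            self.alpha = alpha               # exploration strength
            self.A = reg * np.eye(dim)       # ridge-regularized Gram matrix
            self.b = np.zeros(dim)           # accumulated click-weighted features

        def rank(self, X):
            """Return item indices for each position, best first.

            X: (n_items, dim) feature matrix of the candidate items.
            """
            A_inv = np.linalg.inv(self.A)
            w_hat = A_inv @ self.b
            # Optimistic (upper-confidence) attraction estimate per item.
            ucb = X @ w_hat + self.alpha * np.sqrt(
                np.einsum('ij,jk,ik->i', X, A_inv, X))
            # Since kappa is decreasing in position, placing higher-UCB items
            # earlier maximizes expected clicks under the PBM.
            order = np.argsort(-ucb)
            return order[: len(self.kappa)]

        def update(self, X, ranking, clicks):
            """Update with feedback; clicks[k] is the 0/1 click at position k."""
            for k, item in enumerate(ranking):
                # E[click] = kappa_k * x^T w, so regressing clicks on
                # kappa_k * x gives an unbiased estimate of w.
                z = self.kappa[k] * X[item]
                self.A += np.outer(z, z)
                self.b += clicks[k] * z

For example, PBMLinUCB(dim=5, kappa=[1.0, 0.6, 0.3]) would rank three slots from five-dimensional item features. A Linear Thompson Sampling analogue would replace the UCB bonus by sampling w from N(w_hat, alpha^2 * A^{-1}) and ranking by the sampled scores; the covariance scaling here is likewise an assumption for illustration.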
