Online Learning to Rank with List-level Feedback for Image Filtering

Online learning to rank (OLTR) via implicit feedback has been extensively studied for document retrieval in cases where the feedback is available at the level of individual items. To learn from item-level feedback, current algorithms require certain assumptions about user behavior. In this paper, we study a more general setup: OLTR with list-level feedback, where the feedback is provided only at the level of an entire ranked list. We propose two methods that allow online learning to rank in this setup. The first method, PGLearn, uses a ranking model to generate policies and optimizes the model online using policy gradients. The second method, RegLearn, learns to combine individual document relevance scores by directly predicting the observed list-level feedback through regression. We evaluate the proposed methods on the image filtering task, in which deep neural networks (DNNs) are used to rank images in response to a set of standing queries. We show that PGLearn does not perform well in OLTR with list-level feedback. RegLearn, in contrast, performs well on both online and offline metrics.
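
The regression idea behind RegLearn can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the mean aggregation over list items, the squared-error loss, the simulated feedback, and all names (`predict_list`, `sgd_step`, `simulate`, `true_weights`) are assumptions made only for illustration. The sketch shows that weights for combining item-level scores can be learned when only one feedback value per list is observed.

```python
import random

def predict_list(weights, item_scores):
    """Predict list-level feedback as the mean weighted score of the list's items.

    Mean aggregation is an assumption of this sketch, not taken from the paper.
    """
    per_item = [sum(w * s for w, s in zip(weights, scores)) for scores in item_scores]
    return sum(per_item) / len(per_item)

def sgd_step(weights, item_scores, feedback, lr=0.5):
    """One online squared-error update from a single list-level feedback value."""
    error = predict_list(weights, item_scores) - feedback
    n = len(item_scores)
    # Gradient of 0.5 * error**2 w.r.t. weight j is error * (mean of feature j over the list).
    means = [sum(scores[j] for scores in item_scores) / n for j in range(len(weights))]
    return [w - lr * error * m for w, m in zip(weights, means)]

def simulate(rounds=3000, list_size=5, seed=0):
    """Learn combination weights from simulated, noiseless list-level feedback."""
    rng = random.Random(seed)
    true_weights = [0.8, 0.2]  # hypothetical ground truth, used only to simulate feedback
    weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each item carries two hypothetical relevance scores in [0, 1].
        items = [[rng.random(), rng.random()] for _ in range(list_size)]
        feedback = predict_list(true_weights, items)  # one number for the whole list
        weights = sgd_step(weights, items, feedback)
    return weights

if __name__ == "__main__":
    print(simulate())
```

In this toy setting the learner never sees item-level labels, yet the learned weights approach the simulated ground truth because each list contributes one regression target.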
