Self-Attentive Document Interaction Networks for Permutation Equivariant Ranking

How to leverage cross-document interactions to improve ranking performance is an important topic in information retrieval (IR) research. However, this topic has not been well studied in the learning-to-rank setting, and most existing work still scores each document independently. Recent developments in deep learning show strength in modeling complex relationships across sequences and sets, which motivates us to study how to leverage cross-document interactions for learning-to-rank in the deep learning framework. In this paper, we formally define the permutation-equivariance requirement for a scoring function that captures cross-document interactions. We then propose a self-attention-based document interaction network, show that it satisfies the permutation-equivariance requirement, and demonstrate that it can generate scores for document sets of varying sizes. Our proposed methods automatically learn to capture document interactions without any auxiliary information and scale to large document sets. We conduct experiments on three ranking datasets: the Web30k benchmark, a Gmail search dataset, and a Google Drive Quick Access dataset. Experimental results show that our proposed methods are both more effective and more efficient than the baselines.
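For concreteness, the permutation-equivariance requirement above can be stated as follows; this is a standard formulation in our notation, not a quote from the paper:

```latex
% A multivariate scoring function maps the feature vectors of n candidate
% documents, X \in \mathbb{R}^{n \times d}, to a vector of n scores:
%   f : \mathbb{R}^{n \times d} \to \mathbb{R}^{n}.
% f is permutation-equivariant if, for every n x n permutation matrix \Pi,
f(\Pi X) = \Pi \, f(X).
% Reordering the input documents reorders their scores identically, so the
% ranking induced by f does not depend on the order in which documents arrive.
```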

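To make the construction concrete, below is a minimal single-head NumPy sketch of a self-attention scoring layer of the kind the abstract describes. It is our illustration, not the authors' implementation: the names (`self_attention_scores`, `d_k`, `w_out`) and the single-head, single-layer simplification are assumptions. Because every operation is either applied row-wise or is a symmetric interaction over the whole document set, the layer is permutation-equivariant and accepts any number of documents without retraining.

```python
# Minimal sketch (our illustration, not the authors' code) of a
# permutation-equivariant self-attention scoring layer.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_scores(X, Wq, Wk, Wv, w_out):
    """Score a variable-size set of documents with one self-attention block.

    X: (n, d) feature vectors of the n candidate documents; n may vary per query.
    Wq, Wk, Wv: (d, d_k) learned projections; w_out: (d_k,) scoring vector.
    Returns an (n,) vector with one relevance score per document.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # (n, d_k) each
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n, n) cross-document attention
    H = A @ V                                   # each document's representation now
                                                # mixes in context from all others
    return H @ w_out                            # (n,) per-document scores

# Numerical check of permutation equivariance: permuting the input rows
# must permute the output scores in exactly the same way.
rng = np.random.default_rng(0)
n, d, d_k = 5, 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
w_out = rng.normal(size=d_k)
perm = rng.permutation(n)
assert np.allclose(
    self_attention_scores(X, Wq, Wk, Wv, w_out)[perm],
    self_attention_scores(X[perm], Wq, Wk, Wv, w_out),
)
```

Note that the (n, n) attention matrix is computed from the input itself, which is what lets a single trained model score document sets of varying sizes, as claimed in the abstract.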