Modeling Document Interactions for Learning to Rank with Regularized Self-Attention

Learning to rank is an important task that has been successfully deployed in many real-world information retrieval systems. Most existing methods compute relevance judgments of documents independently, without holistically considering the entire set of competing documents. In this paper, we explore modeling document interactions with self-attention-based neural networks. Although self-attention networks have achieved state-of-the-art results in many NLP tasks, we find empirically that self-attention alone provides little benefit over a baseline neural learning-to-rank architecture. To improve the learning of self-attention weights, we propose simple yet effective regularization terms designed to model interactions between documents. Evaluations on publicly available Learning to Rank (LETOR) datasets show that training a self-attention network with our proposed regularization terms can significantly outperform existing learning-to-rank methods.
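
To make the setup concrete, below is a minimal sketch (in PyTorch) of a listwise ranker that applies self-attention across the candidate documents of a single query, so each document's score can depend on its competitors. The architecture, the `attention_entropy_penalty` function, and the ListNet-style listwise loss are illustrative assumptions, not the paper's exact regularization terms or model; only the idea of regularizing self-attention weights over a document list comes from the abstract. The 46-dimensional feature vectors match the LETOR 4.0 datasets.

```python
# Sketch only: self-attention over a query's candidate documents, plus a
# hypothetical entropy penalty on the attention weights standing in for the
# paper's (unspecified here) regularization terms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ListwiseSelfAttentionRanker(nn.Module):
    def __init__(self, feature_dim: int, hidden_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feature_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, doc_feats: torch.Tensor):
        # doc_feats: (batch, list_size, feature_dim) per-document features.
        h = torch.relu(self.proj(doc_feats))
        # Self-attention lets each document attend to the competing documents.
        ctx, attn_weights = self.attn(h, h, h, need_weights=True)
        scores = self.score(ctx).squeeze(-1)   # (batch, list_size)
        return scores, attn_weights            # weights: (batch, list_size, list_size)

def attention_entropy_penalty(attn_weights: torch.Tensor) -> torch.Tensor:
    # Hypothetical regularizer: penalize near-uniform (high-entropy) attention
    # so the network is pushed to learn specific document interactions.
    entropy = -(attn_weights * (attn_weights + 1e-9).log()).sum(dim=-1)
    return entropy.mean()

# Usage sketch: ListNet-style softmax loss plus the attention regularizer.
model = ListwiseSelfAttentionRanker(feature_dim=46)  # 46 features in LETOR 4.0
feats = torch.randn(8, 20, 46)                       # 8 queries, 20 docs each
labels = torch.randint(0, 3, (8, 20)).float()        # graded relevance labels
scores, attn = model(feats)
rank_loss = F.kl_div(F.log_softmax(scores, dim=-1),
                     F.softmax(labels, dim=-1), reduction="batchmean")
loss = rank_loss + 0.1 * attention_entropy_penalty(attn)
loss.backward()
```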
