Training Deep Ranking Model with Weak Relevance Labels

Deep neural networks have achieved great success in a number of fields, such as computer vision, natural language processing, and speech recognition. However, comparable advances have not yet been observed in information retrieval (IR) tasks such as ad-hoc retrieval. A potential explanation is that training a document ranker for a particular IR task usually requires a large number of relevance labels that describe the relationship between queries and documents, and such relevance judgments are usually very expensive to obtain. In this paper, we propose to train deep ranking models with weak relevance labels generated by click models from real users’ click behavior. We investigate the effectiveness of weak relevance labels produced by several major click models, such as DBN, RCM, PSCM, TCM, and UBM. The experimental results indicate that ranking models trained with weak relevance labels are able to exploit large-scale behavior data and achieve performance similar to that of a ranking model trained on relevance labels from external assessors, which are supposed to be more accurate. This preliminary finding encourages us to develop deep ranking models with weakly supervised data.
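
The idea of turning click logs into weak relevance labels can be illustrated with a minimal sketch. It assumes a naive smoothed click-through-rate estimator, which is a deliberate simplification and not one of the click models named above (DBN, UBM, and so on); the session format, the function names `weak_labels_from_clicks` and `pairwise_preferences`, and the smoothing priors are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def weak_labels_from_clicks(sessions, prior_clicks=1.0, prior_views=2.0):
    """Estimate a weak relevance label per (query, doc) as a smoothed CTR.

    `sessions` is an assumed log format: a list of (query, results) tuples,
    where `results` is a list of (doc_id, clicked) pairs.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for query, results in sessions:
        for doc_id, clicked in results:
            views[(query, doc_id)] += 1
            clicks[(query, doc_id)] += int(clicked)
    # Additive smoothing keeps rarely shown documents away from 0 or 1.
    return {
        key: (clicks[key] + prior_clicks) / (views[key] + prior_views)
        for key in views
    }


def pairwise_preferences(labels, query, margin=0.0):
    """Turn weak labels into (preferred_doc, other_doc) pairs for one query."""
    docs = [(doc, score) for (q, doc), score in labels.items() if q == query]
    return [
        (a, b)
        for a, score_a in docs
        for b, score_b in docs
        if score_a - score_b > margin
    ]


# Toy click log: two sessions for the same query.
sessions = [
    ("deep ranking", [("d1", True), ("d2", False), ("d3", False)]),
    ("deep ranking", [("d1", True), ("d2", True), ("d3", False)]),
]
labels = weak_labels_from_clicks(sessions)
pairs = pairwise_preferences(labels, "deep ranking")
```

Pairs such as these could then feed a pairwise ranking loss in place of assessor judgments. A real click model would additionally correct for effects like position bias, which this naive estimator ignores.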
