THUIR at the NTCIR-14 WWW-2 Task

The THUIR team participated in both the Chinese and English subtasks of the NTCIR-14 We Want Web-2 (WWW-2) task. This paper describes our approaches and results. In the Chinese subtask, we designed two neural ranking models and trained them on the Sogou-QCL dataset. In the English subtask, we adopted learning-to-rank models trained on the MQ2007 and MQ2008 datasets. Our methods achieved the best performance in both subtasks. Further analysis of the results shows that, in the Chinese subtask, our neural models achieve better performance on navigational, informational, and transactional queries alike. In the English subtask, the learning-to-rank methods have stronger modeling capabilities than BM25 because they learn from effective hand-crafted features.
