Investigating Passage-level Relevance and Its Role in Document-level Relevance Judgment

Understanding the process of relevance judgment helps inspire the design of retrieval models. Traditional retrieval models usually estimate relevance based on document-level signals. Recent works consider more fine-grained, passage-level relevance information, which can further enhance retrieval performance. However, a detailed analysis of how passage-level relevance signals determine or influence the relevance judgment of the whole document is still lacking. To investigate the role of passage-level relevance in document-level relevance judgment, we construct an ad-hoc retrieval dataset with both passage-level and document-level relevance labels. A thorough analysis reveals that: 1) there is a strong correlation between the document-level relevance and the fractions of irrelevant passages to highly relevant passages; 2) the position, length, and query similarity of passages play different roles in the determination of document-level relevance; 3) the sequential passage-level relevance within a document is a potential indicator of the document-level relevance. Based on the relationship between passage-level and document-level relevance, we also show that utilizing passage-level relevance signals can improve existing document ranking models. This study helps us better understand how users perceive the relevance of a document and inspires the design of novel ranking models that leverage fine-grained, passage-level relevance signals.

Yiqun Liu | Shaoping Ma | Min Zhang | Jiaxin Mao | Zhijing Wu
