Enhancing relevance scoring with chronological term rank

We introduce a new relevance scoring technique that enhances existing relevance scoring schemes with term position information. The technique uses chronological term rank (CTR), which captures the positions of terms as they occur in the sequence of words in a document. CTR is both conceptually and computationally simple compared with other approaches that use document structure information, such as term proximity, term order and document features. CTR works well when paired with Okapi BM25. We evaluate various combinations of CTR with Okapi BM25 to identify the most effective formula. We then compare the selected approach against existing methods such as Okapi BM25, pivoted length normalization and language models. Significant improvements are seen consistently across a variety of TREC data and topic sets, as measured by the major retrieval performance metrics. To our knowledge, this is the first use of this statistic for relevance scoring. Further retrieval improvements are likely to be possible with chronological term rank enhanced methods in future work.
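The idea of pairing a position statistic with BM25 can be illustrated with a minimal Python sketch. It is not the formula selected in the paper, which the abstract does not state. The sketch assumes that a term's CTR is the 1-based position of its first occurrence in the document, and the names `ctr_bonus` and `ctr_weight`, the linear decay, and the multiplicative combination are all hypothetical choices made only for illustration. The BM25 weight uses the common variant with a `+ 1` inside the logarithm to keep IDF non-negative.

```python
import math
from collections import Counter


def chronological_term_ranks(tokens):
    """Map each term to the 1-based position of its first occurrence.

    Assumption: CTR is taken from the first occurrence of the term.
    """
    ranks = {}
    for position, term in enumerate(tokens, start=1):
        ranks.setdefault(term, position)
    return ranks


def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term (non-negative IDF variant)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm


def ctr_bonus(rank, doc_len):
    """Hypothetical position bonus: 1.0 for the first word, decaying linearly."""
    return 1.0 - (rank - 1) / doc_len


def score(query_terms, doc_tokens, doc_freqs, n_docs, avg_doc_len, ctr_weight=0.5):
    """BM25 score with a hypothetical multiplicative CTR boost per query term."""
    tf = Counter(doc_tokens)
    ranks = chronological_term_ranks(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term in query_terms:
        if term not in tf:
            continue  # terms absent from the document contribute nothing
        base = bm25_term_weight(tf[term], doc_freqs[term], n_docs, doc_len, avg_doc_len)
        total += base * (1.0 + ctr_weight * ctr_bonus(ranks[term], doc_len))
    return total


# Two toy documents with identical term counts but different word order.
early = ["okapi", "is", "a", "ranking", "function"]
late = ["a", "ranking", "function", "is", "okapi"]
doc_freqs = {"okapi": 2}  # hypothetical collection statistics
early_score = score(["okapi"], early, doc_freqs, n_docs=10, avg_doc_len=5.0)
late_score = score(["okapi"], late, doc_freqs, n_docs=10, avg_doc_len=5.0)
```

In this toy setup the two documents receive the same plain BM25 score (`ctr_weight=0`), while any positive `ctr_weight` ranks the document that mentions the query term earlier above the other. Whether earlier occurrences should be rewarded in exactly this way is an assumption of the sketch, not a result from the paper.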
