Passage retrieval revisited

Ranking based on passages addresses some of the shortcomings of whole-document ranking. It provides convenient units of text to return to the user, avoids the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. In this paper we explore the potential of passage retrieval, based on an experimental evaluation of the ability of passages to identify relevant documents. We compare our scheme of arbitrary passage retrieval to several other document retrieval and passage retrieval methods; we show experimentally that, compared to these methods, ranking via fixed-length passages is robust and effective. Our experiments also show that, compared to whole-document ranking, ranking via fixed-length arbitrary passages significantly improves retrieval effectiveness, by 8% for TREC disks 2 and 4 and by 18% to 37% for the Federal Register collection.
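A minimal sketch of ranking documents via fixed-length arbitrary passages follows. It assumes that documents are already tokenized into word lists, that a passage is any window of `length` consecutive words starting at any word offset (offsets spaced `step` words apart), and that a document's score is the score of its best passage. The function names, the default length of 150 words, and the log-tf times idf weighting are illustrative assumptions, not the exact formulation evaluated in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_weights(docs: Dict[str, List[str]]) -> Dict[str, float]:
    """Inverse document frequency ln(N / f_t), computed over whole documents (assumed weighting)."""
    n = len(docs)
    df: Counter = Counter()
    for words in docs.values():
        df.update(set(words))  # count each term at most once per document
    return {t: math.log(n / f) for t, f in df.items()}


def passage_score(window: List[str], query: List[str], idf: Dict[str, float]) -> float:
    """Score one passage as the sum over query terms of (1 + ln f_{p,t}) * idf_t."""
    counts = Counter(window)
    score = 0.0
    for t in set(query):
        f = counts[t]
        if f > 0:
            score += (1.0 + math.log(f)) * idf.get(t, 0.0)
    return score


def best_passage(words: List[str], query: List[str], idf: Dict[str, float],
                 length: int = 150, step: int = 1) -> Tuple[float, int]:
    """Return (score, start offset) of the highest-scoring fixed-length window."""
    if len(words) <= length:
        # A document shorter than one passage is treated as a single passage.
        return passage_score(words, query, idf), 0
    last = len(words) - length
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the document is covered
    best = (-1.0, 0)
    for start in starts:
        s = passage_score(words[start:start + length], query, idf)
        if s > best[0]:  # strict '>' keeps the earliest window on ties
            best = (s, start)
    return best


def rank_documents(docs: Dict[str, List[str]], query: List[str],
                   length: int = 150, step: int = 1) -> List[Tuple[str, float, int]]:
    """Rank documents by the score of their best passage, highest first."""
    idf = idf_weights(docs)
    scored = []
    for doc_id, words in docs.items():
        score, start = best_passage(words, query, idf, length, step)
        scored.append((doc_id, score, start))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Because every passage has the same length, the sketch applies no length normalization; this is how fixed-length passages sidestep the problem of comparing documents of different length. Recounting each window costs O(n · length) per document of n words; a sliding-window count, or a larger `step`, would reduce that cost at the expense of extra bookkeeping or coarser passage boundaries.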
