Vector-space ranking with effective early termination

Considerable research effort has been invested in improving the effectiveness of information retrieval systems. Techniques such as relevance feedback, thesaural expansion, and pivoting all provide better quality responses to queries when tested in standard evaluation frameworks. But such enhancements can add to the cost of evaluating queries. In this paper we consider the pragmatic issue of how to improve the cost-effectiveness of searching. We describe a new inverted file structure using quantized weights that provides superior retrieval effectiveness compared to conventional inverted file structures when early termination heuristics are employed. That is, we are able to reach similar effectiveness levels with less computational cost, and so provide a better cost/performance compromise than previous inverted file organisations.
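
The abstract does not spell out the index layout, so the following is only an illustrative sketch of the general idea, not the paper's actual structure. It assumes uniform quantization of term weights into a small number of integer levels, postings grouped per term by quantized weight in decreasing order, and a simple budget on the number of postings processed as the early termination heuristic. All names (quantize, build_index, rank, max_postings) and the toy weights are hypothetical.

```python
import math
from collections import defaultdict


def quantize(weight, w_max, levels):
    """Map a weight in (0, w_max] to an integer 1..levels (uniform buckets, an assumption)."""
    return max(1, min(levels, math.ceil(levels * weight / w_max)))


def build_index(docs, levels=8):
    """docs: {doc_id: {term: weight}}.

    Returns {term: [(q, [doc_ids]), ...]} with blocks sorted by
    quantized weight q, highest first.
    """
    w_max = max(w for terms in docs.values() for w in terms.values())
    buckets = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for term, w in terms.items():
            buckets[term][quantize(w, w_max, levels)].append(doc_id)
    return {t: sorted(b.items(), reverse=True) for t, b in buckets.items()}


def rank(index, query_terms, k=10, max_postings=None):
    """Accumulate integer scores, visiting the highest-weight blocks first.

    If max_postings is set, stop starting new blocks once that many
    postings have been processed (hypothetical termination rule).
    """
    blocks = [blk for t in query_terms for blk in index.get(t, [])]
    blocks.sort(key=lambda blk: blk[0], reverse=True)
    acc = defaultdict(int)
    seen = 0
    for q, doc_ids in blocks:
        if max_postings is not None and seen >= max_postings:
            break
        for d in doc_ids:
            acc[d] += q
        seen += len(doc_ids)
    # Highest score first; ties broken by document id for determinism.
    return sorted(acc.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy collection with made-up weights.
docs = {
    1: {"cat": 0.9, "dog": 0.1},
    2: {"cat": 0.2},
    3: {"cat": 0.5, "dog": 0.8},
}
index = build_index(docs, levels=8)
full = rank(index, ["cat", "dog"], k=2)
early = rank(index, ["cat", "dog"], k=2, max_postings=3)
```

In this toy case the truncated evaluation touches three of the five postings yet returns the same two documents in the same order as the full evaluation. Because the largest contributions are processed first, stopping early tends to discard only small score increments, which is the kind of trade-off between cost and effectiveness the abstract refers to.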
