What's Next? Index Structures for Efficient Phrase Querying

Text retrieval systems are used to fetch documents from large text collections, using queries consisting of words and word sequences. A shortcoming of current systems is that word-sequence queries, also known as phrase queries, can be expensive to evaluate, particularly if they include common words. Another limitation is that some forms of querying are not supported; an example is phrase completion, which provides an alternative way of locating information. We propose a new index structure, the nextword index, that addresses both of these problems. We show experimentally that nextword indexes support rapid phrase querying and make phrase completion practical.
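
The abstract names the nextword index without describing its layout, so the following is only a minimal sketch under stated assumptions: each word maps to the words that immediately follow it somewhere in the collection, and each (word, nextword) pair keeps a list of (document, position) postings. The function names `build_nextword_index`, `phrase_query`, and `complete_phrase` are hypothetical, tokenization is plain whitespace splitting, and nothing here reflects the compression or on-disk organization a practical system would need.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Build an in-memory index: word -> nextword -> [(doc_id, position), ...].

    Assumed layout for illustration only; positions refer to the first
    word of each adjacent pair.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos in range(len(words) - 1):
            index[words[pos]][words[pos + 1]].append((doc_id, pos))
    return index


def phrase_query(index, phrase):
    """Return sorted (doc_id, start_position) pairs where the phrase occurs."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("a phrase query needs at least two words")
    # Start from the postings of the first adjacent pair.
    candidates = set(index.get(words[0], {}).get(words[1], []))
    # Each later pair must occur at the matching offset from the start.
    for offset in range(1, len(words) - 1):
        postings = set(index.get(words[offset], {}).get(words[offset + 1], []))
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted(candidates)


def complete_phrase(index, phrase):
    """Return {nextword: count} for words that directly follow the phrase."""
    words = phrase.lower().split()
    if not words:
        raise ValueError("phrase completion needs at least one word")
    last = words[-1]
    if len(words) == 1:
        # A single word is completed straight from its nextword list.
        return {nw: len(posts) for nw, posts in index.get(last, {}).items()}
    # Positions at which the last word of the phrase occurs within a match.
    offset = len(words) - 1
    ends = {(d, p + offset) for (d, p) in phrase_query(index, phrase)}
    completions = {}
    for nextword, postings in index.get(last, {}).items():
        count = sum(1 for posting in postings if posting in ends)
        if count:
            completions[nextword] = count
    return completions
```

With a toy collection such as `["the quick brown fox", "the quick red fox jumps", "a quick brown dog"]`, `phrase_query(index, "the quick brown")` only touches the postings for the pairs (the, quick) and (quick, brown), rather than the full postings of the common word "the", and `complete_phrase(index, "the quick")` offers "brown" and "red" as continuations.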
