Type less, find more: fast autocompletion search with a succinct index

We consider the following autocompletion feature for full-text search. Imagine a user of a search engine typing a query. With every letter typed, we would like to instantly display those completions of the last query word that would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries or use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, yet processes queries 10 times faster. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve average response times of one tenth of a second on a single machine with the index on disk. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature, combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.
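To make the problem statement concrete, the following is a minimal sketch of a straightforward inverted-index baseline. It is not the data structure proposed here. The function names `build_index` and `autocomplete`, the toy documents, and the whitespace tokenization are all assumptions made for illustration. Given the documents matching the already-typed part of the query and the prefix of the last word, it returns the completions that occur in those documents together with the matching documents. Ranking of completions and hits is left out.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # assumption: whitespace tokenization
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def autocomplete(index, vocab, context_docs, prefix):
    """Return (completions, hits) for the last, partially typed query word.

    index        -- word -> sorted list of document ids
    vocab        -- sorted list of all indexed words
    context_docs -- set of ids of documents matching the earlier query words
    prefix       -- the letters typed so far for the last query word
    """
    completions = {}
    # Words sharing a prefix are contiguous in the sorted vocabulary.
    start = bisect_left(vocab, prefix)
    for word in vocab[start:]:
        if not word.startswith(prefix):
            break
        # One posting-list scan per candidate word.
        matched = [d for d in index[word] if d in context_docs]
        if matched:
            completions[word] = matched
    hits = sorted({d for ds in completions.values() for d in ds})
    return completions, hits


# Hypothetical toy collection; the user has typed "fast in".
docs = {
    1: "fast search engine",
    2: "search index compression",
    3: "fast index intersection",
    4: "slow indexing",
}
index = build_index(docs)
vocab = sorted(index)
context_docs = set(index["fast"])  # documents matching the first query word
completions, hits = autocomplete(index, vocab, context_docs, "in")
print(completions)
print(hits)
```

In this sketch the work grows with the number of vocabulary words that share the prefix, since each candidate word costs one pass over its posting list. That is one plausible reading of why a plain inverted index can be slow for short prefixes, though the sketch makes no claim about the actual costs measured in this work.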
