Partition Based Hierarchical Index for Text Retrieval

Along with single-word queries, phrase queries are frequently used in digital libraries. This paper proposes a new partition-based hierarchical index structure for efficient phrase querying, together with a parallel algorithm based on that structure. In this scheme, each document is divided into several elements, and the elements are distributed across several processors. Each processor builds a hierarchical inverted index over its elements, with which both single-word and phrase queries can be answered efficiently. The index structure and the partitioning make the postings lists shorter, and they also allow integer compression techniques to be applied more effectively. Experiments and analysis show that query evaluation time is significantly reduced.
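
The scheme described above can be sketched in Python, under several assumptions that the abstract does not state. Elements are taken to be paragraphs separated by blank lines, and elements are dealt round-robin to partitions. The "hierarchical" index is modeled as a nested term → document → element → word-offset map, a phrase matches only when it lies wholly inside one element, and threads stand in for processors. The names `Partition`, `build_partitions`, and `parallel_phrase_query` are hypothetical, not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """Index held by one simulated processor (hypothetical stand-in)."""

    def __init__(self):
        # term -> doc_id -> elem_id -> word offsets within that element
        self.postings = {}

    def add_element(self, doc_id, elem_id, text):
        for pos, word in enumerate(text.lower().split()):
            (self.postings.setdefault(word, {})
                .setdefault(doc_id, {})
                .setdefault(elem_id, [])
                .append(pos))

    def phrase_query(self, phrase):
        """Return {(doc_id, elem_id)} for elements containing the whole phrase."""
        terms = phrase.lower().split()
        if not terms or any(t not in self.postings for t in terms):
            return set()
        hits = set()
        # Only elements that contain the first term can match.
        for doc_id, elems in self.postings[terms[0]].items():
            for elem_id, starts in elems.items():
                # Offsets of every query term inside this same element
                # (an empty set means the term does not occur there).
                offsets = [set(self.postings[t].get(doc_id, {}).get(elem_id, ()))
                           for t in terms]
                # Term i must appear exactly i words after the phrase start.
                if any(all(start + i in offsets[i] for i in range(len(terms)))
                       for start in starts):
                    hits.add((doc_id, elem_id))
        return hits


def build_partitions(docs, num_partitions):
    """Split documents into elements and deal the elements round-robin."""
    parts = [Partition() for _ in range(num_partitions)]
    k = 0
    for doc_id, text in docs.items():
        elements = [e for e in text.split("\n\n") if e.strip()]
        for elem_id, elem in enumerate(elements):
            parts[k % num_partitions].add_element(doc_id, elem_id, elem)
            k += 1
    return parts


def parallel_phrase_query(parts, phrase):
    """Query every partition concurrently and merge the matching elements."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        hits = set().union(*pool.map(lambda p: p.phrase_query(phrase), parts))
    return sorted(hits)


docs = {
    1: "inverted index structures\n\nphrase query evaluation is fast",
    2: "a phrase query needs word positions\n\nquery evaluation time",
}
parts = build_partitions(docs, num_partitions=2)
print(parallel_phrase_query(parts, "query evaluation"))  # [(1, 1), (2, 1)]
print(parallel_phrase_query(parts, "phrase query"))      # [(1, 1), (2, 0)]
```

Each partition scans only the postings for its own elements, so the lists it traverses are shorter than whole-collection lists. A single-word query is simply a one-term phrase. In CPython, the GIL prevents these threads from running the search in parallel on CPUs, so a real deployment would use separate processes or machines.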

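The abstract's claim that partitioning lets integer compression be used more effectively can be illustrated with d-gaps plus variable-byte coding, a common pairing for positional postings. This is a minimal sketch under stated assumptions, not the paper's coder. It uses one variable-byte convention (the high bit is set on the last byte of each integer; other conventions flip this), and the offsets are made up. It shows only the general mechanism: offsets measured from the start of an element are smaller than offsets from the start of the whole document, and smaller integers need fewer bytes. The element identifier still has to be stored somewhere, so any real net saving depends on the index layout.

```python
from itertools import accumulate


def to_dgaps(positions):
    """Sorted positions -> first value, then successive differences (d-gaps)."""
    return [p - prev for p, prev in zip(positions, [0] + positions[:-1])]


def from_dgaps(gaps):
    """Undo to_dgaps by taking running sums."""
    return list(accumulate(gaps))


def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte; high bit marks the final byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits; high bit clear = more bytes follow
            n >>= 7
        out.append(n | 0x80)      # last byte of this integer
    return bytes(out)


def vbyte_decode(data):
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:              # final byte: emit and reset
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers


# The same three occurrences, stored two ways (hypothetical offsets):
absolute = [98003, 98007, 98012]  # offsets from the start of the document
relative = [3, 7, 12]             # offsets from the start of one element

print(len(vbyte_encode(to_dgaps(absolute))))  # 5
print(len(vbyte_encode(to_dgaps(relative))))  # 3
```

Only the first gap differs: 98003 needs three 7-bit groups, while 3 fits in one byte. Later gaps are already small in both cases.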