Performance of inverted indices in shared-nothing distributed text document information retrieval systems

The impact of various physical organizations for inverted lists on query processing performance is compared. A probabilistic model of the database and queries is introduced. Simulation experiments determine which variables most strongly influence response time and throughput. The results lead to a set of design tradeoffs over a range of hardware configurations and to new parallel query processing strategies.
