Paper information - Impact of the Query Model and System Settings on Performance of Distributed Inverted Indexes

Impact of the Query Model and System Settings on Performance of Distributed Inverted Indexes

This paper presents an evaluation of three partitioning methods for distributed inverted indexes: local, global and hybrid indexing, combined with two generalized query models: the conjunctive query model (AND) and the disjunctive query model (OR). We performed simulations of various settings using a dictionary dump of the TREC GOV2 document collection and a subset of the Terabyte Track 05 query log. Our results indicate that, when a conjunctive query model is used in combination with a high level of concurrency, the best performance and scalability are provided by global indexing with pipelined query execution. In other situations, local indexing is the more advantageous method in terms of average query throughput, query wait time, system load and load imbalance.
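
To make the terminology in the abstract concrete, here is a minimal sketch of the two basic partitioning schemes and the two query models. It assumes the usual reading of the terms: local indexing partitions by document, global indexing partitions by term. The toy collection, the round-robin assignment, and all function names are hypothetical and are not taken from the paper. Hybrid indexing and the pipelined execution mentioned in the abstract are not modeled.

```python
from collections import defaultdict

# Toy collection; document IDs and texts are made up for illustration.
DOCS = {
    0: "distributed inverted index",
    1: "inverted index query",
    2: "distributed query processing",
    3: "index partitioning",
}


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def local_partition(docs, num_nodes):
    """Local indexing (assumed: document partitioning).
    Each node builds an index over its own subset of the documents."""
    shards = [{} for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text  # round-robin, an assumption
    return [build_index(shard) for shard in shards]


def global_partition(docs, num_nodes):
    """Global indexing (assumed: term partitioning).
    Each node holds the complete posting lists for a subset of the terms."""
    full = build_index(docs)
    nodes = [{} for _ in range(num_nodes)]
    for i, term in enumerate(sorted(full)):
        nodes[i % num_nodes][term] = full[term]  # round-robin, an assumption
    return nodes


def combine(postings, mode):
    """AND intersects the posting lists; OR unions them."""
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


def query_local(nodes, terms, mode):
    """Every node evaluates the whole query on its own documents;
    the per-node results are disjoint, so they are simply unioned."""
    result = set()
    for index in nodes:
        result |= combine([index.get(t, set()) for t in terms], mode)
    return result


def query_global(nodes, terms, mode):
    """Only the node that owns a term supplies its posting list;
    the lists are then combined in one place."""
    postings = []
    for t in terms:
        found = set()
        for node in nodes:
            if t in node:
                found = node[t]
        postings.append(found)
    return combine(postings, mode)


if __name__ == "__main__":
    local_nodes = local_partition(DOCS, 2)
    global_nodes = global_partition(DOCS, 2)
    for mode in ("AND", "OR"):
        print(mode,
              sorted(query_local(local_nodes, ["inverted", "index"], mode)),
              sorted(query_global(global_nodes, ["inverted", "index"], mode)))
```

Both schemes return the same answer sets; they differ in which nodes take part in a query and how much posting data moves between them, which is the kind of trade-off the simulations in the paper measure.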

Svein Erik Bratsberg | Simon Jonassen

[1] Edward A. Fox, et al. Hybrid Partition Inverted Files: Experimental Validation, 2002, ECDL.

[2] N. Ziviani, et al. Distributed query processing using partitioned inverted files, 2001, Proceedings Eighth Symposium on String Processing and Information Retrieval.

[3] Berthier A. Ribeiro-Neto, et al. Basic issues on the processing of web queries, 2005, SIGIR '05.

[4] Berthier A. Ribeiro-Neto, et al. Query performance for tightly coupled distributed digital libraries, 1998, DL '98.

[5] Hector Garcia-Molina, et al. Query processing and inverted indices in shared-nothing text document information retrieval systems, 1993, The VLDB Journal.

[6] Alistair Moffat, et al. A pipelined architecture for distributed text query evaluation, 2007, Information Retrieval.

[7] Stephen E. Robertson, et al. Parallel search using partitioned inverted files, 2000, Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000.

[8] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[9] Alistair Moffat, et al. Load balancing for term-distributed parallel retrieval, 2006, SIGIR.

[10] Ohm Sornil, et al. Parallel Inverted Indices for Large-Scale, Dynamic Digital Libraries, 2001.

[11] Byeong-Soo Jeong, et al. Inverted File Partitioning Schemes in Multiple Disk Systems, 1995, IEEE Trans. Parallel Distributed Syst.