Paper Information - Parallel Techniques For Efficient Searching Over Very Large Text Collections

This paper mainly discusses the efficiency of the PFIRE system, a parallel text retrieval system based on the Vector Space Model (VSM) and running on the GCel3/512 Parsytec machine, as well as the effectiveness of the corresponding pre-existing serial FIRE system. In PFIRE, suitable data sharing and load balancing techniques are combined with specific pipelining techniques and with the ability to build binary-tree and fat-tree virtual topologies over the 2-D mesh physical interconnection network of the parallel machine; together, these lead to very fast interactive searching over the large-scale TREC collections. Analytical and experimental evidence is presented to demonstrate the efficiency of our techniques. The conventional FIRE system was also used to measure the effectiveness (in terms of recall and precision) of several IR techniques (statistical phrase indexing, automatic statistical global thesaurus construction, etc.) over the TREC WSJ subcollection.
