Parallel Techniques For Efficient Searching Over Very Large Text Collections

This paper discusses the efficiency of the PFIRE system, a parallel text retrieval system based on the vector space model (VSM) and running on the Parsytec GCel3/512 machine, as well as the effectiveness of the corresponding pre-existing serial FIRE system. In PFIRE, suitable data sharing and load balancing techniques, combined with specific pipelining techniques and with the ability to build binary-tree and fat-tree virtual topologies over the 2-D mesh physical interconnection network of the parallel machine, lead to very fast interactive searching over the large-scale TREC collections. Analytical and experimental evidence is presented to demonstrate the efficiency of these techniques. The serial FIRE system was also used to measure the effectiveness, in terms of recall and precision, of several IR techniques (statistical phrase indexing, automatic statistical global thesaurus construction, etc.) over the TREC WSJ subcollection.
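For readers unfamiliar with the two effectiveness measures named above, the following is a minimal sketch of how precision and recall are conventionally computed for a single query. It is not taken from the FIRE system; the function name `precision_recall` and the document identifiers are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system (hypothetical ids).
    relevant:  document ids judged relevant for the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Documents that are both retrieved and relevant.
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only, not results from the paper.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four retrieved documents are relevant and two of the three relevant documents are retrieved, so precision is 2/4 and recall is 2/3.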