Hybrid Partition Inverted Files: Experimental Validation

The rapid increase in content available in digital forms gives rise to large digital libraries, targeted to support millions of users and terabytes of data. Efficiently retrieving information then is a challenging task due to the size of the collection and its index. In this paper, our high performance "hybrid" partition inverted index is validated through experiments with a 100 Gbyte collection from TREC-9 and -10. The hybrid scheme combines the term and the document approaches to partitioning inverted indices across nodes of a parallel system. Experiments on a parallel system show that this organization outperforms the document and the term partitioning schemes. Our hybrid approach should support highly efficient searching for information in a large-scale digital library, implemented atop a network of computers.