On the Ranking of Text Documents from Large Corpuses

Ranking text documents by their relevance to a topic is of great importance in information retrieval. However, given the growing avalanche of available digital documents, the size of the collection pool from which these documents are drawn makes this task more challenging. In addition, current computing infrastructure is unable to deal with very large corpuses directly. Thus, new algorithms are needed that seek parallel solutions and utilize more processing power to solve this problem. In this paper we propose a new algorithm that partitions a large collection of documents (a corpus) into smaller corpuses, each of which can be handled by a single processor for the purpose of ranking. These multiple rankings are then merged to provide a unified listing of all selected documents from the original large corpus.
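
The partition, rank, and merge pipeline described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the scoring function (fraction of tokens that are topic terms), the round-robin partitioning, and the score-based merge are all assumptions chosen for brevity, and every function name is hypothetical. The merge step is only valid here because the assumed score does not depend on which sub-corpus a document lands in, so scores from different partitions are directly comparable.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def score(text, topic_terms):
    """Hypothetical relevance score: fraction of tokens that are topic terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


def sort_key(pair):
    """Order (score, doc_id) pairs by descending score, then ascending id."""
    return (-pair[0], pair[1])


def partition(docs, n_parts):
    """Split a list of (doc_id, text) pairs into n_parts round-robin sub-corpora."""
    return [docs[i::n_parts] for i in range(n_parts)]


def rank_partition(part, topic_terms):
    """Rank one sub-corpus; returns (score, doc_id) pairs, best first."""
    scored = [(score(text, topic_terms), doc_id) for doc_id, text in part]
    return sorted(scored, key=sort_key)


def merge_rankings(rankings, k=None):
    """Merge already-sorted per-partition rankings into one unified ranking."""
    merged = list(heapq.merge(*rankings, key=sort_key))
    return merged if k is None else merged[:k]


def rank_corpus(docs, topic_terms, n_parts=2, k=None):
    """Partition the corpus, rank each part concurrently, then merge."""
    parts = partition(docs, n_parts)
    # Threads stand in for separate processors; a process pool or a cluster
    # would play that role in a real deployment.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        rankings = list(pool.map(lambda p: rank_partition(p, topic_terms), parts))
    return merge_rankings(rankings, k)


if __name__ == "__main__":
    corpus = [
        (0, "cats and dogs"),
        (1, "ranking documents ranking"),
        (2, "document ranking retrieval"),
        (3, ""),
    ]
    for s, doc_id in rank_corpus(corpus, {"ranking", "retrieval"}, n_parts=2):
        print(doc_id, round(s, 3))
```

Under these assumptions the merged ranking is identical to the ranking obtained by scoring the whole corpus on a single processor. A score that uses corpus-wide statistics (for example, inverse document frequency) would not have this property without additional normalization across partitions.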