Collection Fusion using Relevance Distribution Information between Queries and Collections in Digital Libraries

This paper proposes an effective fusion algorithm for retrieval results from heterogeneous information sources in federated digital libraries. For a given query, the algorithm determines the population of documents retrieved from the participating information sources and evaluates the degree of relevance between the query and that population. The evaluated results serve as relevance distribution information for collection fusion. The main information used for the fusion is the relevance distribution among collections, the population size N, and the rank of each relevant document within its originating collection. We also present a performance evaluation of the algorithm, conducted on a prototype meta-searcher that we developed.
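
To make the three inputs named above concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in this paper. It assumes a hypothetical scoring rule in which each document's merged score is its collection's normalized relevance weight divided by its rank in that collection; the function name `fuse`, the parameter names, and the scoring rule itself are assumptions introduced only for illustration.

```python
from typing import Dict, List


def fuse(ranked: Dict[str, List[str]],
         relevance: Dict[str, float],
         n: int) -> List[str]:
    """Merge per-collection rankings into one list of at most n documents.

    ranked    -- collection name -> documents in that collection's own rank order
    relevance -- collection name -> estimated relevance of the collection to the query
    n         -- population size, i.e. how many documents to keep after fusion
    """
    total = sum(relevance.get(name, 0.0) for name in ranked)
    if total <= 0 or n <= 0:
        return []

    scored = []
    for name, docs in ranked.items():
        weight = relevance.get(name, 0.0) / total  # share of the relevance distribution
        for rank, doc in enumerate(docs, start=1):
            # Assumed rule (hypothetical): discount the collection weight by local rank.
            scored.append((weight / rank, rank, name, doc))

    # Highest score first; ties broken by local rank, then by collection name.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [doc for _, _, _, doc in scored[:n]]


if __name__ == "__main__":
    results = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
    weights = {"A": 0.6, "B": 0.4}
    print(fuse(results, weights, 3))
```

Under this assumed rule, a collection judged more relevant to the query contributes more documents near the top of the merged list, while each document keeps its relative order within its own collection.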