Resource selection is an important topic in distributed information retrieval research. It can be a component of a distributed information retrieval task, and, together with resource representation, it can also serve as a standalone database recommendation system. There is a large body of valuable prior research on resource selection, but very little of it has studied the effects of different database size distributions on resource selection. In this paper, we propose extended versions of two well-known resource selection algorithms, CORI and KL divergence, that take database size distributions into account, and we compare them with the recently proposed Relevant Document Distribution Estimation (ReDDE) resource selection algorithm. Experiments were conducted on four testbeds with different characteristics, and the ReDDE and extended KL divergence resource selection algorithms were shown to be more robust across the various environments.