A joint probabilistic classification model for resource selection

Resource selection is an important task in federated search: it selects a small number of the most relevant information sources for a user query. Current resource selection algorithms, such as GlOSS, CORI, ReDDE, Geometric Average, and the recent classification-based method, rely on the evidence of individual information sources to decide which of the available sources are relevant. They do not model the important relationships among sources. For example, an information source tends to be relevant to a user query if it is similar to another source that has a high probability of being relevant. This paper proposes a joint probabilistic classification model for resource selection. The model estimates the probability of relevance of information sources jointly, considering both the evidence of individual sources and the relationships among them. An extensive set of experiments has been conducted on several datasets to demonstrate the advantage of the proposed model.
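To make the idea of joint estimation concrete, here is a minimal sketch, assuming the relationships are encoded as a pairwise binary Markov random field over source relevance, P(y) ∝ exp(Σ_i u_i y_i + λ Σ_{i<j} S_ij y_i y_j), where u_i comes from a source's own evidence and S_ij is a source-to-source similarity. This is not the paper's own formulation: the feature values, weights, similarity matrix, coupling strength λ, and the function names (`individual_scores`, `joint_relevance`, `select_top_k`) are all hypothetical, and mean-field inference is swapped in as a simple approximate inference method. In a real system the weights and coupling would presumably be learned from training queries rather than set by hand.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def individual_scores(features: Sequence[Sequence[float]],
                      weights: Sequence[float], bias: float) -> List[float]:
    """Hypothetical per-source logit u_i = w . x_i + b, using only that source's own evidence."""
    return [sum(w * x for w, x in zip(weights, feats)) + bias for feats in features]


def joint_relevance(unary: Sequence[float],
                    similarity: Sequence[Sequence[float]],
                    coupling: float,
                    iters: int = 50,
                    tol: float = 1e-8) -> List[float]:
    """Mean-field estimates q_i of P(y_i = 1) for the assumed pairwise model
    P(y) ∝ exp(sum_i u_i y_i + coupling * sum_{i<j} S_ij y_i y_j), y_i in {0, 1}.
    Assumes the similarity matrix S is symmetric with a zero diagonal."""
    n = len(unary)
    q = [sigmoid(u) for u in unary]  # start from the independent (per-source) estimates
    for _ in range(iters):
        delta = 0.0
        for i in range(n):
            # Neighbours that are likely relevant push source i's relevance up.
            field = unary[i] + coupling * sum(
                similarity[i][j] * q[j] for j in range(n) if j != i)
            new = sigmoid(field)
            delta = max(delta, abs(new - q[i]))
            q[i] = new  # sequential update: later sources see the new value immediately
        if delta < tol:
            break
    return q


def select_top_k(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k sources with the highest estimated probability of relevance."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


if __name__ == "__main__":
    # Three hypothetical sources, each with two hypothetical evidence features.
    feats = [[1.2, 0.8], [-0.3, 0.2], [0.1, -0.1]]
    unary = individual_scores(feats, weights=[1.0, 1.0], bias=0.0)
    # Source 1 is very similar to the strongly relevant source 0; source 2 is unrelated.
    sim = [[0.0, 0.9, 0.0],
           [0.9, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
    independent = [sigmoid(u) for u in unary]
    joint = joint_relevance(unary, sim, coupling=1.0)
    print(select_top_k(independent, 2))  # [0, 2]
    print(select_top_k(joint, 2))        # [0, 1]
```

In this toy run, source 1 has slightly weaker individual evidence than source 2, so it is left out when sources are scored independently; once its similarity to the highly relevant source 0 is taken into account, its estimated relevance rises above source 2's. Setting the coupling to zero recovers the independent per-source estimates. Mean-field is only an approximation, and for strong couplings the marginals it returns can be poor, so it serves here purely as an illustration.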

[1] Ellen M. Voorhees et al. Learning collection fusion strategies. 1995, SIGIR '95.

[2] W. Bruce Croft et al. Blog site search using resource selection. 2008, CIKM '08.

[3] Jamie Callan et al. Distributed Information Retrieval. 2002.

[4] Luo Si et al. Learning from past queries for resource selection. 2009, CIKM.

[5] Dik Lun Lee et al. Server Ranking for Distributed Text Retrieval Systems on the Internet. 1997, DASFAA.

[6] Luo Si et al. Unified utility maximization framework for resource selection. 2004, CIKM '04.

[7] W. Bruce Croft et al. Cluster-based language models for distributed retrieval. 1999, SIGIR '99.

[8] Luis Gravano et al. GlOSS: text-source discovery over the Internet. 1999, TODS.

[9] Milad Shokouhi et al. Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval. 2007, ECIR.

[10] Luis Gravano et al. STARTS: Stanford proposal for Internet meta-searching. 1997, SIGMOD '97.

[11] Peter Bailey et al. Server selection on the World Wide Web. 2000, DL '00.

[12] James P. Callan et al. Effective retrieval with distributed collections. 1998, SIGIR '98.

[13] W. Bruce Croft et al. The INQUERY Retrieval System. 1992, DEXA.

[14] Norbert Fuhr et al. A decision-theoretic approach to database selection in networked IR. 1999, TOIS.

[15] Milad Shokouhi et al. SUSHI: Scoring Scaled Samples for Server Selection. 2009.

[16] James P. Callan et al. Query-based sampling of text databases. 2001, TOIS.

[17] King-Lup Liu et al. Building efficient and effective metasearch engines. 2002, CSUR.

[18] Luo Si et al. Modeling search engine effectiveness for federated search. 2005, SIGIR '05.

[19] Max Welling et al. Learning in Markov Random Fields: An Empirical Study. 2005.

[20] Fernando Diaz et al. Sources of evidence for vertical selection. 2009, SIGIR.

[21] Fernando Diaz et al. Classification-based resource selection. 2009, CIKM.

[22] Milad Shokouhi et al. Robust result merging using sample-based score estimates. 2009, TOIS.

[23] Luo Si et al. A semisupervised learning method to merge search engine results. 2003, TOIS.

[24] W. Bruce Croft et al. Combining the language model and inference network approaches to retrieval. 2004, Inf. Process. Manag.

[25] Norbert Fuhr et al. Resource Discovery in Distributed Digital Libraries. 1999.