Paper information - Comparing the performance of collection selection algorithms

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multicollection environment. Multicollection searching is cast in three parts: collection selection (also referred to as database selection), query processing, and results merging. In this work, we focus our attention on the evaluation of the first step, collection selection.

In this article, we present a detailed discussion of the methodology that we used to evaluate and compare collection selection approaches, covering both test environments and evaluation measures. We compare the CORI, CVV, and gGLOSS collection selection approaches using six test environments built from three document testbeds. We note similar trends in performance among the collection selection approaches, but the CORI approach consistently outperforms the other approaches, suggesting that effective collection selection can be achieved using limited information about each collection.

The contributions of this work are both the assembled evaluation methodology and the application of that methodology to compare collection selection approaches in a standardized environment.

James C. French | Allison L. Powell
