A Plugin Architecture Enabling Federated Search for Digital Libraries

Today, users expect to search a variety of digital libraries from a single Web page. The German Vascoda project provides this service for dozens of information sources. Its ultimate goal is search quality close to that of a central database containing the documents of all participating libraries. Currently, however, the Vascoda portal relies on a non-cooperative metasearch approach in which results from the sources are merged randomly, so ranking quality is sub-optimal. In this paper, we describe a Lucene-based plugin that replaces this method with a truly federated search across different search engines, in which the exchange of document statistics improves document ranking. Preliminary evaluation results show a ranking equal to that of a centralized setup.
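
To illustrate why exchanging document statistics can make a federated ranking match a centralized one, the following is a minimal sketch and not the plugin's implementation. It assumes a simplified TF-IDF score rather than Lucene's actual scoring formula, and the names `local_stats`, `global_idf`, and `score` are hypothetical. Each source reports only its document count and per-term document frequencies; summing these reproduces the statistics a single central index would hold.

```python
import math
from collections import Counter


def local_stats(docs):
    """Return (document count, document frequency per term) for one source."""
    df = Counter()
    for doc in docs:
        # Count each term once per document.
        df.update(set(doc.lower().split()))
    return len(docs), df


def global_idf(stats):
    """Merge per-source statistics into one collection-wide IDF table."""
    total_docs = sum(n for n, _ in stats)
    total_df = Counter()
    for _, df in stats:
        total_df.update(df)
    # Simplified IDF (an assumption, not Lucene's formula).
    return {t: math.log(1 + total_docs / f) for t, f in total_df.items()}


def score(query, doc, idf):
    """Simplified TF-IDF score of one document for a query."""
    tf = Counter(doc.lower().split())
    return sum(tf[t] * idf.get(t, 0.0) for t in query.lower().split())


# Two hypothetical sources with their own local collections.
source_a = ["lucene search engine", "digital library search"]
source_b = ["federated search portal", "library portal ranking", "lucene plugin"]

# Federated: each source shares statistics only, never its documents.
federated = global_idf([local_stats(source_a), local_stats(source_b)])

# Centralized: one index over all documents.
central = global_idf([local_stats(source_a + source_b)])
```

Under these assumptions the two IDF tables are identical, so every source scores its documents exactly as a central index would, and the merged result list needs no further score normalization. Without the exchange, each source would use its own local IDF values, and scores from different sources would not be comparable.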
