An Exploration of Retrieval-Enhancing Methods for Integrated Search in a Digital Library

Integrated search is defined as searching across different document types and representations simultaneously, with the goal of presenting the user with a single ranked result list containing the optimal mix of document types. In this paper, we use the iSearch collection to compare several approaches to integrating three types of documents (bibliographic records for articles, bibliographic records for books, and full-text articles): combining all document types in a single index, weighting the different document types using priors, and using collection fusion techniques to merge the retrieval results from three separate indexes, one per document type. We find that a properly optimized retrieval model on a single combined index containing all documents, without any special treatment, performs no worse than our weighting and fusion methods, suggesting that more work is needed on alternative approaches to integrated search.
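
To make the fusion setting concrete, the following is a minimal sketch of one common score-based way to merge result lists from separate indexes: min-max normalization followed by a weighted sum of scores (a weighted CombSUM variant). The abstract does not say which fusion techniques or weights were used, so the function names, the per-index weights, and the toy scores below are all illustrative assumptions rather than the setup evaluated in the paper.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one result list's scores to [0, 1] so indexes are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def weighted_comb_sum(runs, weights):
    """Merge per-index result lists into a single ranked list.

    runs:    {index_name: {doc_id: retrieval_score}}
    weights: {index_name: weight}  (hypothetical per-document-type weights)
    """
    fused = defaultdict(float)
    for name, run in runs.items():
        weight = weights.get(name, 1.0)
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy scores for three separate indexes (made-up values, not iSearch data).
runs = {
    "article_records": {"a1": 12.0, "a2": 9.0, "a3": 6.0},
    "book_records": {"b1": 4.0, "b2": 2.0},
    "full_text": {"f1": 30.0, "f2": 20.0, "f3": 10.0},
}
weights = {"article_records": 1.0, "book_records": 0.5, "full_text": 2.0}

ranking = weighted_comb_sum(runs, weights)
for doc_id, score in ranking:
    print(doc_id, score)
```

In this sketch the weights play a role loosely similar to document-type priors, in that they shift the mix of types near the top of the merged list. A single combined index needs neither step, since all documents are scored by one retrieval model on one scale.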
