Deriving a Domain Specific Test Collection from a Query Log

Cultural heritage, and other special domains, pose a particular problem for information retrieval: evaluation requires a dedicated test collection that takes the particular documents and information requests into account, but building such a test collection requires substantial human effort. This paper investigates methods of generating a document retrieval test collection from a search engine’s transaction log, based on submitted queries and user-click data. We test our methods on a museum’s search log file, and compare the quality of the generated test collections against a collection with manually generated and judged known-item topics. Our main findings are the following. First, the test collection derived from a transaction log corresponds well to the actual search experience of real users. Second, the ranking of systems based on the derived judgments corresponds well to the ranking based on the manual topics. Third, deriving pseudo-relevance judgments from a transaction log file is an attractive option in domains where dedicated test collections are not readily available.
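The core idea above — turning query/click pairs from a transaction log into known-item pseudo-relevance judgments — can be sketched in code. The paper does not specify its exact selection heuristic, so this is a minimal illustration under assumptions: the `min_clicks` and `dominance` thresholds, the log format, and the document identifiers are all hypothetical.

```python
from collections import Counter, defaultdict

def derive_pseudo_qrels(log, min_clicks=2, dominance=0.8):
    """Turn (query, clicked_doc) pairs into known-item pseudo-judgments.

    A query becomes a topic when a single document attracts a dominant
    share of its clicks; that document is taken as the intended known item.
    Both thresholds are illustrative, not the paper's actual parameters.
    """
    clicks = defaultdict(Counter)
    for query, doc in log:
        clicks[query.strip().lower()][doc] += 1  # normalize query strings

    qrels = {}
    for query, counts in clicks.items():
        total = sum(counts.values())
        doc, n = counts.most_common(1)[0]
        if total >= min_clicks and n / total >= dominance:
            qrels[query] = doc
    return qrels

# Toy log; a real transaction log would also carry sessions and timestamps.
log = [
    ("rembrandt night watch", "obj-101"),
    ("rembrandt night watch", "obj-101"),
    ("rembrandt night watch", "obj-101"),
    ("rembrandt night watch", "obj-101"),
    ("rembrandt night watch", "obj-205"),
    ("vermeer", "obj-301"),
]
print(derive_pseudo_qrels(log))
# → {'rembrandt night watch': 'obj-101'}
```

Here "vermeer" is filtered out for having too few clicks, while the dominant click target for "rembrandt night watch" becomes its pseudo-judged known item; the resulting query/document pairs can then serve as topics and judgments for system ranking.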
