A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09

Known-item finding is the task of finding a previously seen item. Such items range from visited websites and received emails to books read or movies seen. Most research on known-item finding focuses on web or email retrieval and is conducted on proprietary corpora that are not publicly available. Public corpora are usually rather artificial, as they contain automatically generated known-item queries or queries formulated by humans while actually seeing the known item.
