Focused Search in Books and Wikipedia: Categories, Links and Relevance Feedback

In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc Track we investigate focused link evidence, using only links from retrieved sections. The new collection is annotated not only with Wikipedia categories but also with YAGO/WordNet categories. We explore how both types of category information can be used, in the Ad Hoc Track as well as in the Entity Ranking Track. Results in the Ad Hoc Track show that Wikipedia categories are more effective than WordNet categories, and that Wikipedia categories combined with relevance feedback lead to the best results. Preliminary results of the Book Track show that full-text retrieval is effective for high early precision, and that relevance feedback increases early precision further. Our findings for the Entity Ranking Track are in direct opposition to our Ad Hoc findings: there, the WordNet categories are more effective than the Wikipedia categories. This marks an interesting difference between ad hoc search and entity ranking.
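The abstract does not spell out how focused link evidence is computed. As a rough illustration only, the Python sketch below assumes one simple reading: for each article, count how many of the top-ranked retrieved sections link to it (each section counted at most once per target), then add a log-damped link score to the article's retrieval score. The section/link data layout, the `top_k` cutoff, the `weight` parameter and the `log1p` damping are all hypothetical choices made for this sketch, not the method used in the paper.

```python
import math
from collections import Counter


def focused_link_evidence(retrieved_sections, top_k=100):
    """Hypothetical sketch: count in-links coming only from retrieved sections.

    retrieved_sections is assumed to be a ranked list of dicts, each with a
    "links" list naming the target articles that section links to.
    """
    counts = Counter()
    for section in retrieved_sections[:top_k]:
        # set() so a section linking twice to the same article counts once
        for target in set(section["links"]):
            counts[target] += 1
    return counts


def rerank(article_scores, link_counts, weight=1.0):
    """Hypothetical combination: retrieval score plus weighted log1p(in-links)."""
    combined = {
        article: score + weight * math.log1p(link_counts.get(article, 0))
        for article, score in article_scores.items()
    }
    # Highest combined score first; ties keep their original order
    return sorted(combined, key=combined.get, reverse=True)
```

For example, if sections from articles A, B and D are retrieved and two of them link to B and two to C, then B and C gain the same link boost. In this sketch, restricting the count to retrieved sections, rather than to every page in the collection, is what makes the evidence "focused": links from pages that were not retrieved for the topic do not contribute.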
