Using Wikipedia Categories and Links in Entity Ranking

This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach uses the known categories, the link structure of Wikipedia, and the link co-occurrences with the example entities (when provided) to improve the effectiveness of entity ranking. Our experiments on both the training and the testing data sets demonstrate that using categories and the link structure of Wikipedia can significantly improve entity retrieval effectiveness. We also applied our system to the ad hoc tasks by inferring target categories from the title of the query. The results were worse than those obtained with a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks.
