Ranking related entities: components and analyses

Related entity finding is the task of returning a ranked list of homepages of relevant entities of a specified type that engage in a given relationship with a given source entity. We propose a framework for addressing this task and perform a detailed analysis of four core components: co-occurrence models, type filtering, context modeling, and homepage finding. Our initial focus is on recall. We analyze the performance of a model that only uses co-occurrence statistics. While this method identifies the potential set of related entities, it fails to rank them effectively. Two types of error emerge: (1) entities of the wrong type pollute the ranking, and (2) some retrieved entities, while associated with the source entity in some way, do not engage in the right relation with it. To address (1), we add type filtering based on category information available in Wikipedia. To correct for (2), we complement our related entity finding method with contextual information, represented as language models derived from documents in which source and target entities co-occur. To complete the pipeline, we find homepages of top-ranked entities by combining a language modeling approach with heuristics based on Wikipedia's external links. Our method achieves very high recall scores on the end-to-end task, providing a solid starting point for expanding our focus to improve precision. Our framework can effectively incorporate additional heuristics, and these extensions lead to state-of-the-art performance.
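
The first two components, co-occurrence ranking followed by type filtering, can be illustrated with a minimal sketch. It is not the implementation evaluated here. It assumes a maximum-likelihood estimate of P(candidate | source) computed from document-level co-occurrence, and a hypothetical `entity_types` map standing in for Wikipedia category information. The function name `rank_related` and the toy data are likewise illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def rank_related(
    source: str,
    docs: Iterable[Set[str]],
    entity_types: Dict[str, Set[str]],
    target_type: str,
) -> List[Tuple[str, float]]:
    """Rank candidates by an estimate of P(candidate | source).

    Assumption: each document is reduced to the set of entities it mentions,
    and the score is the fraction of source-mentioning documents that also
    mention the candidate. Candidates of the wrong type are filtered out.
    """
    cooc: Counter = Counter()
    n_source = 0
    for entities in docs:
        if source not in entities:
            continue
        n_source += 1
        for entity in entities:
            if entity != source:
                cooc[entity] += 1
    if n_source == 0:
        return []
    scored = [
        (entity, count / n_source)
        for entity, count in cooc.items()
        # Type filtering: keep only entities labeled with the target type.
        if target_type in entity_types.get(entity, set())
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy collection (hypothetical data, for illustration only).
docs = [
    {"Boeing 747", "Lufthansa", "Seattle"},
    {"Boeing 747", "Lufthansa"},
    {"Boeing 747", "KLM", "Seattle"},
    {"Airbus", "KLM"},
]
entity_types = {
    "Lufthansa": {"organization"},
    "KLM": {"organization"},
    "Seattle": {"location"},
}
ranking = rank_related("Boeing 747", docs, entity_types, "organization")
```

In this toy collection, "Seattle" co-occurs with the source as often as "Lufthansa" does, so a ranking built on co-occurrence alone would place it near the top; the type filter removes it. The sketch does not address the second error type (entities of the right type in the wrong relation), which is what the context models are for.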
