IdentityRank: Named entity disambiguation in the news domain

Abstract. News companies produce news items describing events that happen in the world. These items usually contain mentions of persons, organizations, locations, and other types of named entities involved in the events described. Such mentions may be ambiguous, which degrades the performance of free-text information retrieval systems. This paper describes the IdentityRank algorithm, designed to address the problem of named entity disambiguation in news items. It was developed as part of the EU-funded project News Engine Web Services (NEWS) and is specifically designed to operate within the editorial environment of a news company. The algorithm was implemented and evaluated on several corpora of actual news items, achieving an average accuracy of around 96%.
