Named Entity Disambiguation: a Hybrid Approach

Abstract. Semantic annotation of named entities, used to enrich unstructured content, is a critical step in the development of the Semantic Web and of many Natural Language Processing applications. This paper addresses the named entity disambiguation problem, which aims at detecting entity mentions in a text and linking them to entries in a knowledge base. We propose a hybrid method that combines heuristics and statistics. Its novelty is that the disambiguation process is incremental: it runs in several rounds that filter the candidate referents, exploiting previously identified entities and extending the text with the attributes of those entities each time they are successfully resolved in a round. Experiments are conducted to evaluate the proposed method; the results show that it achieves high accuracy and can be used to construct a robust entity disambiguation system.
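
The incremental, round-based idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the heuristics or the statistical model, so the sketch assumes a simple word-overlap score between the current context and each candidate's attributes. The names `Entity`, `disambiguate`, and `max_rounds`, as well as the toy knowledge base, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical knowledge-base entry used only for this sketch."""
    entity_id: str
    names: frozenset       # lowercase surface forms that may refer to the entity
    attributes: frozenset  # lowercase words describing the entity


def disambiguate(mentions, text_words, knowledge_base, max_rounds=5):
    """Resolve mentions over several rounds, extending the context after each.

    Assumption: candidates are scored by how many of their attributes
    occur in the current context (a stand-in for the real scoring).
    """
    context = {w.lower() for w in text_words}
    # Initial candidate referents: every entity sharing the mention's name.
    candidates = {
        m: [e for e in knowledge_base if m.lower() in e.names]
        for m in mentions
    }
    resolved = {}
    for _ in range(max_rounds):
        newly_resolved = []
        for mention, cands in list(candidates.items()):
            if mention in resolved or not cands:
                continue
            scores = {e.entity_id: len(e.attributes & context) for e in cands}
            best = max(scores.values())
            # Filter: keep only the candidates that tie for the best score.
            kept = [e for e in cands if scores[e.entity_id] == best]
            candidates[mention] = kept
            if len(kept) == 1:
                resolved[mention] = kept[0]
                newly_resolved.append(kept[0])
        if not newly_resolved:
            break  # no progress in this round, so further rounds cannot help
        # Extend the text with the attributes of the entities just resolved.
        for entity in newly_resolved:
            context |= entity.attributes
    return {m: e.entity_id for m, e in resolved.items()}


# Toy data (hypothetical identifiers and attribute sets).
kb = [
    Entity("atlanta_ga", frozenset({"atlanta"}), frozenset({"georgia", "braves"})),
    Entity("atlanta_tx", frozenset({"atlanta"}), frozenset({"texas"})),
    Entity("columbus_ga", frozenset({"columbus"}), frozenset({"georgia", "chattahoochee"})),
    Entity("columbus_oh", frozenset({"columbus"}), frozenset({"ohio", "buckeyes"})),
]
text = "The Braves played in Atlanta before the team bus left for Columbus"
result = disambiguate(["Atlanta", "Columbus"], text.split(), kb)
print(result)
```

In this toy run, "Atlanta" is resolved in the first round because "braves" appears in the text, while "Columbus" stays ambiguous. Once the attributes of the resolved Atlanta entry (including "georgia") are added to the context, the second round can resolve "Columbus" as well. Extending the context only at the end of each round keeps the outcome independent of the order in which mentions are processed.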
