Improving Entity Linking using Surface Form Refinement

In this paper, we present an algorithm that improves named entity resolution and entity linking through surface form generation and rewriting. A surface form is a word or group of words that matches a lexical unit such as Paris or New York City. Surface forms serve as matching sequences for selecting candidate entries in a knowledge base, and they contribute to the disambiguation of those candidates through similarity measures. In this context, a misspelled mention of an entity can be impossible to identify because no available surface form matches it. To address this problem, we propose an algorithm for surface form refinement based on Wikipedia resources. The approach extends the surface form coverage of our entity linking system and rewrites or reformulates misspelled mentions before the annotation process starts. The algorithm is evaluated on the corpus of the monolingual English entity linking task of NIST KBP 2013. We show that the algorithm improves the performance of the entity linking system.
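
To make the role of surface forms concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes a toy surface form dictionary (SURFACE_FORMS, hypothetical entries) and substitutes generic approximate string matching from the Python standard library (difflib) for the Wikipedia-based refinement. The function names refine_mention and candidates and the 0.8 similarity cutoff are illustrative assumptions.

```python
from difflib import get_close_matches

# Hypothetical toy dictionary: surface form -> candidate knowledge base entries.
# A real system would derive this mapping from Wikipedia resources.
SURFACE_FORMS = {
    "paris": ["Paris", "Paris,_Texas", "Paris_Hilton"],
    "new york city": ["New_York_City"],
    "barack obama": ["Barack_Obama"],
}


def refine_mention(mention, surface_forms=SURFACE_FORMS, cutoff=0.8):
    """Return a known surface form for the mention, or None if none is close enough."""
    key = mention.strip().lower()
    if key in surface_forms:
        # Exact match: no rewriting needed.
        return key
    # Assumption: difflib similarity stands in for the paper's refinement step.
    matches = get_close_matches(key, list(surface_forms), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def candidates(mention, surface_forms=SURFACE_FORMS):
    """Select candidate knowledge base entries for a possibly misspelled mention."""
    refined = refine_mention(mention, surface_forms)
    return surface_forms[refined] if refined is not None else []


if __name__ == "__main__":
    for text in ["Paris", "New Yrok City", "Barak Obama", "Zzzz"]:
        print(text, "->", candidates(text))
```

In this sketch, a mention with no exact surface form is rewritten to its closest known surface form before candidate selection, which mirrors the idea of refining mentions prior to annotation. Disambiguation among the returned candidates is out of scope here.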
