Entity Linking Leveraging Automatically Generated Annotation

Entity linking maps entity mentions in a document to their entries in a knowledge base (KB). In this paper, we propose using additional information sources from Wikipedia to find more name variations for the entity linking task. In addition, because manually creating a training corpus for entity linking is labor-intensive and costly, we present a novel method that automatically generates large-scale corpus annotation for ambiguous mentions by leveraging their unambiguous synonyms in the document collection. A binary classifier is then trained to filter out KB entities that are not similar to the current mention. This classifier not only effectively reduces ambiguity among existing KB entities, but is also useful for highlighting entities that are new to the KB and are candidates for further population. Furthermore, we leverage Wikipedia documents through a domain adaptation approach to provide additional information that is not available in our generated corpus, which further improves performance. The experimental results show that our proposed method outperforms the state-of-the-art approaches.
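
The idea of generating annotation from unambiguous synonyms can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the exact procedure of the paper: we assume a name is unambiguous when it belongs to exactly one KB entity, and that a labeled instance is created by substituting an ambiguous name of the same entity for the unambiguous synonym found in a document. The toy KB, the documents, and the function names `build_name_index` and `generate_annotations` are hypothetical.

```python
import re
from collections import defaultdict


def build_name_index(kb):
    """Map each name to the set of KB entity ids that carry it."""
    index = defaultdict(set)
    for entity_id, names in kb.items():
        for name in names:
            index[name].add(entity_id)
    return index


def generate_annotations(kb, documents):
    """Create (text, mention, entity_id) instances without manual labeling.

    Assumption: a name shared by several entities is ambiguous; a name
    owned by a single entity is an unambiguous synonym of that entity.
    """
    index = build_name_index(kb)
    instances = []
    for entity_id, names in kb.items():
        unambiguous = [n for n in names if len(index[n]) == 1]
        ambiguous = [n for n in names if len(index[n]) > 1]
        for doc in documents:
            for synonym in unambiguous:
                pattern = r"\b" + re.escape(synonym) + r"\b"
                if not re.search(pattern, doc):
                    continue
                for mention in ambiguous:
                    # Swap in the ambiguous name; the label is known
                    # because the synonym identified the entity.
                    text = re.sub(pattern, lambda _m: mention, doc)
                    instances.append((text, mention, entity_id))
    return instances


# Hypothetical toy data for illustration only.
toy_kb = {
    "E1": ["AI", "Amnesty International"],
    "E2": ["AI", "Artificial Intelligence"],
}
toy_docs = [
    "Amnesty International published a report.",
    "Research in Artificial Intelligence is growing.",
]

for instance in generate_annotations(toy_kb, toy_docs):
    print(instance)
```

Instances produced this way could then serve as training data for the binary classifier, with the pair (mention, correct entity) as a positive example and pairs with the other candidate entities as negatives.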
