A Joint Named Entity Recognition and Entity Linking System

We present a joint system for named entity recognition (NER) and entity linking (EL) that matches named entity mentions extracted from text to uniquely identifiable entities. Our approach relies on combined NER modules that defer disambiguation to the EL component, where referential knowledge about entities can be used to select the correct entity reading. Hybridization is a central feature of our system: we have performed experiments combining two types of NER, based respectively on symbolic and statistical techniques. Furthermore, the statistical EL module relies on entity knowledge acquired over a large news corpus using a simple rule-based disambiguation tool. We describe an implementation of our system, along with experiments and evaluation results on French news wires. Linking accuracy reaches up to 87%, and the NER F-score up to 83%.
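The idea of deferring disambiguation from NER to EL can be illustrated with a minimal sketch. It is not the system described here: the `Reading` and `Entity` classes, the toy `KB`, the scoring formula, and the threshold are all hypothetical, chosen only to show how an EL step might pick among competing NER readings using knowledge about entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One candidate analysis of a span, as an NER module might propose it."""
    mention: str
    entity_type: str  # e.g. "LOCATION" or "PERSON" (illustrative labels)


@dataclass(frozen=True)
class Entity:
    """A knowledge-base entry (hypothetical schema)."""
    entity_id: str
    entity_type: str
    prior: float              # assumed popularity estimate
    context_words: frozenset  # words assumed to co-occur with the entity


# Toy knowledge base with made-up values, for illustration only.
KB = {
    "Paris": [
        Entity("paris_city", "LOCATION", 0.6,
               frozenset({"france", "capital", "mayor"})),
        Entity("paris_hilton", "PERSON", 0.4,
               frozenset({"hotel", "heiress", "celebrity"})),
    ],
}


def score(reading, entity, context):
    """Assumed score: prior boosted by context overlap, zero on type mismatch."""
    if reading.entity_type != entity.entity_type:
        return 0.0
    overlap = len(entity.context_words & context)
    return entity.prior * (1 + overlap)


def link(readings, context, threshold=0.05):
    """Keep all NER readings open and let the linking step choose one.

    Returns (reading, entity), or (None, None) when no candidate scores
    above the threshold, i.e. the span is left unlinked.
    """
    best_reading, best_entity, best_score = None, None, threshold
    for reading in readings:
        for entity in KB.get(reading.mention, []):
            s = score(reading, entity, context)
            if s > best_score:
                best_reading, best_entity, best_score = reading, entity, s
    return best_reading, best_entity


# The NER stage proposes two competing readings for the same mention.
readings = [Reading("Paris", "LOCATION"), Reading("Paris", "PERSON")]
_, chosen = link(readings, {"france", "capital"})
print(chosen.entity_id)
```

In this sketch the NER stage never commits to a single type for the mention; the choice falls out of the entity-level score, which is the general division of labour the abstract describes.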
