Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition

This paper presents a Named Entity Recognition (NER) system for Spanish that combines machine learning and knowledge-based approaches. Our contribution focuses on two issues: first, a discussion of how to select the best features for a machine learning NER system; second, an error analysis of this system, which led us to create a set of general post-processing rules. Both issues are explained in detail and then evaluated. Feature selection improves on the results of our previous system by around 2.3%, while applying the post-processing rules yields an additional improvement of around 3.6%, for a final F-score of 83.37%.
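The abstract does not list the features or the post-processing rules themselves, so what follows is only a minimal sketch of how a rule layer can sit on top of a classifier's output and how its effect on entity-level F-score can be measured. It assumes BIO-tagged output and exact-match entity scoring in the style of the CoNLL shared tasks. The gazetteer-based retyping rule, the function names (`extract_entities`, `apply_gazetteer_rule`, `prf`) and the example sentence are hypothetical illustrations, not the authors' rules or data.

```python
from typing import Dict, List, Set, Tuple

# An entity is (start token, end token exclusive, type).
Entity = Tuple[int, int, str]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from BIO tags. An I- tag that does not continue
    the current entity type starts a new entity, similar to conlleval."""
    entities: Set[Entity] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # extend the current entity
        if etype is not None:
            entities.add((start, i, etype))
        if tag == "O":
            start, etype = None, None
        else:
            start, etype = i, tag[2:]
    return entities


def apply_gazetteer_rule(entities: Set[Entity], tokens: List[str],
                         gazetteer: Dict[str, str]) -> Set[Entity]:
    """Hypothetical post-processing rule: if an entity's surface string is
    listed in the gazetteer, overwrite its predicted type with the listed one."""
    fixed: Set[Entity] = set()
    for start, end, etype in entities:
        surface = " ".join(tokens[start:end])
        fixed.add((start, end, gazetteer.get(surface, etype)))
    return fixed


def prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["El", "presidente", "visitó", "Madrid", "y", "Toledo", "."]
    gold = extract_entities(["O", "O", "O", "B-LOC", "O", "B-LOC", "O"])
    pred = extract_entities(["O", "O", "O", "B-LOC", "O", "B-ORG", "O"])
    print(prf(gold, pred))  # (0.5, 0.5, 0.5)
    fixed = apply_gazetteer_rule(pred, tokens, {"Toledo": "LOC"})
    print(prf(gold, fixed))  # (1.0, 1.0, 1.0)
```

In this toy case a single retyping fix turns one false positive and one false negative into a true positive, which shows how a small set of general rules can move F-score noticeably; in practice each rule would need to be checked on held-out data so that it does not introduce new errors elsewhere.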
