Wide-Coverage Spanish Named Entity Extraction

This paper presents a proposal for wide-coverage Named Entity Extraction for Spanish. Named entities are extracted using robust Machine Learning techniques (AdaBoost) and simple attributes that require only non-linguistically processed corpora, complemented with external information sources (a list of trigger words and a gazetteer). A thorough evaluation on real corpora is presented to validate the appropriateness of the approach. Because the features used are non-linguistic, the approach is easily portable to other languages.
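The combination described above (boosting over simple attributes read from raw text, plus trigger-word and gazetteer lookups) can be illustrated with a minimal sketch. This is not the paper's implementation. The feature set (lowercased words in a ±1 window, capitalization, a 3-letter suffix, sentence-initial position, trigger-word and gazetteer membership), the tiny `TRIGGER_WORDS` and `GAZETTEER` sets, the helper names, and the binary "token is inside a named entity" label are all hypothetical. The learner is plain discrete AdaBoost with single-feature decision stumps, not necessarily the AdaBoost variant or entity classes the authors used.

```python
import math
from typing import List, Set, Tuple

# Hypothetical external resources (not the paper's actual lists).
TRIGGER_WORDS: Set[str] = {"señor", "presidente", "ciudad", "río"}
GAZETTEER: Set[str] = {"madrid", "barcelona", "españa"}

Stump = Tuple[str, int, float]  # (feature, polarity, alpha)


def token_features(tokens: List[str], i: int, window: int = 1) -> Set[str]:
    """Binary features for tokens[i] computed from raw text only."""
    w = tokens[i]
    feats = {f"w0={w.lower()}", f"suf3={w[-3:].lower()}"}
    if w[:1].isupper():
        feats.add("cap")
    if w.isupper():
        feats.add("allcaps")
    if any(c.isdigit() for c in w):
        feats.add("hasdigit")
    if i == 0:
        feats.add("sent_start")
    if w.lower() in GAZETTEER:
        feats.add("in_gazetteer")
    # Context words and trigger words in a +/- window around the token.
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            ctx = tokens[j].lower()
            feats.add(f"w{off:+d}={ctx}")
            if ctx in TRIGGER_WORDS:
                feats.add(f"trigger{off:+d}")
    return feats


def stump(f: str, polarity: int, x: Set[str]) -> int:
    """Weak learner: predict `polarity` if feature f fires, else -polarity."""
    return polarity if f in x else -polarity


def adaboost(X: List[Set[str]], y: List[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost over single-feature stumps; labels are -1/+1."""
    n = len(X)
    weights = [1.0 / n] * n
    features = sorted(set().union(*X))
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for f in features:
            for pol in (1, -1):
                err = sum(wt for x, lab, wt in zip(X, y, weights)
                          if stump(f, pol, x) != lab)
                if best is None or err < best[0]:
                    best = (err, f, pol)
        err, f, pol = best
        if err >= 0.5:
            break  # no weak learner better than chance
        err = max(err, 1e-10)  # avoid log(1/0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, pol, alpha))
        # Increase the weight of misclassified tokens, then renormalize.
        weights = [wt * math.exp(-alpha * lab * stump(f, pol, x))
                   for x, lab, wt in zip(X, y, weights)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return model


def predict(model: List[Stump], x: Set[str]) -> int:
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(alpha * stump(f, pol, x) for f, pol, alpha in model)
    return 1 if score >= 0 else -1


# Toy training data: +1 marks tokens inside a named entity.
sentences = [["El", "presidente", "Aznar", "visitó", "Madrid"],
             ["La", "ciudad", "de", "Barcelona", "crece"]]
labels = [[-1, -1, 1, -1, 1], [-1, -1, -1, 1, -1]]
X = [token_features(s, i) for s in sentences for i in range(len(s))]
y = [lab for row in labels for lab in row]
model = adaboost(X, y, rounds=3)
for f, pol, alpha in model:
    print(f"{f:>15} polarity={pol:+d} alpha={alpha:.3f}")
```

On this toy data the first stump chosen is the gazetteer lookup, and later rounds add capitalization and context evidence to recover entities the gazetteer misses (here, "Aznar" after the trigger word "presidente"). Since every feature is read off raw tokens, moving the sketch to another language would mainly mean swapping the two word lists, which is in the spirit of the portability claim above.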
