Named Entity Recognition for Catalan Using Only Spanish Resources and Unlabelled Data

This work studies Named Entity Recognition (NER) for Catalan without using any annotated resources for that language. The approach is based on machine learning techniques and exploits Spanish resources, either by first training models for Spanish and then translating them into Catalan, or by directly training bilingual models. The resulting models are then retrained on unlabelled Catalan data using bootstrapping techniques. Extensive experiments on real data show competitive results for the resulting NER systems.
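
The bootstrapping step can be illustrated with a minimal self-training sketch. It assumes a toy count-based voting classifier, not the models used in this work, and every name in it (`features`, `train`, `predict`, `bootstrap`, the 0.8 threshold, the example tokens) is hypothetical. A model is trained on labelled Spanish tokens, confident predictions on unlabelled Catalan tokens are added to the training set, and the model is retrained.

```python
from collections import Counter, defaultdict

def features(token, prev):
    """Surface features that apply to both Spanish and Catalan tokens (assumed set)."""
    return [
        f"cap={token[:1].isupper()}",
        f"prev={prev.lower()}",
        f"suffix={token[-2:].lower()}",
    ]

def train(examples):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            counts[f][label] += 1
    return counts

def predict(counts, feats):
    """Return (label, confidence) by summing per-feature label votes."""
    votes = Counter()
    for f in feats:
        votes.update(counts.get(f, {}))
    if not votes:
        return "O", 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())

def bootstrap(labelled, unlabelled, threshold=0.8, rounds=3):
    """Self-training: absorb confident predictions, then retrain."""
    model = train(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        confident, rest = [], []
        for feats in pool:
            label, conf = predict(model, feats)
            (confident if conf >= threshold else rest).append((feats, label))
        if not confident:
            break  # nothing new to learn from
        labelled = labelled + confident
        pool = [feats for feats, _ in rest]
        model = train(labelled)
    return model

# Hypothetical seed data: labelled Spanish tokens, unlabelled Catalan tokens.
spanish = [
    (features("Madrid", "en"), "LOC"),
    (features("Sevilla", "en"), "LOC"),
    (features("casa", "la"), "O"),
    (features("mesa", "la"), "O"),
]
catalan = [
    features("Girona", "en"),
    features("taula", "la"),
    features("Lleida", "a"),
]

model = bootstrap(spanish, catalan)
print(predict(model, features("Tarragona", "a")))
```

In this toy run the Catalan preposition "a" never appears in the Spanish seed data, so the model only acquires evidence for the `prev=a` context after absorbing its own confident prediction for "Lleida". A realistic system would need a stronger learner and a more careful confidence estimate, since self-training can also reinforce early mistakes.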
