Named Entity Recognition For Catalan Using Only Spanish Resources and Unlabelled Data

This work studies Named Entity Recognition (NER) for Catalan without using any annotated resources for that language. The approach is based on machine learning and exploits Spanish resources, either by first training models for Spanish and then translating them into Catalan, or by directly training bilingual models. The resulting models are then retrained on unlabelled Catalan data using bootstrapping techniques. Extensive experiments on real data show that the resulting NER systems achieve competitive results.
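
The bootstrapping step mentioned above can be illustrated with a minimal self-training sketch. This is not the authors' system: the count-based toy classifier, the feature set, the confidence threshold, and all function names (`features`, `train`, `predict`, `bootstrap`) are assumptions made only for illustration. It shows the general idea of training on labelled source-language tokens, labelling unlabelled target-language tokens, and retraining on the confident predictions.

```python
from collections import Counter, defaultdict

def features(token):
    """Tiny orthographic feature set (an illustrative assumption)."""
    return {
        "cap=" + str(token[:1].isupper()),
        "suffix=" + token[-2:].lower(),
        "lower=" + token.lower(),
    }

def train(examples):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for token, label in examples:
        for f in features(token):
            counts[f][label] += 1
    return counts

def predict(model, token):
    """Return (label, confidence) by summing per-feature label counts."""
    votes = Counter()
    for f in features(token):
        votes.update(model.get(f, {}))
    total = sum(votes.values())
    if total == 0:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / total

def bootstrap(labelled, unlabelled, threshold=0.8, rounds=3):
    """Self-training: add confident predictions to the training set, then retrain."""
    labelled = list(labelled)
    pool = list(unlabelled)
    model = train(labelled)
    for _ in range(rounds):
        confident, rest = [], []
        for token in pool:
            label, conf = predict(model, token)
            (confident if conf >= threshold else rest).append((token, label))
        if not confident:
            break  # nothing new to learn from
        labelled += confident
        pool = [token for token, _ in rest]
        model = train(labelled)
    return model
```

A hypothetical usage, with a few Spanish seed tokens and unlabelled Catalan tokens:

```python
seed = [("Madrid", "LOC"), ("Barcelona", "LOC"), ("casa", "O"), ("ciudad", "O")]
model = bootstrap(seed, ["Girona", "ciutat", "Lleida"])
label, confidence = predict(model, "Tarragona")
```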
