Automatic acquisition of named entities for rule-based machine translation

This paper proposes enriching the dictionaries of rule-based machine translation (RBMT) systems with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English-Spanish system, and its performance is compared to that of Apertium with and without hand-tagged NEs. The system with automatic NEs outperforms the one without NEs, while results vary when compared to the system with hand-tagged NEs: they are comparable for Spanish→English but slightly worse for English→Spanish. In addition, adding automatic NEs reduces the number of unknown terms by more than 10%.
