Use of Rich Linguistic Information to Translate Prepositions and Grammar Cases to Basque

This paper presents three techniques for translating prepositions that head verbal complements by means of rich linguistic information, in the context of a rule-based Machine Translation system for an agglutinative language with scarce resources. This information comes in the form of lexicalized syntactic dependency triples, verb subcategorization, and manually coded selection rules based on lexical, syntactic and semantic information. The first two resources have been automatically extracted from monolingual corpora. The results, obtained using a new evaluation methodology, show that all proposed techniques improve precision over the baselines, which include a translation dictionary compiled from an aligned corpus and a state-of-the-art statistical Machine Translation system. The results also show that the linguistic information used in the three techniques is complementary, and that a combination of them obtains the best overall F-score.
