Improving Statistical Word Alignments with Morpho-syntactic Transformations

This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By transforming the parallel corpus according to POS tagging, lemmatization, or stemming, we explore which linguistic information helps reduce alignment error rates. Alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data is scarce, but a significant improvement is also achieved in the large data task, meaning that certain linguistic knowledge remains relevant even when large amounts of data are available.
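The abstract evaluates alignments by their error rate against a human reference. As a minimal sketch of how such a score is commonly computed, the snippet below uses the standard alignment error rate (AER) definition from Och and Ney (2003), with sure links S, possible links P, and hypothesis links A. The function name, the link representation as (source index, target index) pairs, and the toy data are assumptions made for illustration; the abstract does not specify the paper's exact evaluation code.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    Sure links are treated as a subset of the possible links.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # every sure link also counts as possible
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy data (hypothetical): two sure links, one extra possible link,
# and a hypothesis that gets both sure links right plus one wrong link.
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
hypothesis = {(0, 0), (1, 1), (2, 2)}

aer = alignment_error_rate(hypothesis, sure, possible)
print(f"AER = {aer:.3f}")
```

A lower AER means the automatic alignment agrees more closely with the human reference; a hypothesis identical to the sure links scores 0.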

José B. Mariño | Hermann Ney | Rafael E. Banchs | Adrià de Gispert | Marcello Federico | Maja Popovic | Deepa Gupta | Patrik Lambert
