Bilingual phrases for statistical machine translation

The statistical framework has proved to be very successful in machine translation. The main reason for this success is the existence of powerful techniques that make it possible to build machine translation systems automatically from available parallel corpora. Most statistical machine translation approaches are based on single-word translation models, which do not take bilingual contextual information into account. The translation model in the phrase-based approach defines correspondences between sequences of contiguous source words (source segments) and sequences of contiguous target words (target segments), rather than only correspondences between single source words and single target words. That is, statistical phrase-based translation models make use of explicit bilingual contextual information. This paper reviews different methods for selecting adequate bilingual word sequences and for training the parameters of these models. Improved techniques for the selection of bilingual phrases and the training of model parameters are also introduced. The phrase-based approach has been assessed on different tasks using different corpora, and the results obtained are comparable to or better than those obtained with other statistical and non-statistical machine translation systems.
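To make the idea of bilingual segments concrete, here is a minimal sketch in Python of one common way to select phrase pairs: keeping every pair of contiguous source and target segments that is consistent with a given word alignment. This is an illustration under stated assumptions, not necessarily the selection method proposed in the paper. The function name `extract_phrase_pairs`, the `max_len` limit, and the toy sentence pair are all hypothetical, and the sketch does not extend segments over unaligned words.

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(
    source: List[str],
    target: List[str],
    alignment: Alignment,
    max_len: int = 3,  # hypothetical limit on segment length
) -> List[Tuple[str, str]]:
    """Return (source segment, target segment) pairs consistent with the alignment."""
    pairs = []
    for s_start in range(len(source)):
        for s_end in range(s_start, min(s_start + max_len, len(source))):
            # Target positions linked to any word in the source segment.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # skip segments with no aligned word
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: no word in the target segment may be
            # linked to a source word outside the source segment.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(
                    (
                        " ".join(source[s_start : s_end + 1]),
                        " ".join(target[t_start : t_end + 1]),
                    )
                )
    return pairs


if __name__ == "__main__":
    # Toy example (invented for illustration): Spanish to English.
    src = ["la", "casa", "verde"]
    tgt = ["the", "green", "house"]
    links = {(0, 0), (1, 2), (2, 1)}
    for pair in extract_phrase_pairs(src, tgt, links):
        print(pair)
```

In this toy example the segment "casa verde" is paired with "green house" as a unit, which captures a local reordering that a single-word model would have to handle word by word. The candidate "la casa" is rejected because its smallest covering target segment also contains "green", which is aligned to a word outside the source segment.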
