Training Phrase-Based SMT without Explicit Word Alignment

Machine translation systems usually build an initial word-to-word alignment before training phrase translation pairs. This approach requires extensive matching between individual words of the two languages under consideration. In this paper, we propose a new approach to phrase-based machine translation that does not require any word alignment. The method is based on inter-lingual triggers retrieved by Multivariate Mutual Information. The algorithm segments sentences into phrases and finds their alignments simultaneously. The main objective of this work is to build valid alignments between source and target phrases directly. The results are satisfactory in terms of translation performance, and the resulting translation table is smaller than the reference one, so this approach could be considered an alternative to classical methods.

Index Terms: Statistical Machine Translation, Inter-lingual triggers, Multivariate Mutual Information.
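The abstract does not give the scoring formula, so the following is only a rough illustration of the idea of retrieving inter-lingual triggers with a mutual-information criterion. It assumes that Multivariate Mutual Information is McGill's interaction information, computed by inclusion-exclusion over entropies of binary "unit occurs in sentence" indicators. It also assumes that a trigger is a target word with high mutual information and positive association with a source word. The function names (`multivariate_mi`, `best_triggers`) and the toy corpus are hypothetical. The paper's phrase segmentation and simultaneous alignment procedure is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(outcomes):
    """Empirical Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(outcomes)
    counts = Counter(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def multivariate_mi(columns):
    """Interaction information (assumed MMI definition), by inclusion-exclusion:
    I(X1;...;Xk) = sum over non-empty subsets T of (-1)^(|T|+1) * H(X_T).
    Each column holds one variable's value for every sentence pair."""
    k, n = len(columns), len(columns[0])
    total = 0.0
    for size in range(1, k + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(range(k), size):
            joint = [tuple(columns[j][i] for j in subset) for i in range(n)]
            total += sign * entropy(joint)
    return total


def positively_associated(xs, ys):
    """True if the two units co-occur more often than under independence."""
    n = len(xs)
    p_x, p_y = sum(xs) / n, sum(ys) / n
    p_xy = sum(x & y for x, y in zip(xs, ys)) / n
    return p_xy > p_x * p_y


def best_triggers(src_sents, tgt_sents, k=2):
    """Hypothetical trigger retrieval: for each source word, keep the k target
    words with the highest MI among positively associated candidates."""
    src_vocab = sorted({w for s in src_sents for w in s})
    tgt_vocab = sorted({w for t in tgt_sents for w in t})
    triggers = {}
    for s in src_vocab:
        xs = [int(s in sent) for sent in src_sents]
        scored = []
        for t in tgt_vocab:
            ys = [int(t in sent) for sent in tgt_sents]
            if positively_associated(xs, ys):
                scored.append((multivariate_mi([xs, ys]), t))
        scored.sort(key=lambda p: (-p[0], p[1]))
        triggers[s] = [t for _, t in scored[:k]]
    return triggers


# Toy sentence-aligned corpus (illustrative only).
SRC = [["la", "maison"], ["la", "voiture"], ["une", "maison"], ["une", "voiture"]]
TGT = [["the", "house"], ["the", "car"], ["a", "house"], ["a", "car"]]

if __name__ == "__main__":
    print(best_triggers(SRC, TGT))
    # {'la': ['the'], 'maison': ['house'], 'une': ['a'], 'voiture': ['car']}
```

With two variables, the function reduces to ordinary mutual information. Passing more indicator columns (for example, one per word of a candidate source phrase plus a target word) yields the higher-order term. Note that interaction information can be negative for three or more variables, so how such scores would drive phrase segmentation is left open in this sketch.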
