Investigations on machine translation system combination

Machine translation is a task in natural language processing whose objective is to translate documents from one human language into another without any human interaction. Extensive research in the field has produced many different machine translation approaches. Current systems are based on different paradigms, such as phrases, phrases with gaps, hand-written rules, syntactic rules or neural networks. All of these approaches have performed well in several international evaluation campaigns, but none has emerged as clearly superior. In this thesis, we investigate the combination of different machine translation approaches in order to benefit from all of them.

The combination of outputs from multiple machine translation systems has been applied successfully in state-of-the-art machine translation evaluations for several years. System combination is a reliable method for merging the benefits of different machine translation systems into a single translation output. It relies on the concept of majority voting and on the assumption that different machine translation engines produce different errors at different positions, while the majority agrees on a correct translation. Confusion network decoding has emerged as one of the most successful approaches to combining machine translation outputs.

The main goal of this thesis is to develop novel methods that improve the translation quality of confusion network system combination. We introduce a novel system combination implementation, which has been made available to the research community as an open-source toolkit. We extend previously invented approaches with several additional models and show that our methods produce translation results that are better than or similar to those of the previous approaches. Moreover, compared with one existing system combination approach, our implementation is significantly better on several translation tasks.

On top of this strong baseline, we extend the confusion network approach with an additional model learned by a neural network. The system combination output is typically a combination of the best available systems and ignores the output of weaker translation systems, although these could be helpful in some situations. We show that our novel model also takes weaker systems into account and detects the positions where they help to improve the quality of the combined translation.

One of the most important steps in system combination is the pairwise alignment between the different input systems. We introduce a novel alignment algorithm that is based on the source sentence and improves the quality of the combined translation. In addition to automatic evaluation, we also let humans evaluate our novel approach.

Furthermore, we investigate the effect of decoding direction in the commonly used phrase-based and hierarchical phrase-based machine translation approaches. We show how to benefit from system combination by combining machine translation setups that are based on different decoding directions. In addition, we investigate techniques for combining the different configurations at an earlier stage, e.g. after the alignment training or the phrase extraction step.

Finally, we present recent evaluation results obtained with the methods developed in this thesis. We participated in the most recent international evaluation campaigns and demonstrate that our methods outperform the translation setups of all participating top-ranked international research labs in several language pairs.
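The majority-voting idea behind confusion network decoding can be illustrated with a minimal sketch. It is not the toolkit described in this thesis: it assumes the hypotheses have already been aligned into slots of equal length, uses a hypothetical `<eps>` marker for empty arcs, and replaces the weighted models of a real decoder with a plain per-slot vote. The names `EPS` and `majority_vote` are illustrative only.

```python
from collections import Counter

# Hypothetical marker for an empty arc (a system has no word in this slot).
EPS = "<eps>"


def majority_vote(aligned_hypotheses):
    """Return the word sequence obtained by a plain majority vote per slot.

    `aligned_hypotheses` is a list of token lists of equal length, one per
    system, where position i of every list belongs to the same slot of the
    confusion network. Ties go to the word seen first in system order.
    """
    if not aligned_hypotheses:
        return []
    length = len(aligned_hypotheses[0])
    if any(len(hyp) != length for hyp in aligned_hypotheses):
        raise ValueError("all aligned hypotheses must have the same length")

    output = []
    for slot in zip(*aligned_hypotheses):
        # Count how many systems proposed each word in this slot.
        word, _count = Counter(slot).most_common(1)[0]
        # An empty arc winning the vote means no word is emitted.
        if word != EPS:
            output.append(word)
    return output


# Three toy system outputs, already aligned slot by slot.
hypotheses = [
    ["the", "cat", "sat", EPS, "here"],
    ["the", "cat", "sits", "down", "here"],
    ["a", "cat", "sat", EPS, "here"],
]
combined = majority_vote(hypotheses)
print(" ".join(combined))
```

In this toy input each system makes a different error at a different position, so the vote recovers a sequence that no single disagreement can override. An unweighted vote like this one always follows the majority, which is why additional models are needed before weaker systems can contribute at the positions where they are right.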
