Combined spoken language translation

EU-BRIDGE is a European research project aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques applied by the individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then describe the combination approach developed at RWTH Aachen University, which combines the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.
