Statistical Machine Translation for Automobile Marketing Texts

We describe a project to introduce an in-house statistical machine translation (SMT) system for marketing texts from the automobile industry, with the ultimate aim of replacing manual translation with post-editing of the system's output. The paper focuses on the suitability of such texts for SMT. We present experiments in domain adaptation and decompounding that improve on the baseline translation systems, and we evaluate the results with both automatic metrics and manual evaluation.
