ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool

This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT takes the source-side rules of a translation model and offers them to users as auto-suggestions. In this sense, users are writing in a ‘Controlled Language’ that is ‘understood’ by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is generated on the fly with the help of this structural information. Our English-to-Chinese experiments show that ProphetMT not only better regularises an author’s writing behaviour, but also significantly improves translation fluency, which is vital for reducing post-editing time. Additionally, once writing and translation are complete, ProphetMT provides a colour scheme that explicitly highlights the relations between source and target rules, further improving the productivity of post-editors.
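The suggestion-then-translate loop described above can be illustrated with a minimal sketch. It is not the ProphetMT implementation: the `Rule` class, the toy `RULES` table, the `X1`/`X2` slot notation, and the `suggest` and `translate` helpers are all assumptions made for illustration, and simple prefix matching stands in for whatever lookup the real tool uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A toy synchronous rule: X1, X2 mark non-terminal slots (hypothetical notation)."""
    source: str
    target: str


# Hypothetical English-to-Chinese rule table; a real system would extract
# such rules from a trained tree-based translation model.
RULES = [
    Rule("the X1 of X2", "X2 的 X1"),
    Rule("the X1", "X1"),
    Rule("translation model", "翻译 模型"),
    Rule("quality", "质量"),
]


def suggest(prefix, rules=RULES):
    """Return the rules whose source side starts with what the author has typed."""
    typed = prefix.lower()
    return [r for r in rules if r.source.lower().startswith(typed)]


def translate(rule, fillers):
    """Build a partial translation by filling the target-side slots.

    `fillers` maps slot names (e.g. "X1") to already-translated sub-phrases;
    slots without a filler are left in place, so the output stays partial.
    """
    return " ".join(fillers.get(tok, tok) for tok in rule.target.split())


if __name__ == "__main__":
    for r in suggest("the"):
        print(r.source, "->", r.target)
    chosen = suggest("the X1 of")[0]
    print(translate(chosen, {"X1": "质量", "X2": "翻译 模型"}))
```

Because the author only picks source sides that exist in the rule table, every selection already has a known target side, and the slot structure tells the system how to reorder the sub-phrases in the partial translation.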
