GWLAN: General Word-Level AutocompletioN for Computer-Aided Translation

Computer-aided translation (CAT), the use of software to assist a human translator in the translation process, has proven useful for enhancing the productivity of human translators. Autocompletion, which suggests translation results according to the text pieces provided by human translators, is a core function of CAT. Previous research along this line has two limitations. First, most work on this topic focuses on sentence-level autocompletion (i.e., generating the whole translation as a sentence based on human input), while word-level autocompletion remains under-explored. Second, almost no public benchmarks are available for the autocompletion task of CAT. This might be among the reasons why research progress in CAT is much slower than in automatic machine translation (MT). In this paper, we propose the task of general word-level autocompletion (GWLAN), derived from a real-world CAT scenario, and construct the first public benchmark to facilitate research on this topic. In addition, we propose an effective method for GWLAN and compare it with several strong baselines. Experiments demonstrate that our proposed method gives significantly more accurate predictions than the baseline methods on our benchmark datasets.
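To make the word-level setting concrete, here is a toy sketch. It is not the method proposed in the paper. It assumes that the translator has typed a few characters of the next target word and that the system already holds a score for each candidate word (hypothetically, from a translation model conditioned on the source sentence and the surrounding translation context). Under those assumptions, word-level autocompletion reduces to returning the highest-scoring words consistent with the typed characters. The prefix-matching rule, the function name `complete_word`, and the scores are illustrative assumptions.

```python
from typing import Dict, List


def complete_word(typed: str, scores: Dict[str, float], k: int = 1) -> List[str]:
    """Return up to k candidate words that start with the typed characters.

    Hypothetical toy baseline: `scores` stands in for a model's score of each
    target word given the source sentence and the translation context.
    """
    # Keep only words consistent with what the translator has typed so far
    # (assumption: typed characters must form a prefix of the word).
    candidates = [w for w in scores if w.startswith(typed)]
    # Rank by score, highest first; break ties alphabetically for determinism.
    candidates.sort(key=lambda w: (-scores[w], w))
    return candidates[:k]


# Made-up scores for the next target word, for illustration only.
scores = {"translation": 0.42, "translator": 0.31, "transfer": 0.05, "model": 0.22}
print(complete_word("trans", scores, k=2))  # ['translation', 'translator']
```

In contrast to sentence-level autocompletion, only a single word is predicted per interaction. The hard part, which this sketch deliberately sidesteps, is producing good scores from the source sentence and the partial translation.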