Simultaneous Translation with Flexible Policy via Restricted Imitation Learning

Simultaneous translation is widely useful but remains one of the most difficult tasks in NLP. Previous work either uses fixed-latency policies or trains a complicated two-stage model with reinforcement learning. We propose a much simpler single model that adds a "delay" token to the target vocabulary, and design a restricted dynamic oracle to greatly simplify training. Experiments on Chinese-English simultaneous translation show that our approach leads to flexible policies that achieve better BLEU scores and lower latencies than both fixed-latency and RL-learned policies.
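Operationally, the idea is that because "delay" is just another target-vocabulary item, the trained model's own next-token distribution decides when to read and when to write at inference time. Below is a minimal decoding sketch of such a policy, assuming a hypothetical `model.next_token(src_prefix, tgt_prefix, allow_delay=...)` interface; it illustrates the delay-token mechanism under those assumptions and is not the paper's actual implementation.

```python
# Minimal sketch (not the paper's code) of how a single NMT model with a
# "delay" token in its target vocabulary induces a flexible READ/WRITE
# policy at inference time. `model.next_token` and its `allow_delay`
# argument are hypothetical names used for illustration only.

def simultaneous_decode(model, source_words, delay="<delay>",
                        eos="</s>", max_len=200):
    read = 1            # number of source words revealed so far
    hypothesis = []     # target words committed so far
    while len(hypothesis) < max_len:
        # Mask <delay> once the full source has been read, so the model
        # is forced to finish writing and decoding always terminates.
        token = model.next_token(source_words[:read], hypothesis,
                                 allow_delay=read < len(source_words))
        if token == delay:
            read += 1                  # READ: wait for one more source word
        else:
            hypothesis.append(token)   # WRITE: commit one target word
            if token == eos:
                break
    return hypothesis
```

Masking the delay token once the source is exhausted guarantees termination; latency is then simply determined by how many delay tokens the model chooses to emit before each write.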
